status and perceptions of luaproc

status and perceptions of luaproc

Mason Bogue
http://www.inf.puc-rio.br/~roberto/docs/ry08-05.pdf

This library -- which seems to show that Lua can handle concurrent
programming quite well -- doesn't appear to be maintained. That's
unfortunate, because I don't know if the other solutions available
replicate the advantages of luaproc, viz. performance and simplicity.
The major alternative, Lua Lanes, doesn't seem to be terrifically fast
and seems to have been in bugfix mode for two years (
https://github.com/LuaLanes/lanes/commits/master ), while the
commonly-accepted solution is to use llthreads with 0mq. I have a
hard time believing either of these even comes close to luaproc,
performance-wise, considering that Skyrme et al. went as far as to draw
comparisons to Erlang while spinning up myriad fibers; Lanes' green
threads are much heavier than fibers and its scheduler was good but
not spectacular the last time I checked, whereas llthreads is 1:1, not
M:N (thus a very different model).

I noticed this when I looked at luaproc a month ago and submitted a PR
that fixes creating threads from pure Lua functions. Nobody said
anything and I later realized that two other pull requests (of
admittedly variable code quality) have been languishing for even
longer. I'm not sure if Dr. Skyrme is simply away or doesn't want
contributions via github.

I'd also like to know the community's opinion on luaproc, and whether
its features are desirable or generally considered to be superseded
by something of which I'm not aware. I could take over maintainership
of the library resp. bugfixes and minor PRs (e.g. v5.3 / v5.4
compatibility), but I don't have the time or energy to break new
ground and would want to pass it off to someone else if development
takes off for some crazy reason. One exception: it is missing yield(),
which would be convenient for any fiber library and which I could add
easily enough.
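For readers unfamiliar with the library, here is a minimal message-passing sketch in the style of luaproc's published API (function names are taken from the paper and README and may differ across versions; some versions use setnumworkers() instead of createworker(), and whether child processes need their own require differs too):

```lua
-- Hedged sketch of luaproc's model: Lua processes on OS worker threads,
-- communicating over named channels. Names per the published API; treat
-- details as assumptions, not gospel.
local luaproc = require "luaproc"

luaproc.createworker()          -- at least one OS worker thread
luaproc.newchannel("results")

-- producer: a Lua process that sends on the named channel
luaproc.newproc([[
  local luaproc = require "luaproc"  -- harmless if already provided
  luaproc.send("results", 6 * 7)
]])

-- consumer: another Lua process that receives from it
luaproc.newproc([[
  local luaproc = require "luaproc"
  print(luaproc.receive("results"))
]])
```

The send blocks until a matching receive (and vice versa), which is what lets the scheduler multiplex many cheap Lua processes over few OS threads.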


Re: status and perceptions of luaproc

Nagaev Boris
On Sat, Jul 25, 2015 at 12:40 PM, Mason Bogue <[hidden email]> wrote:

> http://www.inf.puc-rio.br/~roberto/docs/ry08-05.pdf
>
> This library -- which seems to show that Lua can handle concurrent
> programming quite well -- doesn't appear to be maintained. That's
> unfortunate, because I don't know if the other solutions available
> replicate the advantages of luaproc, viz. performance and simplicity.
> The major alternative, Lua Lanes, doesn't seem to be terrifically fast
> and seems to have been in bugfix mode for two years (
> https://github.com/LuaLanes/lanes/commits/master ), while the
> commonly-accepted solution is to use llthreads with 0mq, and I have a
> hard time believing either of these even comes close to luaproc,
> performance-wise, considering that Skyrme et al went as far as to draw
> comparisons to Erlang while spinning up myriad fibers; Lanes' green
> threads are much heavier than fibers and its scheduler was good but
> not spectacular the last time I checked, whereas llthreads is 1:1, not
> M:N (thus a very different model).

(offtopic)

There is also llthreads2, a rewrite of the `llthreads` library without
the `LuaNativeObjects` code generator. It seems to be maintained.

https://github.com/moteus/lua-llthreads2

> I noticed this when I looked at luaproc a month ago and submitted a pr
> that fixes creating threads from pure Lua functions. Nobody said
> anything and I later realized that two other pull requests (of
> admittedly variable code quality) have been languishing for even
> longer. I'm not sure if Dr. Skyrme is simply away or doesn't want
> contributions via github.
>
> I'd also like to know the community's opinion on luaproc, and whether
> it's features are desirable or generally considered to be superseded
> by something of which I'm not aware. I could take over maintainership
> of the library resp. bugfixes and minor PRs (e.g. v5.3 / v5.4
> compatibility), but I don't have the time or energy to break new
> ground and would want to pass it off to someone else if development
> takes off for some crazy reason. One exception: it is missing yield(),
> kind of convenient for any fiber library, which I could add easily
> enough.
>



--


Best regards,
Boris Nagaev


Re: status and perceptions of luaproc

phlnc8
In reply to this post by Mason Bogue
On Sat, Jul 25, 2015 at 5:40 AM, Mason Bogue <[hidden email]> wrote:
> http://www.inf.puc-rio.br/~roberto/docs/ry08-05.pdf
>
> This library -- which seems to show that Lua can handle concurrent
> programming quite well -- doesn't appear to be maintained. That's
> unfortunate, because I don't know if the other solutions available
> replicate the advantages of luaproc, viz. performance and simplicity.

+1

It looks like the thesis has been completed, the paper has been
published, and that's it.

I also think that the luaproc approach is _really_ promising. I would be
interested in having Roberto's position on this.  Does he consider it
a one-shot student work?  Or, at the other end of the spectrum, are
there minions in the Lua secret labs working on a future "parallel
programming" addition to Lua 5.4?  :-)


Re: status and perceptions of luaproc

Alexandre Skyrme-2
In reply to this post by Mason Bogue
Hi there.

Mason Bogue <scythe+lua <at> ortsz.com> writes:
> I noticed this when I looked at luaproc a month ago and submitted a pr
> that fixes creating threads from pure Lua functions. Nobody said
> anything and I later realized that two other pull requests (of
> admittedly variable code quality) have been languishing for even
> longer. I'm not sure if Dr. Skyrme is simply away or doesn't want
> contributions via github.

Dr. Skyrme was kind of away indeed, finishing his PhD thesis. However,
now that is over with, we intend to resume maintaining and evolving
luaproc (and surely enough, we do want contributions via GitHub). As
soon as I get my act together I'll check the patches you submitted and
get back to you.

> I'd also like to know the community's opinion on luaproc, and whether
> it's features are desirable or generally considered to be superseded
> by something of which I'm not aware. I could take over maintainership
> of the library resp. bugfixes and minor PRs (e.g. v5.3 / v5.4
> compatibility), but I don't have the time or energy to break new
> ground and would want to pass it off to someone else if development
> takes off for some crazy reason. One exception: it is missing yield(),
> kind of convenient for any fiber library, which I could add easily
> enough.

Of course my opinion is biased, but we've been studying concurrency in
scripting languages for quite a while now and we still feel luaproc
implements a sound model for safe concurrent programming with good
performance.

Regards,
--Alexandre



Re: status and perceptions of luaproc

彭 书呆
In reply to this post by Mason Bogue
On 2015/7/25 17:40, Mason Bogue wrote:

> http://www.inf.puc-rio.br/~roberto/docs/ry08-05.pdf
>
> This library -- which seems to show that Lua can handle concurrent
> programming quite well -- doesn't appear to be maintained. That's
> unfortunate, because I don't know if the other solutions available
> replicate the advantages of luaproc, viz. performance and simplicity.
> The major alternative, Lua Lanes, doesn't seem to be terrifically fast
> and seems to have been in bugfix mode for two years (
> https://github.com/LuaLanes/lanes/commits/master ), while the
> commonly-accepted solution is to use llthreads with 0mq, and I have a
> hard time believing either of these even comes close to luaproc,
> performance-wise, considering that Skyrme et al went as far as to draw
> comparisons to Erlang while spinning up myriad fibers; Lanes' green
> threads are much heavier than fibers and its scheduler was good but
> not spectacular the last time I checked, whereas llthreads is 1:1, not
> M:N (thus a very different model).
>

After reading the thesis, I recalled a library called 'ltask'
from this thread [1] not very long ago.
You might want to check it out.
It uses an M:N scheduler and channels, and feels similar to 'luaproc'.

[1] http://lua-users.org/lists/lua-l/2015-04/msg00441.html

--
the nerdy Peng / 书呆彭 / Sent from Thunderbird




Re: status and perceptions of luaproc

Roberto Ierusalimschy
In reply to this post by phlnc8
> I also think that luaproc approach is _really_ promising. I would be
> interested in having Roberto's position on this.  Does he consider it
> a one-shot student work?  Or, at the other end of the spectrum, are
> there minions in the Lua secret labs working on a future "parallel
> programming" addition to Lua 5.4?  :-)

Sorry for not answering that. (I was without email during the last
week.)  I think Skyrme's message answered that. (But I would not call
him a "minion" :-)

-- Roberto


Re: status and perceptions of luaproc

Rena
In reply to this post by phlnc8

On Jul 27, 2015 12:43 PM, "phlnc8" <[hidden email]> wrote:
>
> On Sat, Jul 25, 2015 at 5:40 AM, Mason Bogue <[hidden email]> wrote:
> > http://www.inf.puc-rio.br/~roberto/docs/ry08-05.pdf
> >
> > This library -- which seems to show that Lua can handle concurrent
> > programming quite well -- doesn't appear to be maintained. That's
> > unfortunate, because I don't know if the other solutions available
> > replicate the advantages of luaproc, viz. performance and simplicity.
>
> +1
>
> It looks like the thesis has been completed, the paper has been
> published, and that's it.
>
> I also think that luaproc approach is _really_ promising. I would be
> interested in having Roberto's position on this.  Does he consider it
> a one-shot student work?  Or, at the other end of the spectrum, are
> there minions in the Lua secret labs working on a future "parallel
> programming" addition to Lua 5.4?  :-)
>

I really hope a future Lua version will provide (or allow a module to provide in an unmodified Lua) true multithreading. With multi-core processors being so common now, not being able to make use of them is rather limiting. Even Python, the "batteries, charger and solar cells included" language, seems to have missed this boat - it provides "threading", but only one thread can actually run at a time (global interpreter state is locked while executing) and it doesn't sound like it'd be easy to fix without breaking other things.

I recall Lua provides some macros lua_lock and lua_unlock that an implementation can override (by default they do nothing) to allow thread-safe access. Maybe future versions could include e.g. LUA_USE_PTHREAD, LUA_USE_WINAPI_THREAD, etc, enabled by default on appropriate platforms, which would implement a basic mutex? Or would that add too much overhead to code that isn't using threads? Anyway it sounds like those macros would do much the same as Python's global lock, i.e. no actual parallel execution. So maybe not even worth doing?

The thread libraries I've used run a separate Lua state in each thread and provide some type of communication channel between them. This is a nice simple model, but maybe not the most efficient. In particular, while it's usually possible to pass most types of Lua objects between threads, tables can only be recursively copied, functions have to be dumped and recompiled (and can't have upvalues), and userdata can't safely be passed around unless it's designed to be referenced by multiple Lua states.

Even just a module that uses separate Lua states for each thread but can work around some of those limitations (which I suspect would require a modified Lua), e.g. copying tables/functions/userdata directly between states without the need for slow dumping/reloading or recursive copying, would be great. If it were possible for the same object to be referenced by multiple states, even shared tables and functions with upvalues should be possible - mutexes could be attached to them automatically when they're made available to another thread, so you avoid the problem that some coffee-inspired languages have, wasting time and memory with locking and unlocking every object even when only one thread uses them.

The holy grail of course would be true parallel execution, using multiple CPU cores, within one Lua state. It's just unfortunate that threading is one of those annoying things that's extremely difficult to do correctly, but extremely easy to do in a way that seems to work but contains a subtle, critical flaw that only manifests in production at 16:53 on a Friday, just once, and then can't be reproduced until everyone has dismissed it as cosmic ray induced error and given up looking for the bug.


Re: status and perceptions of luaproc

Javier Guerra Giraldez
On Fri, Aug 7, 2015 at 8:51 AM, Rena <[hidden email]> wrote:
> Even Python, the "batteries, charger and solar cells included" language,
> seems to have missed this boat - it provides "threading", but only one
> thread can actually run at a time (global interpreter state is locked while
> executing) and it doesn't sound like it'd be easy to fix without breaking
> other things.

that's why there's also the "multiprocessing" module, which uses
multiple processes, each with its own interpreter state and GIL
(sound familiar?).  It's been there for several years now, and it's
usually regarded as "the replacement" for threads.


> I recall Lua provides some macros lua_lock and lua_unlock that an
> implementation can override (by default they do nothing) to allow
> thread-safe access. Maybe future versions could include e.g.
> LUA_USE_PTHREAD, LUA_USE_WINAPI_THREAD, etc, enabled by default on
> appropriate platforms, which would implement a basic mutex? Or would that
> add too much overhead to code that isn't using threads? Anyway it sounds
> like those macros would do much the same as Python's global lock, i.e. no
> actual parallel execution. So maybe not even worth doing?

exactly.  On top of that, even if those locks protect the Lua VM's
integrity, by incorporating multithreading into the execution model you
inherit all the fun things, like having to care about other threads
meddling with your values while you're modifying them.  Say hello to
locks...

And it's terribly inefficient: on multicore hardware, most simple
locks are by nature system-wide, having to propagate to _every_ core
in the system.  From experience, even on a high-memory-bandwidth system
like the modern Xeon families, there's a low maximum number of inter-core
messages per second.  Think of it this way: a core can do several thousand
operations in the time it takes to propagate just one lock.



> The thread libraries I've used run a separate Lua state in each thread and
> provide some type of communication channel between them. This is a nice
> simple model, but maybe not the most efficient.

Au contraire, passing messages is far more efficient than sharing
variables, because you incur only one inter-process cache fault instead
of thrashing caches on almost every access to a shared variable.


> Even just a module that uses separate Lua states for each thread but can
> work around some of those limitations (which I suspect would require a
> modified Lua), e.g. copying tables/functions/userdata directly between
> states without the need for slow dumping/reloading or recursive copying,
> would be great.

I'm successfully faking a similar thing using FFI objects on LuaJIT; by
manually using mmap() to allocate memory it's easy to get them shared
between processes.  The consistency is "just" a matter of discipline.

But even then, I have to be _very_ careful to make each message count,
or performance is trashed.

On a related note, I'm using processes (spawned with fork()) and not
threads, because if I create many LuaJIT VMs in the same address space,
they are all limited to the same "low 2G" addresses, making it unusable
for anything beyond a hundred threads.  With enough RAM, a thousand
processes is quite doable.


> The holy grail of course would be true parallel execution, using multiple
> CPU cores, within one Lua state. It's just unfortunate that threading is one
> of those annoying things that's extremely difficult to do correctly, but
> extremely easy to do in a way that seems to work but contains a subtle,
> critical flaw that only manifests in production at 16:53 on a Friday, just
> once, and then can't be reproduced until everyone has dismissed it as cosmic
> ray induced error and given up looking for the bug.

With care, it's not so hard to get multithreading consistency; but
making it perform well? That's a real challenge.


--
Javier


Re: status and perceptions of luaproc

Rena

On Aug 7, 2015 10:28 AM, "Javier Guerra Giraldez" <[hidden email]> wrote:
>
> On Fri, Aug 7, 2015 at 8:51 AM, Rena <[hidden email]> wrote:
> > Even Python, the "batteries, charger and solar cells included" language,
> > seems to have missed this boat - it provides "threading", but only one
> > thread can actually run at a time (global interpreter state is locked while
> > executing) and it doesn't sound like it'd be easy to fix without breaking
> > other things.
>
> that's why there's also the "multiprocessing" module, that uses
> multiple processes,each with their own interpreter stack and GIL
> (sound familiar?).  It's been there for several years now, and it's
> usually regarding ad "the replacement" for threads.

Yes, I thought about that but decided not to mention it since it's running multiple processes, not multiple threads in a process. Still a useful tool, but not quite the same thing. No shared variables for example.

> and it's terribly inefficient: in multicore hardware, most simple
> locks are by nature system-wide, having to propagate to _every_ core
> in the system.  By experience, even in a high-memory-bandwidth system
> like modern Xeon familes, there's a low maximum number of inter-core
> messages per second.  Think like a core can do several thousand
> operations in the same time as propagating just one lock.

Ouch. Is it really necessary to allocate a hardware lock every time? Isn't it enough to use an atomic test-and-set instruction on a per-shared-object lock flag?

> > The thread libraries I've used run a separate Lua state in each thread and
> > provide some type of communication channel between them. This is a nice
> > simple model, but maybe not the most efficient.
>
> Au contraire, passing messages is far more efficient than sharing
> variables; because you only incur in one inter-process fault, instead
> of trashing caches on almost each access of shared variables.

Right, I meant the Lua implementations aren't very efficient, because nothing is actually shared between the states:

- to pass strings, the receiving state has to make its own copy (wasted memory) and hash it (wasted time), and the sender still has to garbage-collect its own copy (wasted time);
- to pass tables requires recursively copying each element (slow and error-prone: watch out for cycles, don't forget to copy the metatables too, and what do you do if one of them can't be copied due to upvalues?);
- to pass functions requires converting to a bytecode string and back (and some can't be copied);
- to pass userdata requires the underlying C code to be aware of the possibility (e.g. use reference counting outside of Lua to track how many states hold an object; many libraries destroy the native object in the __gc metamethod, which won't work well if multiple states reference it).
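For concreteness, the recursive table copy described above can be sketched in pure Lua (a hypothetical helper: it handles cycles via a `seen` map, but merely shares the metatable, which a real cross-state copy would also have to duplicate):

```lua
-- Sketch of the per-table work a cross-state "send" must do.
-- Cycles are handled by remembering already-copied tables; anything
-- that isn't a plain value (userdata, C functions) has no obvious copy.
local function deepcopy(t, seen)
  if type(t) ~= "table" then return t end
  seen = seen or {}
  if seen[t] then return seen[t] end  -- cycle: reuse the copy already made
  local copy = {}
  seen[t] = copy
  for k, v in pairs(t) do
    copy[deepcopy(k, seen)] = deepcopy(v, seen)
  end
  -- shares the metatable; a true cross-state copy would need to copy it too
  return setmetatable(copy, getmetatable(t))
end
```

Even this in-process version shows the cost: every key and value is visited once, allocated again, and (for strings crossing states) re-hashed on the other side.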

If Lua provided some methods to deal with these cases, perhaps by allowing objects to be owned by multiple states, giving the __gc metamethod a "remaining ref count" parameter, letting states share memory blocks (especially for immutable objects such as strings) or even just providing a lua_xmove that can copy between independent states (which could probably be done more efficiently by a built-in function that can poke at the internals than by an external module), I'm sure a lot of the overhead could be eliminated.

> --
> Javier
>


Re: status and perceptions of luaproc

Javier Guerra Giraldez
On Fri, Aug 7, 2015 at 12:18 PM, Rena <[hidden email]> wrote:

>> and it's terribly inefficient: in multicore hardware, most simple
>> locks are by nature system-wide, having to propagate to _every_ core
>> in the system.  By experience, even in a high-memory-bandwidth system
>> like modern Xeon familes, there's a low maximum number of inter-core
>> messages per second.  Think like a core can do several thousand
>> operations in the same time as propagating just one lock.
>
> Ouch. It's really necessary to allocate a hardware lock every time? Not
> enough to use an atomic test-and-set instruction on a per-shared-object lock
> flag?


AFAIK, there's no such thing as a "hardware lock" in common x86 chips.
Lately, I've been doing most of my inter-process synchronization via a
shared small integer.  The special case of a single-producer,
single-consumer fixed-size ring buffer can be safe even without
explicit locking.
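For concreteness, the index discipline of a single-producer/single-consumer ring buffer can be sketched in plain Lua (a single-threaded illustration only; the real shared-memory version needs FFI-allocated memory and careful write ordering, as discussed):

```lua
-- SPSC ring buffer index logic: the producer writes only `head`, the
-- consumer writes only `tail`, so neither side's counter is contended.
local RingBuffer = {}
RingBuffer.__index = RingBuffer

function RingBuffer.new(size)
  return setmetatable({ buf = {}, size = size, head = 0, tail = 0 }, RingBuffer)
end

-- producer side: returns false when full
function RingBuffer:push(v)
  if self.head - self.tail >= self.size then return false end
  self.buf[self.head % self.size] = v
  -- in real shared memory, this "publish" store must come after the slot write
  self.head = self.head + 1
  return true
end

-- consumer side: returns nil when empty
function RingBuffer:pop()
  if self.tail == self.head then return nil end
  local v = self.buf[self.tail % self.size]
  self.tail = self.tail + 1
  return v
end
```

The batching trick mentioned below amounts to the producer advancing `head` once per group of slots instead of once per item, so only one cache line crosses cores per batch.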

Still, when two different cores hold the same address in their
respective caches, as soon as one writes there, the other one has to
be notified to invalidate its copy.  Note that the other core hasn't
even read the flag yet, just invalidated a single cache line.

Just last night, I succeeded in passing 12x10^6 packets per second
(12 Mpps) from one process to another, and that's after almost a week
stuck at a very variable 3-6 Mpps.  To get there, I had to make sure to
gather as many packets as possible before writing a single integer to
shared memory.  The limit I'm hitting is just around 55,000
inter-process cache-line invalidations per second.

That same code, on the same machine, moves around 58 Mpps when kept on a
single core!  (But doing heavy processing on each packet makes it
worthwhile to recruit more cores, even if it's so expensive to
communicate with them.)

--
Javier


Re: status and perceptions of luaproc

Coda Highland
On Fri, Aug 7, 2015 at 10:33 AM, Javier Guerra Giraldez
<[hidden email]> wrote:

> On Fri, Aug 7, 2015 at 12:18 PM, Rena <[hidden email]> wrote:
>>> and it's terribly inefficient: in multicore hardware, most simple
>>> locks are by nature system-wide, having to propagate to _every_ core
>>> in the system.  By experience, even in a high-memory-bandwidth system
>>> like modern Xeon familes, there's a low maximum number of inter-core
>>> messages per second.  Think like a core can do several thousand
>>> operations in the same time as propagating just one lock.
>>
>> Ouch. It's really necessary to allocate a hardware lock every time? Not
>> enough to use an atomic test-and-set instruction on a per-shared-object lock
>> flag?
>
>
> AFAIK, there's no such thing as a "hardware lock" in common x86 chips.
> lately, i've been doing most of my inter-process synchronization via a
> shared small integer.  the special case of single-producer,
> single-consumer fixed-size ring buffer can be safe even without
> explicit locking.

Oh, there quite is:

http://x86.renejeschke.de/html/file_module_x86_id_159.html

Most of the time this is the mechanism used for implementing mutexes.

/s/ Adam


Re: status and perceptions of luaproc

Javier Guerra Giraldez
On Fri, Aug 7, 2015 at 12:48 PM, Coda Highland <[hidden email]> wrote:
> Oh, there quite is:
>
> http://x86.renejeschke.de/html/file_module_x86_id_159.html

Ah, OK.  Yes, this is avoidable in some cases, like CAS and simple
atomics.  Still, the time penalty is the invalidation request that
must be propagated to the interested cores (fortunately, this is only
sent to those other cores that actually hold the relevant address in
cache).

The numbers I've shared are totally "lock- and CAS-free"  :-)

--
Javier


Re: status and perceptions of luaproc

Coda Highland
On Fri, Aug 7, 2015 at 10:56 AM, Javier Guerra Giraldez
<[hidden email]> wrote:

> On Fri, Aug 7, 2015 at 12:48 PM, Coda Highland <[hidden email]> wrote:
>> Oh, there quite is:
>>
>> http://x86.renejeschke.de/html/file_module_x86_id_159.html
>
> ah, ok.  yes, this is avoidable in some cases, like CAS and simple
> atomics.  still, the time penalty is the invalidation request that
> must be propagated to interested cores. (fortunately, this is only
> sent to those other cores that actually hold the relevant address in
> cache)
>
> the numbers i've shared are totally "lock- and cas-free"  :-)
>

Unfortunately, the cores that hold the affected region in cache are
the ones running your own other threads -- in other words, the very
things you're most concerned about the performance of.

/s/ Adam


Re: status and perceptions of luaproc

Coda Highland
On Fri, Aug 7, 2015 at 10:59 AM, Coda Highland <[hidden email]> wrote:

> On Fri, Aug 7, 2015 at 10:56 AM, Javier Guerra Giraldez
> <[hidden email]> wrote:
>> On Fri, Aug 7, 2015 at 12:48 PM, Coda Highland <[hidden email]> wrote:
>>> Oh, there quite is:
>>>
>>> http://x86.renejeschke.de/html/file_module_x86_id_159.html
>>
>> ah, ok.  yes, this is avoidable in some cases, like CAS and simple
>> atomics.  still, the time penalty is the invalidation request that
>> must be propagated to interested cores. (fortunately, this is only
>> sent to those other cores that actually hold the relevant address in
>> cache)
>>
>> the numbers i've shared are totally "lock- and cas-free"  :-)
>>
>
> Unfortunately, the cores that hold the affected region in cache are
> your own other threads -- in other words, the very things you're most
> concerned about the performance of.
>
> /s/ Adam

That is to say, that's why you should minimize the number of locks you
have to use, which is exactly what batching stuff up and signaling
does, which is exactly what you're doing. This statement was meant in
support of your results, not in opposition to them. :P

/s/ Adam


Re: status and perceptions of luaproc

Javier Guerra Giraldez
In reply to this post by Coda Highland
On Fri, Aug 7, 2015 at 12:59 PM, Coda Highland <[hidden email]> wrote:
> Unfortunately, the cores that hold the affected region in cache are
> your own other threads -- in other words, the very things you're most
> concerned about the performance of.


Exactly.  That's why it's so important to keep shared memory to a
minimum, and only update it when you've accumulated a significant
chunk of work.  IOW: please, don't share the Lua VM!

--
Javier


Re: status and perceptions of luaproc

William Ahern
In reply to this post by Rena
On Fri, Aug 07, 2015 at 09:51:30AM -0400, Rena wrote:
<snip>
> I really hope a future Lua version will provide (or allow a module to
> provide in an unmodified Lua) true multithreading. With multi-core
> processors being so common now, not being able to make use of them is
> rather limiting. Even Python, the "batteries, charger and solar cells
> included" language, seems to have missed this boat - it provides
> "threading", but only one thread can actually run at a time (global
> interpreter state is locked while executing) and it doesn't sound like it'd
> be easy to fix without breaking other things.
<snip>
> The holy grail of course would be true parallel execution, using multiple
> CPU cores, within one Lua state. It's just unfortunate that threading is
> one of those annoying things that's extremely difficult to do correctly,
> but extremely easy to do in a way that seems to work but contains a subtle,
> critical flaw that only manifests in production at 16:53 on a Friday, just
> once, and then can't be reproduced until everyone has dismissed it as
> cosmic ray induced error and given up looking for the bug.

  If you were to design a new language today, he said, you would make it
  without mutable (changeable) objects, or with limited mutability.

  -- Guido on Python, discussing the GIL. See https://lwn.net/Articles/651967/

Shared, mutable memory and scalable multithreading don't mix. If you're
using a language (like Lua or Python) fundamentally based on mutable data
structures, you're going to do best with message passing. Trying to tweak
the language to be multi-thread safe will destroy performance all around.

Some groups (e.g. PyPy) have experimented with transactional memory, but the
amount of complexity needed (in amount of code, in number of
pipeline-stalling branches and atomic operations) to transparently give
transactional semantics to operations on complex primitive data structures
saps performance. It's not going to be much better, if at all, than message
passing. And it will affect performance even when you don't need those
semantics, which is most of the time. For example, it will dramatically
complicate the garbage collector, especially if you try to preserve
performance. Lest we forget, the PUC team removed the generational collector,
which theoretically should have given significantly better performance, because the
code complexity relative to the existing, simpler collector overwhelmed the
algorithmic gains.

But message passing is better for more reasons than just performance.

1) When you're dealing with languages with mutable data, bugs are easier to
introduce and more difficult to discover. This is compounded many fold when
you add in shared-memory parallelism. Message passing moves you toward the
ideal of immutable data, with concomitantly fewer bugs.

2) With message passing designs you can more easily move to multi-server
frameworks.

It's also a good idea to minimize mutability even when you're not
multithreading. RAII patterns reduce the complexity of code by reducing the
number of places and ways objects can mutate, thus reducing the number of
possible program states and number of failure paths.

Basically, you should strive for immutability all around. There's nothing
inelegant about message passing except in the most simplistic of scenarios.


Re: status and perceptions of luaproc

Rena

On Aug 7, 2015 3:42 PM, "William Ahern" <[hidden email]> wrote:
>
>   If you were to design a new language today, he said, you would make it
>   without mutable (changeable) objects, or with limited mutability.
>
>   -- Guido on Python, discussing the GIL. See https://lwn.net/Articles/651967/
>

Well, that's pretty convincing. 🐺 Though I still feel like Lua could benefit from a "copy string from other state" function that can avoid the duplication and rehashing. (Does POSIX provide a thread-safe reference-counted memory block API?) Same for tables and functions, assuming it could be done any more efficiently on the "inside" compared to using the public APIs. Faster/less wasteful passing of objects (especially strings) between independent states within a process would be quite helpful for message passing.


Serializing Lua functions (was Re: status and perceptions of luaproc)

Sean Conner
In reply to this post by Rena
It was thus said that the Great Rena once stated:
>
> The thread libraries I've used run a separate Lua state in each thread and
> provide some type of communication channel between them. This is a nice
> simple model, but maybe not the most efficient. In particular, while it's
> usually possible to pass most types of Lua objects between threads, tables
> can only be recursively copied, functions have to be dumped and recompiled
> (and can't have upvalues), and userdata can't safely be passed around
> unless it's designed to be referenced by multiple Lua states.

  Actually, you can serialize Lua functions with upvalues.  I've done it
(just not shown the code).  In fact, I was able to serialize quite a bit,
including tables with circular references.  The only things I couldn't
handle were coroutines, user data and Lua functions written in C [1].

  As it stands, the function serialization is only good across the same
architecture (everything else I could serialize was CPU independent) and I
started working on fixing that by diving into the "dump format" used by Lua.

  -spc (I kind of lost interest when I realized I didn't have a need for
        this)

[1] At least not directly.  I could "serialize" the function io.write(),
        but I did so by reference, assuming the receiver side could resolve
        such a reference.  I could even handle known userdata types like
        io.stdout by again, using references. [2]

[2] Strings basically.  I just passed the name along.
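
(A minimal sketch of the upvalue-carrying approach described above, using string.dump plus the debug library; the helper names are hypothetical, it only works across identical architectures and Lua versions, and it assumes the upvalue values are themselves serializable:

```lua
-- Serialize a pure Lua function together with its upvalue values.
local function serialize_function(f)
  local dumped = string.dump(f)  -- bytecode; upvalue *values* are not included
  local upvalues = {}
  local i = 1
  while true do
    local name, value = debug.getupvalue(f, i)
    if not name then break end
    upvalues[i] = value  -- assumes each value is itself serializable
    i = i + 1
  end
  return dumped, upvalues
end

local function deserialize_function(dumped, upvalues)
  local f = load(dumped)  -- fresh closure with empty upvalues
  for i, value in ipairs(upvalues) do
    debug.setupvalue(f, i, value)
  end
  return f
end
```

The coroutine, userdata, and C-function cases mentioned above fall outside this sketch, as do upvalues shared between closures.)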



Re: Serializing Lua functions (was Re: status and perceptions of luaproc)

Daurnimator
On 8 August 2015 at 08:38, Sean Conner <[hidden email]> wrote:

> It was thus said that the Great Rena once stated:
>>
>> The thread libraries I've used run a separate Lua state in each thread and
>> provide some type of communication channel between them. This is a nice
>> simple model, but maybe not the most efficient. In particular, while it's
>> usually possible to pass most types of Lua objects between threads, tables
>> can only be recursively copied, functions have to be dumped and recompiled
>> (and can't have upvalues), and userdata can't safely be passed around
>> unless it's designed to be referenced by multiple Lua states.
>
>   Actually, you can serialize Lua functions with upvalues.  I've done it
> (just not shown the code).  In fact, I was able to serialize quite a bit,
> including tables with circular references.  The only things I couldn't
> handle were coroutines, user data and Lua functions written in C [1].
>
>   As it stands, the function serialization is only good across the same
> architecture (everything else I could serialize was CPU independent) and I
> started working on fixing that by diving into the "dump format" used by Lua.
>
>   -spc (I kind of lost interest when I realized I didn't have a need for
>         this)
>
> [1]     At least not directly.  I could "serialize" the function io.write(),
>         but I did so by reference, assuming the receiver side could resolve
>         such a reference.  I could even handle known userdata types like
>         io.stdout by again, using references. [2]
>
> [2]     Strings basically.  I just passed the name along.
>
>

The Lua 'full' serialisation project was 'Pluto'; it has been succeeded
(for 5.2+) by Eris.
https://github.com/fnuecke/eris
They have the concept of "special persistence" (see
https://github.com/fnuecke/eris#special-persistence ), which allows you
to add a '__persist' field to a metatable.

I think this is a good approach, and I have been meaning to add such a
field to my Lua libraries.

The other interesting thing they bring up is duplication (and
serialisation) of coroutines. This unfortunately requires modifications
to Lua itself. So if there were a single change from this discussion to
be merged into Lua, I'd love to see an API for doing this without such
modifications.
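
(Roughly, special persistence as described in the Eris README works like this: the __persist metafield returns a closure that is serialized in the object's stead and is called during unpersist to rebuild it. A hedged sketch, with hypothetical names, for an object wrapping an unserializable file handle:

```lua
local handle_mt
handle_mt = {
  __persist = function(obj)
    local path = obj.path
    -- this closure is persisted instead of the object; when the data
    -- is unpersisted, it is called to reconstruct the object
    return function()
      local new = setmetatable({ path = path }, handle_mt)
      new.file = io.open(path, "r")
      return new
    end
  end,
}
```

Note that only plain data captured by the closure (here, the path string) actually crosses the serialization boundary.)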
