LuaThread sources?

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

LuaThread sources?

Sven Olsen
I've started to wonder what it would look like to mod the Lua VM to allow multiple threads to run in parallel while referencing the same global Lua state.  I'm imagining an approach that basically just boils down to slipping some mutex locks into the VM to avoid possible memory corruption when two threads try to write to the same table at the same time, which feels like it might be more or less sufficient to allow a preemptive shared-state multithreading mod that would be relatively safe.

More generally, I'm trying to think through what kinds of things might go wrong if one were to just create a collection of coroutines using new_thread(), and then run them in parallel using C multi threading.  Simultaneous table writes, string creation, or garbage collection invocation seem like the most obvious things that could create real trouble, but there's probably some other cases I'm overlooking.

As best as I can tell, there's been one prototype of this kind of Lua mod, Diego Nehab's LuaThread.  I can find the documentation pages on the wayback machine, but, not Diego's source code.

Does anyone have a copy of Diego's LuaThread sources that they could share?

Alternatively, are there any other preemptive, shared state Lua mods out there that people could point me at?

Thanks,

-Sven



Reply | Threaded
Open this post in threaded view
|

Re: LuaThread sources?

Sean Conner
It was thus said that the Great Sven Olsen once stated:
> I've started to wonder what it would look like to mod the Lua VM to allow
> multiple threads to run in parallel while referencing the same global Lua
> state.  I'm imagining an approach that basically just boils down to
> slipping some mutex locks into the VM to avoid possible memory corruption
> when two threads try to write to the same table at the same time, which
> feels like it might be more or less sufficient to allow a preemptive
> shared-state multithreading mod that would be relatively safe.

  In theory, all you need to do is provide definitions for lua_lock() and
lua_unlock() (which are currently #defined to be nothing) and recompile Lua.

  -spc

Reply | Threaded
Open this post in threaded view
|

Re: LuaThread sources?

Sven Olsen
  In theory, all you need to do is provide definitions for lua_lock() and
lua_unlock() (which are currently #defined to be nothing) and recompile Lua.

I have a strong hunch that the use-case behind the lua_lock/lua_unlock macros is fairly different than the one I'm imagining. Specifically, it looks like they're intended for the case where several different C threads may all want to share access to the same lua state, so we might want to compile lua in a way where C calls against a given lua state are thread safe.

Thus, you'd probably implement lua_lock/lua_unlock by sticking a mutex inside the global_State struct, and locking L->l_G->mutex.  (You emphatically wouldn't implement it by having a mutex in lua_State though, as that could lead to memory corruption when different states sharing the same global data ran in parallel.)

The use case I'm imagining is quite different though -- I'd like several "child" states, all sharing access the same global state, to be able to run simultaneously.  And, again, if I'm reading the code right, that's just not what lua_lock is intended for.  I think it *is* what Diego's LuaThread mod was intended to do though, which is why I'm interested in the details of his locking strategy.

-Sven
Reply | Threaded
Open this post in threaded view
|

Re: LuaThread sources?

Sven Olsen
In reply to this post by Sven Olsen
I'm trying to think through what kinds of things might go wrong if one were to just create a collection of coroutines using new_thread(), and then run them in parallel using C multi threading.  

Conversely, I'm wondering if there's a subset of Lua VM ops that are already (nearly) thread safe, and if there might be relatively easy ways of exploiting this.

For example, I'll often run through the entries of some table, looking for the key/value pair that gets the highest score from some kind of valuation function.  I'm becoming suspicious that as long as the GC is off, and my valuation functions are sufficiently well behaved, I could perhaps safely parallelize these searches.

One hole I see in this concept is luaC_newobj, which looks like it would need to be serialized to prevent temporary tables from dropping off the allgc list.  That's a pretty trivial diff though.  luaS_newlstr would potentially also need a lock.  And luaD_throw would need some extra logic of some sort.  (That's relative to the 5.2 sources.)

The biggest problem seems to be table writes, as simultaneously writing new entries into the same table from different threads is obviously a recipe for disaster.  (In fact, even reading while writing is probably a disaster.)  There are various ways I might be able to ensure that table sets were in fact thread-safe-ish, though that would probably mean adding a per-Table mutex, which would lead to a bit of memory bloat; in addition to a lot of locking/unlocking overhead for table read/writes that were, most of the time, not in danger of doing anything troublesome.

But, if I'm imagining a parallel evaluation mode as something that would only run some of the time, I could disable all of this extra locking logic when running inside a single thread.  Or I could even just write up some profiling code that would detect the case where multiple threads had written to the same table during a parallel evaluation; use that code to test whether or not my evaluation functions were in fact "well behaved", and then run with the safeties off for maximum speed...

luaT_gettm() looks like it might still cause some kinds of trouble (as it's a place where reading from a table can actually write data to that table value).  But after looking a little closer I'm hopeful that caching logic it's used to implement may be harmless, provided that the "no parallel writes to shared tables" rule holds.  (Worst case, we'll just end up writing the same value into the same flags variable multiple times.)

-Sven
Reply | Threaded
Open this post in threaded view
|

Re: LuaThread sources?

Diego Nehab-4
For what it's worth, I can provide the source-code for LuaThread. Since different threads running Lua code cannot execute simultaneously, there never was enough interest in the library. So I stopped supporting it.

Kind regards,
Diego

On Wed, Jan 2, 2019 at 10:00 PM Sven Olsen <[hidden email]> wrote:
I'm trying to think through what kinds of things might go wrong if one were to just create a collection of coroutines using new_thread(), and then run them in parallel using C multi threading.  

Conversely, I'm wondering if there's a subset of Lua VM ops that are already (nearly) thread safe, and if there might be relatively easy ways of exploiting this.

For example, I'll often run through the entries of some table, looking for the key/value pair that gets the highest score from some kind of valuation function.  I'm becoming suspicious that as long as the GC is off, and my valuation functions are sufficiently well behaved, I could perhaps safely parallelize these searches.

One hole I see in this concept is luaC_newobj, which looks like it would need to be serialized to prevent temporary tables from dropping off the allgc list.  That's a pretty trivial diff though.  luaS_newlstr would potentially also need a lock.  And luaD_throw would need some extra logic of some sort.  (That's relative to the 5.2 sources.)

The biggest problem seems to be table writes, as simultaneously writing new entries into the same table from different threads is obviously a recipe for disaster.  (In fact, even reading while writing is probably a disaster.)  There are various ways I might be able to ensure that table sets were in fact thread-safe-ish, though that would probably mean adding a per-Table mutex, which would lead to a bit of memory bloat; in addition to a lot of locking/unlocking overhead for table read/writes that were, most of the time, not in danger of doing anything troublesome.

But, if I'm imagining a parallel evaluation mode as something that would only run some of the time, I could disable all of this extra locking logic when running inside a single thread.  Or I could even just write up some profiling code that would detect the case where multiple threads had written to the same table during a parallel evaluation; use that code to test whether or not my evaluation functions were in fact "well behaved", and then run with the safeties off for maximum speed...

luaT_gettm() looks like it might still cause some kinds of trouble (as it's a place where reading from a table can actually write data to that table value).  But after looking a little closer I'm hopeful that caching logic it's used to implement may be harmless, provided that the "no parallel writes to shared tables" rule holds.  (Worst case, we'll just end up writing the same value into the same flags variable multiple times.)

-Sven
Reply | Threaded
Open this post in threaded view
|

Re: LuaThread sources?

Sven Olsen
For what it's worth, I can provide the source-code for LuaThread. Since different threads running Lua code cannot execute simultaneously, there never was enough interest in the library. So I stopped supporting it.

Ah, I didn't realize LuaThread had that restriction.  Then I'm guessing LuaThread was implemented more or less just by defining the lua_lock/lua_unlock macros (as Sean suggested)?  

I have done some crude experiments, throwing caution to the wind and simultaneously running "lua_resume()" on a collection of different child states.  For simple test functions, it does appear to work fine; and as I said earlier, I'm starting to suspect that there might be some relatively simple changes you could make in the guts of the interpreter that would make the approach more or less safe even while running more complex coroutines in parallel.

But, maybe my optimism about that is misplaced? Do you know if there are other areas of the interpreter that would need locking logic to make simultaneous execution safe, apart from table accesses, luaD_throw, luaC_newobj, and luaS_newlstr?  Did you give up on true simultaneous shared-state execution because you saw some nasty collection of problems that I'm overlooking?  Or was it just that table-access, by itself, looked troublesome enough to not dig into?

Thanks,

-Sven
Reply | Threaded
Open this post in threaded view
|

Re: LuaThread sources?

nobody
On 03/01/2019 22.45, Sven Olsen wrote:
> Do you know if there are other areas of the interpreter that would
> need locking logic to make simultaneous execution safe, apart from
> table accesses, luaD_throw, luaC_newobj, and luaS_newlstr?

Don't forget stack reallocation/growth. luaY_parser / load / require
could also be fun…

(That's some fragments I recall from some experiments a while back…)

--nobody

Reply | Threaded
Open this post in threaded view
|

Re: LuaThread sources?

Sven Olsen
Don't forget stack reallocation/growth.

Ah, I see, luaM_realloc_() writes to G(L)->GCdebt, so it would need a lock or something as well.  I'd missed that, thanks.  (Maybe just converting GCdebt to an atomic ptrdiff_t would about solve this one -- that's probably cleaner than messing around with the implied nested locks you'd get from the interactions with luaC_newobj.)

luaY_parser / load / require
could also be fun…

Yeah, they shouldn't come up in my hypothetical use cases, but, they do make me nervous. Apart from the implied luaC_newobj / luaM_realloc / luaS_newlstr calls, I haven't actually found any places where I can see that they'd need a lock, but, I haven't really looked that hard either.  Might be wise to just disallow them in my parallel evaluation mode, and save myself the worry.

I'm feeling increasingly confident that the big issue for any parallel coroutine execution scheme is table writes.  And I'm starting to think that maybe, in practice, the most sensible way of handling the issue might be to just say that while I'm doing parallel executions, any writes into tables are just errors.  That's pretty darn restrictive, but, I think I might have some use cases where the ability to run even such heavily restricted lua code in parallel would be useful, and my initial tests are suggesting that I might get as much as a 4X speedup for my trouble, which is big enough to be tempting.  (I could even write a thread-safe table-like userdata class to use in those cases where I really do need to write into something table-ish during a parallel execution.  But adding locking logic around all table accesses feels like it would add too much bloat to an otherwise pretty lean interpreter.)

-Sven

Reply | Threaded
Open this post in threaded view
|

Re: LuaThread sources?

Diego Nehab-4
In reply to this post by Sven Olsen
Indeed. It was 2001 and the "lucky" among us had a Pentium 4... Threads were mostly used to hide latency accessing the network, or the hard drive (or to keep the UI interactive during computations). Giving threads exclusive access to the Lua state while they were running Lua code was not absurd back then. I wanted to avoid modifying the Lua source code, so the only alternative was to use lua_lock and lua_unlock.

If I were working on LuaThread today, I would try to make it more functional (map/scan/reduce) or like OpenMP (with pragmas), and modify even the compiler. :)

On Thu, Jan 3, 2019 at 7:47 PM Sven Olsen <[hidden email]> wrote:
For what it's worth, I can provide the source-code for LuaThread. Since different threads running Lua code cannot execute simultaneously, there never was enough interest in the library. So I stopped supporting it.

Ah, I didn't realize LuaThread had that restriction.  Then I'm guessing LuaThread was implemented more or less just by defining the lua_lock/lua_unlock macros (as Sean suggested)?  

I have done some crude experiments, throwing caution to the wind and simultaneously running "lua_resume()" on a collection of different child states.  For simple test functions, it does appear to work fine; and as I said earlier, I'm starting to suspect that there might be some relatively simple changes you could make in the guts of the interpreter that would make the approach more or less safe even while running more complex coroutines in parallel.

But, maybe my optimism about that is misplaced? Do you know if there are other areas of the interpreter that would need locking logic to make simultaneous execution safe, apart from table accesses, luaD_throw, luaC_newobj, and luaS_newlstr?  Did you give up on true simultaneous shared-state execution because you saw some nasty collection of problems that I'm overlooking?  Or was it just that table-access, by itself, looked troublesome enough to not dig into?

Thanks,

-Sven
Reply | Threaded
Open this post in threaded view
|

Re: LuaThread sources?

Sven Olsen
In reply to this post by Sven Olsen
I have done some crude experiments, throwing caution to the wind and simultaneously running "lua_resume()" on a collection of different child states.  For simple test functions, it does appear to work fine; and as I said earlier, I'm starting to suspect that there might be some relatively simple changes you could make in the guts of the interpreter that would make the approach more or less safe even while running more complex coroutines in parallel.

Just for the fun of it, I did take a shot at hacking together a thread-safe lua variant the other day.  My approach was pretty simple, I used Slim Read/Writer locks to put optional shared-mode read locks or exclusive mode write locks around all table accesses.  Each table had it's own mutex, which increased GCObject sizes, but, I only triggered the locking logic if I knew I was running in a multithreaded mode.  Then I stuck another mutex in the global state, and used that to serialize the various newobj calls (but, again, only triggering the locking logic if I knew I was executing in a multithreaded mode).

It wasn't that hard to code up; but, it didn't really work very well.  The first problem is that my own application involves a /lot/ of tables, so even being very conservative, and using a light weight locking mechanism that was if-ed out most of the time was still enough code/memory bloat to increase the runtimes of serial code executions by nearly 50%.  And when I turned those locks on, and ran my threads in parallel, it appears that the cost of the locking logic meant I lost more than I gained in runtime performance -- I could keep all 8 of my virtual cores busy, but, my multithreaded executions times where, if anything, slightly slower than just turning the locking logic off and running everything in serial.

My sense is that a multithreaded lua hack might start to make sense on machines with 16+ cores, or perhaps in an application where table reads and writes form a smaller chunk of the total operations.  But, in my own use case, it looks like a dead end.

In future, I'll be returning to my old mantra: "If it's not fast enough in Lua, rewrite it in C."

-Sven