LuaThreads with LuaJIT (and maybe lua_tcc)

Jerome Vuarand
Hi list,
 
I'm working on a project that absolutely requires preemptive multithreading (multiple hardware threads are available) and good runtime performance. I'm trying to write it almost exclusively in Lua to reduce development time and complexity (I have a 90% Lua / 10% C/C++ ratio). I'm using Lua 5.1. I'm trying to use standard, unpatched Lua to ease evolution and maintenance. I don't need to support multiple Lua versions, but I want to be able to incorporate new Lua releases easily. I use custom Makefiles and used to build a lua.dll and lualib.dll pair instead of a single lua50.dll. With Lua 5.1 I had to patch some files to allow separation of core and lib.
 
So I'm using LuaThreads (I did a custom port to 5.1) and I tried to use LuaJIT with it. All my personal code is in separate modules, so I modified LuaThreads (for integration with 5.1, for separation of core and lib, and for better granularity) and I'm in the process of doing the same with LuaJIT, to keep things clean (however, LuaJIT seems to be developed with a statically linked core and lib, since there are heavy dependencies between ljitlib and internal functions of the Lua core). The problem is that both use LUAI_EXTRASPACE in each lua_State and need their callbacks to be called properly.
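Some background on the conflict: Lua 5.1 reserves LUAI_EXTRASPACE bytes just before each lua_State and invokes the luai_userstate* macros from luaconf.h (luai_userstateopen, luai_userstateclose, luai_userstatethread, luai_userstatefree, ...) at state and thread creation and teardown. Each macro can have only one definition per build, so two mods that both redefine them collide. Below is a minimal sketch of the kind of shared mod asked about in the first question, assuming every extension registers a fixed-size slot and optional hooks before the first state is created. All ext_* names are hypothetical, and the extra space is modeled as a plain byte array so the sketch compiles without the Lua headers.

```c
#include <assert.h>
#include <stddef.h>

#define EXT_MAX   8   /* max number of registered extensions */
#define EXT_SPACE 64  /* stand-in for LUAI_EXTRASPACE (bytes per state) */

/* What each extension (threading mod, JIT mod, ...) declares. */
typedef struct ext_hooks {
  const char *name;
  size_t size;                    /* bytes it needs in the extra space */
  void (*on_open)(void *slot);    /* would run from luai_userstateopen */
  void (*on_close)(void *slot);   /* would run from luai_userstateclose */
} ext_hooks;

static ext_hooks ext_table[EXT_MAX];
static size_t ext_offset[EXT_MAX];  /* slot offset inside the extra space */
static size_t ext_count = 0, ext_used = 0;

/* Claims a slot at a pointer-aligned offset; returns its index, or -1 if full. */
static int ext_register(ext_hooks h) {
  size_t align = sizeof(void *);
  size_t off = (ext_used + align - 1) / align * align;
  if (ext_count == EXT_MAX || off + h.size > EXT_SPACE) return -1;
  ext_table[ext_count] = h;
  ext_offset[ext_count] = off;
  ext_used = off + h.size;
  return (int)ext_count++;
}

/* Address of extension i's private data inside one state's extra space. */
static void *ext_slot(unsigned char *extra, int i) {
  assert(i >= 0 && (size_t)i < ext_count);
  return extra + ext_offset[i];
}

/* Body a shared luai_userstateopen could expand to. */
static void ext_on_open(unsigned char *extra) {
  for (size_t i = 0; i < ext_count; i++)
    if (ext_table[i].on_open) ext_table[i].on_open(extra + ext_offset[i]);
}

/* Body a shared luai_userstateclose could expand to (reverse order). */
static void ext_on_close(unsigned char *extra) {
  for (size_t i = ext_count; i-- > 0;)
    if (ext_table[i].on_close) ext_table[i].on_close(extra + ext_offset[i]);
}
```

With such a layer, luai_userstateopen(L) would expand to something like ext_on_open((unsigned char *)(L) - LUAI_EXTRASPACE), since the extra data sits just before the lua_State pointer. Whether LuaJIT's and LuaThreads' hooks can actually be routed this way is untested.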
 
I'm also planning to use lua_tcc at some point to hand-optimize some parts of my engine. The idea is to turn the static data of my problem into ultra-specific hardcoded C routines that would be almost optimal, with no cost in data complexity.
 
So here are my important questions:
- is there a framework somewhere to ease the coexistence of multiple heavy Lua mods (those that use LUAI_EXTRASPACE and possibly patch the Lua source code), some standard mod with which Lua plugins could register their callbacks?
- has anybody managed to make LuaThreads and LuaJIT coexist in the same build (even experience with 5.0.x would help me)?
- are there any expected fundamental issues or incompatibilities that would theoretically prevent me from using LuaJIT and LuaThreads together?
- has anybody managed to make LuaJIT and lua_tcc coexist? (I haven't tried yet since I have problems with tcc relocation, but I guess two relocating runtime compilers in the same dynamically linked process may not like each other)
 
And some less important questions:
- has anyone managed to separate LuaJIT core and lib code?
- has anyone managed to run lua_tcc under win32 with linking and relocation to liblua.a while lua.dll is already loaded in the process?
 
Doub.

Re: LuaThreads with LuaJIT (and maybe lua_tcc)

Javier Guerra Giraldez
On Tuesday 11 April 2006 10:47 am, Jerome Vuarand wrote:
> I'm also planning to use lua_tcc at some point to hand-optimize some
> parts of my engine. The idea is to turn the static data of my
> problem into ultra-specific hardcoded C routines that would be almost
> optimal, with no cost in data complexity.

i don't want to discourage you, but don't rely too much on lua_tcc.  
unfortunately, it seems TCC makes some unclean use of global variables, so
it's usable in a very static environment (usual C program
initialisation and such), but might be unstable in a dynamic language
environment.

in particular, each time you use it to compile some C, lua_tcc allocates a new
TCC state. this state has to be there as long as you want to use the compiled
code, so the returned function(s) are closures with a 'hidden' reference to a
userdata that holds the TCC state.  that way, when the last function is
disposed, the lua_tcc userdata can be collected and the TCC state
deallocated; but if you create two TCC states, the first deallocation
succeeds, and the second one fails (it tries to deallocate some globals
_again_).
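The lifetime scheme above is reference counting, and the failure comes from teardown code that frees process-wide globals once per state. A minimal C sketch of one possible workaround, assuming you can change the teardown: count live states and free the shared globals only with the last one. compiler_state and shared_globals are hypothetical stand-ins for a TCC state and TCC's globals; in lua_tcc the "retain" is the hidden reference each returned function holds, and the "release" happens when the userdata is collected.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for process-wide globals that every state shares. */
static char *shared_globals = NULL;
static int live_states = 0;

typedef struct compiler_state {
  int refs;   /* compiled functions still referencing this state */
} compiler_state;

static compiler_state *state_new(void) {
  compiler_state *s = malloc(sizeof *s);
  if (!s) return NULL;
  if (live_states == 0) {             /* first state sets up the globals */
    shared_globals = malloc(256);
    if (!shared_globals) { free(s); return NULL; }
  }
  live_states++;
  s->refs = 1;                        /* held by the first compiled function */
  return s;
}

/* Another function compiled from the same state keeps it alive. */
static void state_retain(compiler_state *s) { s->refs++; }

/* Called when a function is collected; returns 1 if the state was freed. */
static int state_release(compiler_state *s) {
  assert(s->refs > 0);
  if (--s->refs > 0) return 0;
  free(s);
  /* Freeing shared_globals here unconditionally would reproduce the bug:
     the second state's teardown would free them again. */
  if (--live_states == 0) {
    free(shared_globals);
    shared_globals = NULL;
  }
  return 1;
}
```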

for this reason, lua_tcc is nice for experimenting and trying new things; but when you're
satisfied with your C code, you should use a real compiler.


--
Javier


Re: LuaThreads with LuaJIT (and maybe lua_tcc)

Mike Pall
In reply to Jerome Vuarand
Hi,

Jerome Vuarand wrote:
> I'm working on a project that absolutely requires preemptive
> multithreading (multiple hardware threads are available) and
> some good runtime performance.

Then LuaThreads is not what you want. Apart from performance
reasons (see below) there is a serious limitation: there is only
a single lock around the Lua core. This means that only one
thread at a time can execute Lua code. A thread can only run in
parallel if it is executing C code which does _not_ call back
into the Lua core (i.e. blocking in an I/O operation or some CPU
bound C code).

I suggest you look into the mailing list archive for other
approaches. You can use multiple Lua universes and connect them
with message queues. Or you can use helper threads to do some
isolated CPU bound operations in C code (only accessing the Lua
core at the start and end).
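A minimal sketch of the message-queue approach, assuming each universe is a separate lua_State owned by exactly one OS thread, so the only shared object (and the only lock) is the queue itself. It uses POSIX threads and passes fixed-size C strings; converting Lua values into messages and back is left out. The msgq_* names are hypothetical (and unrelated to the POSIX mq_* message-queue API).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MSGQ_CAP 16   /* messages the queue can hold */
#define MSG_LEN  64   /* bytes per message, including the terminator */

typedef struct msgq {
  char buf[MSGQ_CAP][MSG_LEN];
  int head, count;                    /* ring buffer indices */
  pthread_mutex_t lock;               /* protects buf, head, count only */
  pthread_cond_t not_empty, not_full;
} msgq;

static void msgq_init(msgq *q) {
  q->head = q->count = 0;
  pthread_mutex_init(&q->lock, NULL);
  pthread_cond_init(&q->not_empty, NULL);
  pthread_cond_init(&q->not_full, NULL);
}

/* Copies msg into the queue (truncating if needed); blocks while full. */
static void msgq_send(msgq *q, const char *msg) {
  assert(msg != NULL);
  pthread_mutex_lock(&q->lock);
  while (q->count == MSGQ_CAP)
    pthread_cond_wait(&q->not_full, &q->lock);
  char *slot = q->buf[(q->head + q->count) % MSGQ_CAP];
  strncpy(slot, msg, MSG_LEN - 1);
  slot[MSG_LEN - 1] = '\0';
  q->count++;
  pthread_cond_signal(&q->not_empty);
  pthread_mutex_unlock(&q->lock);
}

/* Copies the oldest message into out; blocks while empty. */
static void msgq_recv(msgq *q, char out[MSG_LEN]) {
  pthread_mutex_lock(&q->lock);
  while (q->count == 0)
    pthread_cond_wait(&q->not_empty, &q->lock);
  memcpy(out, q->buf[q->head], MSG_LEN);
  q->head = (q->head + 1) % MSGQ_CAP;
  q->count--;
  pthread_cond_signal(&q->not_full);
  pthread_mutex_unlock(&q->lock);
}
```

A thread running one universe calls msgq_send, and a thread running another calls msgq_recv; neither ever touches the other's lua_State, so no universe-wide lock is needed.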

Or you can use purely non-preemptive coroutines. This is inherently
single-threaded, though. It is sensible for most I/O bound
apps. You can run an app multiple times if it is CPU bound or
just rely on other processes keeping all CPUs busy. It also avoids the
"mutex hell" of preemptive programming.
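In Lua this is what coroutines provide directly: each task loops and calls coroutine.yield. As a rough C analog of the same scheduling idea, here is a sketch in which each task does a slice of work per call and "yields" by returning. A single thread runs the tasks round-robin, so they interleave only at points they choose, and no locks are needed. The task structure and count_step are hypothetical illustrations.

```c
#include <assert.h>
#include <stddef.h>

/* One cooperative task: step() does a slice of work and returns nonzero
   while there is more to do. Returning is its only way to yield. */
typedef struct task {
  int (*step)(struct task *t);
  int counter, limit;
  char id;
} task;

static char trace[64];            /* order in which slices ran */
static size_t trace_len = 0;

/* Example task: records its id, counts up to limit, yields after each step. */
static int count_step(task *t) {
  if (trace_len < sizeof trace - 1) trace[trace_len++] = t->id;
  return ++t->counter < t->limit;
}

/* Round-robin scheduler on a single OS thread: no preemption, no locks. */
static void run_all(task *tasks, size_t n) {
  int done[16] = {0};
  size_t alive = n;
  assert(n <= 16);
  while (alive > 0)
    for (size_t i = 0; i < n; i++)
      if (!done[i] && !tasks[i].step(&tasks[i])) {
        done[i] = 1;
        alive--;
      }
}
```

With task 'a' limited to 2 steps and task 'b' to 3, the slices run in the order a, b, a, b, b.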

[Note that all of this is not a flame against LuaThreads. It's an
inherent limitation of the concept and not the implementation.]

BTW: There is another approach which avoids most of the lock
overhead. But it requires active cooperation on the C side and
you can't use the current lua_lock/lua_unlock macros:

  Basically the universe lock is held all the time, even when
  crossing the Lua/C boundary. All I/O operations which could
  potentially block need to be wrapped with an unlock before the
  operation and a lock after the operation. Ditto for very CPU
  intensive operations (like an RSA computation). This means you
  need to modify io.* and any other external library which may do
  some I/O (like LuaSocket).

  Then you can either choose to use active preemption (requires
  explicitly calling a no-op function which unlocks/locks) or
  modify the main VM loop to unlock/lock after a certain number
  of bytecodes. The former is easier for I/O bound code and you
  can avoid locking global objects, too.

  This is basically the same as the Python GIL approach. See
  their archives for a discussion of the advantages and
  disadvantages.
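A minimal sketch of the scheme above using POSIX threads, under these assumptions: universe_lock is the single universe lock, vm_run stands in for the main VM loop and drops the lock every YIELD_EVERY instructions (the VM-loop variant), and blocking_io shows the unlock-before / lock-after wrapping around a blocking call. None of these names exist in Lua or LuaThreads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <time.h>

#define YIELD_EVERY 1000   /* "bytecodes" executed between lock releases */

static pthread_mutex_t universe_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;   /* only touched while holding the lock */

/* Stand-in for the VM loop: runs with the lock held, briefly releasing it
   every YIELD_EVERY instructions so another thread may enter the core. */
static void vm_run(long instructions) {
  assert(instructions >= 0);
  for (long pc = 0; pc < instructions; pc++) {
    shared_counter++;                      /* "execute one bytecode" */
    if ((pc + 1) % YIELD_EVERY == 0) {
      pthread_mutex_unlock(&universe_lock);
      sched_yield();
      pthread_mutex_lock(&universe_lock);
    }
  }
}

/* A potentially blocking C operation: no Lua data may be touched inside. */
static void blocking_io(long nanoseconds) {
  assert(nanoseconds >= 0 && nanoseconds < 1000000000L);
  struct timespec ts = {0, nanoseconds};
  pthread_mutex_unlock(&universe_lock);
  nanosleep(&ts, NULL);
  pthread_mutex_lock(&universe_lock);
}

/* Thread entry: takes the lock once and holds it, except at yield points. */
static void *interpreter_thread(void *arg) {
  long n = *(const long *)arg;
  pthread_mutex_lock(&universe_lock);
  vm_run(n);
  blocking_io(1000000L);   /* 1 ms of simulated I/O */
  vm_run(n);
  pthread_mutex_unlock(&universe_lock);
  return NULL;
}
```

pthread mutexes make no fairness promise, so a thread that unlocks and immediately relocks may win again; a production version would probably need an explicit hand-off between waiting threads.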

> I use custom Makefiles and used to build a lua.dll and
> lualib.dll pair instead of a single lua50.dll. With Lua 5.1 I had to
> patch some files to allow separation of core and lib.

Umm, why? There is no functional reason to do so. That's why this
was dropped for the Lua 5.1 Makefiles.

If you really want to keep backwards compatibility with Lua 5.0 (I
don't see the point for a new project), then it's considerably
easier to change the Lua 5.0 Makefiles to use a single library.

> - has anybody managed to make LuaThreads and LuaJIT coexist in the
> same build (even experience with 5.0.x would help me)?

This is a lot of work. You'd need extensive modifications to
LuaJIT to catch all cases where the core is left/entered or where
the lock needs to be released temporarily for loops.

> - are there any expected fundamental issues or incompatibilities that
> would theoretically prevent me from using LuaJIT and LuaThreads together?

LuaThreads may seriously slow down your application because of
the excessive mutex overhead. Every C function needs a few calls
back into the Lua core and each of them locks/unlocks the mutex.
Just registering the standard Lua libraries (luaL_openlibs)
requires more than 1100 locks+unlocks! The heavy lock contention
is bad enough for single CPUs and will slow everything to a crawl
on multi-processor machines.
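To illustrate where this overhead comes from, here is a sketch (not LuaThreads' actual code) in which a hypothetical api_set plays the role of an API function whose lua_lock/lua_unlock map to a mutex. A C function that makes k API calls pays k lock/unlock pairs, while holding the lock across the whole C function, as in the GIL-style scheme, pays one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t core_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long lock_ops = 0;   /* lock/unlock pairs performed */

/* One "API call": locks and unlocks around a tiny piece of work. */
static void api_set(int *slot, int value) {
  pthread_mutex_lock(&core_lock);
  lock_ops++;
  *slot = value;
  pthread_mutex_unlock(&core_lock);
}

/* Per-call locking: k API calls cost k lock/unlock pairs. */
static void c_function_per_call(int *stack, int k) {
  assert(k >= 0);
  for (int i = 0; i < k; i++) api_set(&stack[i], i);
}

/* Lock held across the whole C function: one pair regardless of k. */
static void c_function_held(int *stack, int k) {
  assert(k >= 0);
  pthread_mutex_lock(&core_lock);
  lock_ops++;
  for (int i = 0; i < k; i++) stack[i] = i;
  pthread_mutex_unlock(&core_lock);
}
```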

LuaJIT is there to speed up your application. Using it together
with LuaThreads would be rather pointless since you'd lose much
of the gain.

> - has anybody managed to make LuaJIT and lua_tcc coexist? (I haven't
> tried yet since I have problems with tcc relocation, but I guess two
> relocating runtime compilers in the same dynamically linked process may
> not like each other)

I have not tried, but I do not see a problem with coexistence.
Both compilers manage memory on their own and should not get into
conflict. But read Javier's post, too.

> - has anyone managed to separate LuaJIT core and lib code?

See above. IMHO there is no point in trying. You need both
anyway.

Bye,
     Mike

Re: LuaThreads with LuaJIT (and maybe lua_tcc)

Javier Guerra Giraldez
On Tuesday 11 April 2006 1:23 pm, Mike Pall wrote:
> Then LuaThreads is not what you want. Apart from performance
> reasons (see below) there is a serious limitation: there is only
> a single lock around the Lua core. This means that only one
> thread at a time can execute Lua code. A thread can only run in
> parallel if it is executing C code which does _not_ call back
> into the Lua core (i.e. blocking in an I/O operation or some CPU
> bound C code).

that's very instructive; i was aware of the single lock issue, but hadn't
realized that it prevents even pure Lua code from multitasking.

> I suggest you look into the mailing list archive for other
> approaches. You can use multiple Lua universes and connect them

this is what LuaTask does (http://luaforge.net/projects/luatask)

> with message queues. Or you can use helper threads to do some
> isolated CPU bound operations in C code (only accessing the Lua
> core at the start and end).

unfortunately, there's still no windows port of the Helper Threads Toolkit.

> [Note that all of this is not a flame against LuaThreads. It's an
> inherent limitation of the concept and not the implementation.]

other bytecode languages (those that start with j and p) solve this using much
finer locks, in most cases one lock per data object, i think.  of course, the
performance of both suffers heavily, in the core VM and in the GC.

> Just registering the standard Lua libraries (luaL_openlibs)
> requires more than 1100 locks+unlocks! The heavy lock contention
> is bad enough for single CPUs and will slow everything to a crawl
> on multi-processor machines.

it would be interesting to compare the futex-based Linux pthreads and windows
locks on this.  if i understand it right, locking and unlocking a futex is
just a couple of asm instructions when there's no contention.


--
Javier


Re: LuaThreads with LuaJIT (and maybe lua_tcc)

Mike Pall
Hi,

Javier Guerra wrote:
> unfortunately, there's still no windows port of the Helper Threads Toolkit.

You could borrow the Windows pthreads emulation layer found in
LuaThreads. It should have most functions you need.

> other bytecode languages (those that start with j and p) solve this using much
> finer locks, in most cases one lock per data object, i think.  of course, the
> performance of both suffers heavily, in the core VM and in the GC.

I'm not sure this is feasible for the Lua core as it is now. One
could avoid stack locking, except that upvalues may point to
other stacks (ouch). Table and global string table locking is
straightforward. But userdata locking would need new API
functions. And then there's the incremental GC ... oh well. :-/

> it would be interesting to compare the futex-based Linux pthreads and windows
> locks on this.  if i understand it right, locking and unlocking a futex is
> just a couple of asm instructions when there's no contention.

Any recent glibc with NPTL and a recent kernel will use futexes
for pthread mutexes.

But this lock _is_ heavily contended. This means
a) a kernel call for all threads involved for each (un)lock and
b) a lot of ping-pong traffic in the multiple CPU case.
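The uncontended fast path can be sketched with C11 atomics: taking a free lock is a single compare-and-swap with no system call. This is a simplified spinning lock, not glibc's futex-based mutex. Where this sketch calls sched_yield under contention, a futex-based mutex would instead sleep in the kernel, and the unlocking thread would then need a wake-up system call; that is the per-(un)lock kernel cost described above for a heavily contended lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

typedef struct { atomic_int state; } fastlock;   /* 0 = free, 1 = held */

static void fastlock_init(fastlock *l) { atomic_init(&l->state, 0); }

/* Returns 1 if the lock was free and is now held by the caller. */
static int fastlock_trylock(fastlock *l) {
  int expected = 0;
  return atomic_compare_exchange_strong_explicit(
      &l->state, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

static void fastlock_lock(fastlock *l) {
  /* Fast path: one compare-and-swap when nobody holds the lock. */
  while (!fastlock_trylock(l))
    sched_yield();   /* slow path; a futex mutex would sleep in the kernel */
}

static void fastlock_unlock(fastlock *l) {
  assert(atomic_load_explicit(&l->state, memory_order_relaxed) == 1);
  atomic_store_explicit(&l->state, 0, memory_order_release);
}
```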

Bye,
     Mike

RE: LuaThreads with LuaJIT (and maybe lua_tcc)

Jerome Vuarand
In reply to Jerome Vuarand
Thanks a lot for all that information; some of the ideas you mentioned may help me make progress.

I've considered LuaTask and Helper Threads, but they do not fit my problem. The purpose of my threads is to implement a kind of multi-agent system, and the objective is to impose absolutely no constraint on the agent writer (for several reasons, one of them being that the code writer may be another agent). Ultimately I want him to be able to write an infinite loop with some behaviour accessing some global shared memory. Concurrent access at the user language level (Lua) is to be managed by the user himself; I just want the underlying system (the Lua implementation in C and the OS API) to be stable and to keep its data structures from being corrupted.

The hardware where all that is supposed to run is not well defined yet, but it will surely have multiple CPU cores (a core per agent/thread is not out of the question) which access a shared memory space. So I have to manage some form of parallel program execution while accessing data concurrently.

To avoid corruption of C structures, locks are a must, whatever implementation I use. I used LuaThreads because its principle and API are what I need (true and free parallelism within the same world), though I know its locking mechanism is too basic and not meant to be used on multi-CPU architectures. Finer lock granularity is something that can be implemented while keeping the same API (and I think it's a natural evolution path for LuaThreads). Multiple concurrent reads do not need locking. I think the Lua implementation has no C-level write side effects on Lua reads (correct me if I'm wrong), so it should be feasible, and reasonably fast, to implement a smart locking system at the object level (table, string, userdata) allowing concurrent reads but only a single write at a given time.
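A minimal sketch of per-object readers/writer locking with POSIX rwlocks, assuming each Lua object (table, string, userdata) would carry its own lock in a hypothetical header; object_get and object_set are illustrations, not Lua API. Mike's reply above lists what such a scheme would still have to handle separately: the global string table, userdata (which would need new API functions), upvalues pointing into other stacks, and the incremental GC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical object header: one lock per object. */
typedef struct object {
  pthread_rwlock_t lock;
  int fields[4];          /* stand-in for the object's contents */
} object;

static int object_init(object *o) {
  for (int i = 0; i < 4; i++) o->fields[i] = 0;
  return pthread_rwlock_init(&o->lock, NULL);
}

/* Readers share the lock: any number may read at the same time. */
static int object_get(object *o, int key) {
  assert(key >= 0 && key < 4);
  pthread_rwlock_rdlock(&o->lock);
  int v = o->fields[key];
  pthread_rwlock_unlock(&o->lock);
  return v;
}

/* A writer gets exclusive access: no readers or other writers meanwhile. */
static void object_set(object *o, int key, int value) {
  assert(key >= 0 && key < 4);
  pthread_rwlock_wrlock(&o->lock);
  o->fields[key] = value;
  pthread_rwlock_unlock(&o->lock);
}
```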

And just a final word about the core/lib distinction. Initially Lua attracted me because almost all its features are optional and can be cut (all lib modules, the Lua compiler). In this project I see Lua as an environment more than as a language, and lualib modules are things I'd like to cut to keep the core as small as possible (following an approach similar to that of microkernel-based OSes) and then add and remove at runtime at the finest possible granularity. That means I'd like to be able to unload io, update my io.dll and reinject it into my environment. I think Lua is capable of doing that with only small modifications.

It's a kind of personal lifelong project; I'm just trying to reuse what already exists to save time, and sometimes tweaking what I have may be faster than spending time on Google.

Doub.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Mike Pall
Sent: 11 April 2006 14:23
To: Lua list
Subject: Re: LuaThreads with LuaJIT (and maybe lua_tcc)


Re: LuaThreads with LuaJIT (and maybe lua_tcc)

Mildred Ki'Lya
In reply to Jerome Vuarand
Hi,

I wrote something that may be useful. It compiles but it isn't tested
yet.
It's a function lua_xcopy that copies n Lua values from one Lua stack
to another, like lua_xmove does. But unlike lua_xmove, I try to
make a full copy, that is, without creating references between the two
Lua stacks.
It's not yet tested, but if it works, it should copy without problems
numbers, strings, light userdata, booleans, nils, tables (with
metatables) and functions (both C and Lua, with upvalues and
environment), but not Lua threads.
It will also copy userdata with their metatables. The problem there is
that if a userdata is collected in one Lua stack, Lua will call the
__gc metamethod, and on the C side some memory may be freed at that
time. So the same userdata in the other Lua stack may cause bugs
(accessing freed memory). To prevent that, a metamethod __xcopy
is called on both metatables. Then the C side will know that it needs
two calls to __gc to free the data :)
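The __xcopy idea amounts to C-side reference counting. A minimal sketch, assuming the userdata stores only a pointer to a heap block (so copying the userdata duplicates the pointer, not the block), that the __xcopy handling adds exactly one reference per new copy via shared_retain, and that each state's __gc calls shared_release. The counter is atomic because the two states may be collected from different OS threads. The shared_* names are hypothetical and not part of lua_xcopy.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* C-side block referenced by userdata copies living in different states. */
typedef struct shared_block {
  atomic_int refs;        /* number of userdata copies still alive */
  size_t size;
  unsigned char data[];   /* payload (flexible array member) */
} shared_block;

static shared_block *shared_new(size_t size) {
  shared_block *b = malloc(sizeof *b + size);
  if (!b) return NULL;
  atomic_init(&b->refs, 1);   /* the original userdata */
  b->size = size;
  return b;
}

/* __xcopy: one more state now references the block. */
static void shared_retain(shared_block *b) {
  atomic_fetch_add_explicit(&b->refs, 1, memory_order_relaxed);
}

/* __gc: returns 1 if this was the last reference and the block was freed. */
static int shared_release(shared_block *b) {
  int before = atomic_fetch_sub_explicit(&b->refs, 1, memory_order_acq_rel);
  assert(before > 0);
  if (before == 1) {
    free(b);
    return 1;
  }
  return 0;
}
```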

I looked at pluto to help me with that.
I coded that because I also want multithreading in Lua :) A solution
that copies Lua data like that can let independent Lua stacks
communicate without the need to lock the whole state with lua_lock and
lua_unlock.

Any thoughts about that?

Mildred
--
Mildred       <xmpp:[hidden email]> <http://mildred632.free.fr/>
GPG key:      <hkp://pgp.mit.edu> or <http://mildred632.free.fr/gpg_key>
Fingerprint : 197C A7E6 645B 4299 6D37 684B 6F9D A8D6 [9A7D 2E2B]



Attachment: lua_xcopy.c (9K)