Lua in Parallel System

12 messages

Lua in Parallel System

Nelson Wong
Hi all,

Has anyone used Lua in any parallel system? Perhaps my question
should be whether Lua is thread-safe. I am working on a system with a
highly parallel model (just like those supercomputers), which has
several dedicated processors.
I would very much appreciate it if someone out there could share their
experience with me.
Also, I wonder what is the smallest amount of memory in which Lua is good
enough to do the work. In other words (for example), will Lua (including
code + stack + some scripts) fit in a small memory segment (e.g.
256k)?

I think it is a very interesting topic: using a scripting system that is
naturally sequential in a parallel/concurrent system.

Thanks in advance.

Nelson


RE: Lua in Parallel System

Nick Trout
The Cell? Just a guess.

Lua is re-entrant and has two ways of doing threads: you can create
multiple Lua states, or you can create Lua threads (coroutines). I don't
know how the memory architecture works with that, though.

I'd precompile offline using luac and strip the parser out of the
interpreter. You'll need stubs, which I think are in etc/ in the
distribution. I think you'll get the core down to 70-80k. Libraries are
optional, and I'd think about using custom (really bare) ones for an
embedded or tight-memory system. See the wiki for notes on optimising
script memory usage. Basically, declare everything local and use luac -s.

Be interested to hear how you get on. You could email me directly if you
have issues with public list.

Nick


> -----Original Message-----
> From: [hidden email] [[hidden email]
> [hidden email]] On Behalf Of Nelson Wong
> Sent: Tuesday, February 08, 2005 9:20 PM
> To: ML-Lua
> Subject: Lua in Parallel System



Re: Lua in Parallel System

Asko Kauppi-3

Well that is certainly a place where Python would have problems to squeeze in!

Yes, interesting.


On 9.2.2005 at 23:49, Nick Trout wrote:

> The cell? Just a guess.
> [...]
> Think you'll get core down to 70-80k.



Re: Lua in Parallel System

Ashwin Hirschi-2
In reply to this post by Nelson Wong

> Does anyone has use Lua in any parallel system?

Yes, we do. A bit over two years ago [phew... where *does* the time go?!?] I created a runtime environment where separate Lua states run in parallel and communicate using lightweight, asynchronous message queues.

This model is very easy to program, since stuff running in one state is blissfully unaware of everything outside that state. And since the messages are asynchronous, code can pretty much run at full speed and decide when to handle what in an efficient manner (without silly waiting or polling or whatnot).

An interesting twist to our scheme is that we actually (can) have 2 threads per Lua state. While many seem to go for symmetrical scenarios, I opted for some fearless asymmetry and came up with a "dual-threading" approach, where a controller and worker (OS) thread share one state.

The controller thread has higher priority, that is: it can interrupt the worker to provide (initial) handling of incoming messages. It's also the one that starts the worker thread and will be signalled when the worker is done. The worker thread runs a tad slower, since it runs under the line-hook [to give the controller a means to butt in]. But that's okay, since it's completely up to the programmer to decide which thread to use for what logic.

Initially I came up with the whole "extended dual-threading" model to make sure that the GUI of the "smart client" I was developing stayed responsive, especially when there's some serious communication going on and such. As things turned out, it all worked so well we decided to use the same runtime for our back-end.

A typical scenario here is a server that has one "box" [read: Lua state + message queue + controller/worker thread pair] for the console GUI, one for TCP/IP communication (based on Luasocket, btw.) and one for each service the back-end provides (typically database access and such).

Anyway, I'm not at all sure the above information is in any way relevant to "parallel systems". I'm seeing references to the recently unveiled Cell processor... But since threading was mentioned I thought I'd toss the story in to give people an idea of what Lua can do for developers.

Ashwin.
--
no signature is a signature.


Re: Lua in Parallel System

Mark Hamburg-4
The dual thread system sounds very interesting. Can you share more details?

Mark

on 2/9/05 3:34 PM, Ashwin Hirschi at [hidden email] wrote:

> An interesting twist to our scheme is that we actually (can) have 2 threads
> per Lua state. While many seem to go for symmetrical scenarios I decided for
> some fearless asymmetry and came up with a "dual-threading" approach, where a
> controller and worker (OS) thread share one state.



extended dual threading [was: Re: Lua in Parallel System]

Ashwin Hirschi-2

>> An interesting twist to our scheme is that we actually (can) have 2 threads
>> per Lua state. While many seem to go for symmetrical scenarios I decided for
>> some fearless asymmetry and came up with a "dual-threading" approach, where a
>> controller and worker (OS) thread share one state.
>
> The dual thread system sounds very interesting. Can you share more details?

Sure. Well, where to start...

First off, the "dual threading" model is event-driven. This means that once a "box" (= Lua state + controller/worker thread pair + message queue) is created, its controller thread waits for a message to arrive in the box queue.

An incoming message acts as an event and triggers the controller thread to call a (named) Lua handler. That handler can basically do what it pleases. It can inspect the message data and process it itself, or it can decide to pass it on to the worker thread.
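A minimal C model of that dispatch step, assuming each message carries a name and the box keeps a table from names to handlers. In the runtime described here the handlers are Lua functions; the C function pointers, the `message` struct and the names `box_dispatch`, `on_ping` and `on_query` are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message: a handler name plus an integer payload. */
typedef struct {
    const char *name;
    int value;
} message;

/* A handler returns true if it processed the message itself,
   false if the message should be passed on to the worker thread. */
typedef bool (*handler_fn)(const message *msg, int *result);

typedef struct {
    const char *name;
    handler_fn fn;
} handler_entry;

static bool on_ping(const message *msg, int *result) {
    *result = msg->value + 1;   /* cheap work: answer in the controller */
    return true;
}

static bool on_query(const message *msg, int *result) {
    (void)msg;
    (void)result;
    return false;               /* expensive work: defer to the worker */
}

static const handler_entry handlers[] = {
    { "ping",  on_ping  },
    { "query", on_query },
};

/* Look up the handler by message name and call it.
   Returns 1 if the controller handled it, 0 if it was deferred to the
   worker, -1 if no handler is registered under that name. */
int box_dispatch(const message *msg, int *result) {
    for (size_t i = 0; i < sizeof handlers / sizeof handlers[0]; i++) {
        if (strcmp(handlers[i].name, msg->name) == 0)
            return handlers[i].fn(msg, result) ? 1 : 0;
    }
    return -1;
}
```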

If a controller decides to transfer handling to the worker thread, it calls the "worker" function to make its request. This worker (start) function accepts a function and optional parameters, so the controller doesn't need to pass incoming messages on literally, but can do some preprocessing and "control" what the worker does when.

Now, if the worker thread is idling it'll pick the request up immediately (i.e. right after the controller is done). Otherwise the request gets queued (in the worker's queue). At the moment we're simply using a worker queue size of 1. This "one buffered request" approach means we can guarantee a nice worker flow: if the worker is active, it can still accept one request and will handle it immediately after it has completed its current run.

If starting the worker fails (i.e. the worker is active and the control request cannot be buffered) the controller will have to try again later. To enable this scenario, the controller will need to know how the worker is doing. Specifically, it has to be signalled once the worker has finished.

Therefore we implemented a second handler (next to the basic incoming box messages one) for internal "box" events. This handler will receive a "ready" signal every time the worker has finished handling 1 request. That way the controller always has optimal control over what the worker does.
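The one-slot buffering and the "ready" event described above can be sketched in C as follows. This is an illustration under assumptions, not the original code: `worker_start`, `worker_run_one` and the `on_ready` callback are hypothetical names, and requests are plain C function pointers rather than Lua functions. A real worker thread would call `worker_run_one` in a loop; it blocks until a request is available.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*job_fn)(void *arg);

typedef struct {
    job_fn fn;
    void  *arg;
} job;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  has_work;
    bool running;                 /* a request is assigned to the worker */
    bool has_pending;             /* the single buffered slot is taken */
    job  current;
    job  pending;
    void (*on_ready)(void *ctx);  /* the controller's "ready" handler */
    void *ready_ctx;
} worker;

void worker_init(worker *w, void (*on_ready)(void *), void *ctx) {
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->has_work, NULL);
    w->running = false;
    w->has_pending = false;
    w->on_ready = on_ready;
    w->ready_ctx = ctx;
}

/* Controller side. Returns false when the worker is busy and the one-slot
   buffer is full; the controller should retry after a "ready" event. */
bool worker_start(worker *w, job_fn fn, void *arg) {
    bool ok = true;
    pthread_mutex_lock(&w->lock);
    if (!w->running) {                    /* idle: start right away */
        w->current = (job){ fn, arg };
        w->running = true;
        pthread_cond_signal(&w->has_work);
    } else if (!w->has_pending) {         /* busy: buffer one request */
        w->pending = (job){ fn, arg };
        w->has_pending = true;
    } else {
        ok = false;                       /* busy and buffer full */
    }
    pthread_mutex_unlock(&w->lock);
    return ok;
}

/* Worker side: run one request, promote the buffered one, signal "ready". */
void worker_run_one(worker *w) {
    pthread_mutex_lock(&w->lock);
    while (!w->running)
        pthread_cond_wait(&w->has_work, &w->lock);
    job j = w->current;
    pthread_mutex_unlock(&w->lock);

    j.fn(j.arg);                          /* the request runs outside the lock */

    pthread_mutex_lock(&w->lock);
    if (w->has_pending) {
        w->current = w->pending;          /* next request starts immediately */
        w->has_pending = false;
    } else {
        w->running = false;
    }
    pthread_mutex_unlock(&w->lock);

    if (w->on_ready)
        w->on_ready(w->ready_ctx);        /* one request finished */
}

/* Usage helpers: a request and a ready handler that just count. */
static void add_one(void *counter) { ++*(int *)counter; }
static void count_ready(void *counter) { ++*(int *)counter; }
```

Calling `on_ready` after the lock is released lets the controller's ready handler call `worker_start` again without deadlocking.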

Okay, so that's the basic event model out of the way. On to a couple of threading details...

Of course, the controller and worker threads can never access the Lua state simultaneously. To avoid conflicts, access to the state gets orchestrated using the familiar mutex/event tactic. Both threads will try to obtain the lock and only continue on success.

But the threads of the box pair behave differently once they have gained access. While the controller thread runs unyieldingly, the worker thread sets the line hook before starting its run. In the line hook it checks a simple flag to see if the controller needs access and, if so, releases the lock and waits for a signal to continue.
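The lock handoff can be modelled in plain C with pthreads. This is a sketch under assumptions: `shared_counter` stands in for the Lua state, `line_hook` is called by hand after each simulated "line" (in a real box it would be a hook installed with Lua's `lua_sethook` and `LUA_MASKLINE`), and all names are hypothetical. The flag is a C11 atomic rather than the plain int described in this post; the later replies in this thread explain why a plain variable is not guaranteed to be seen promptly on multiprocessor machines.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical box: shared_counter stands in for the Lua state. */
typedef struct {
    pthread_mutex_t state_lock;      /* held by whoever runs on the state */
    pthread_mutex_t gate_lock;       /* guards the worker's wait below */
    pthread_cond_t  gate_cond;       /* signalled when the controller is done */
    atomic_int      controller_wants;
    long            shared_counter;
} box;

void box_init(box *b) {
    pthread_mutex_init(&b->state_lock, NULL);
    pthread_mutex_init(&b->gate_lock, NULL);
    pthread_cond_init(&b->gate_cond, NULL);
    atomic_init(&b->controller_wants, 0);
    b->shared_counter = 0;
}

/* Worker side: called after every "line", like a Lua line hook. */
static void line_hook(box *b) {
    if (!atomic_load_explicit(&b->controller_wants, memory_order_acquire))
        return;                                  /* fast path: keep running */
    pthread_mutex_unlock(&b->state_lock);        /* hand the state over */
    pthread_mutex_lock(&b->gate_lock);
    while (atomic_load_explicit(&b->controller_wants, memory_order_acquire))
        pthread_cond_wait(&b->gate_cond, &b->gate_lock);
    pthread_mutex_unlock(&b->gate_lock);
    pthread_mutex_lock(&b->state_lock);          /* take it back and resume */
}

typedef struct { box *b; long lines; } worker_args;

void *worker_run(void *p) {
    worker_args *a = p;
    pthread_mutex_lock(&a->b->state_lock);
    for (long i = 0; i < a->lines; i++) {
        a->b->shared_counter++;                  /* one "line" of script */
        line_hook(a->b);
    }
    pthread_mutex_unlock(&a->b->state_lock);
    return NULL;
}

/* Controller side: interrupt the worker, use the state, give it back. */
void controller_access(box *b, long delta) {
    atomic_store_explicit(&b->controller_wants, 1, memory_order_release);
    pthread_mutex_lock(&b->state_lock);
    b->shared_counter += delta;
    pthread_mutex_lock(&b->gate_lock);
    atomic_store_explicit(&b->controller_wants, 0, memory_order_release);
    pthread_cond_broadcast(&b->gate_cond);
    pthread_mutex_unlock(&b->gate_lock);
    pthread_mutex_unlock(&b->state_lock);
}
```

The controller only ever takes `state_lock` and then `gate_lock`, while the worker never holds both at once, so the two threads cannot deadlock on lock ordering.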

This creates the nice high/low-priority approach of the controller/worker thread pair that I mentioned earlier. The controller always runs uninterrupted [with its stack on top of the worker stack] and at full speed, while the worker yields on request (i.e. when the controller wants access) at the cost of running at a lower speed [under the line hook]. Note also that there's no mutex locking/unlocking per Lua API call!

Well, there you have it (or at least, part of it). I hope this gives an idea of what's possible and how. If any of the above sounds complicated, I'm not explaining it right. Because, really, it's not [though I admit it took me a few days before I got the details just right... ;-)].

I would also like to stress that all this is made possible simply because Lua is re-entrant. Without the re-entrancy, the controller thread would never be able to take over the state from the worker thread.

Also, remember that this "dual threading" model can be "extended" to any number of "boxes": separate states all have their own controller/worker thread pair and message queue. Using a simple "sendbox" primitive these boxes can run parallel, communicate asynchronously, and thus interact quite easily.
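A sketch of the "sendbox" idea with two boxes, each reduced to a bounded mailbox served by one thread. Only the name `sendbox` comes from the post; its signature, the fixed capacity `QCAP`, the `msg` layout and the `receive` function are assumptions for illustration. Because `sendbox` only enqueues and returns, the sender never waits for the receiver, which is what lets the boxes run in parallel.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QCAP   16   /* hypothetical per-box queue capacity */
#define NBOXES 2

typedef struct { int from; char text[32]; } msg;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    msg items[QCAP];
    int head, count;
} mailbox;

static mailbox boxes[NBOXES];

void boxes_init(void) {
    for (int i = 0; i < NBOXES; i++) {
        pthread_mutex_init(&boxes[i].lock, NULL);
        pthread_cond_init(&boxes[i].nonempty, NULL);
        boxes[i].head = boxes[i].count = 0;
    }
}

/* Asynchronous send: enqueue and return at once, never waiting for the
   receiver. Returns false if the target queue is full. */
bool sendbox(int to, int from, const char *text) {
    mailbox *q = &boxes[to];
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count < QCAP) {
        msg *m = &q->items[(q->head + q->count) % QCAP];
        m->from = from;
        strncpy(m->text, text, sizeof m->text - 1);
        m->text[sizeof m->text - 1] = '\0';
        q->count++;
        ok = true;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Blocking receive for a box's controller: sleeps until a message arrives. */
msg receive(int self) {
    mailbox *q = &boxes[self];
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    msg m = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return m;
}

/* Box 1: answers every "ping" with a "pong" and stops on "quit". */
void *echo_box(void *unused) {
    (void)unused;
    for (;;) {
        msg m = receive(1);
        if (strcmp(m.text, "quit") == 0)
            break;
        if (strcmp(m.text, "ping") == 0)
            sendbox(m.from, 1, "pong");
    }
    return NULL;
}
```

The bounded queue makes `sendbox` report failure instead of blocking when a receiver falls behind; whether the queues in the runtime described here are bounded is not stated.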

To implement a useful dialog between parts, all that is needed is a basic protocol. And because of Lua's dynamic nature, implementing (or, maybe more importantly, evolving) such a protocol is something that's incredibly easy to do. The "extended dual threading" model takes care of the rest.

I realise there's much more to be said about how things are implemented and why. For instance, boxes are created fairly "dumb", i.e. with fairly empty states and without any dedicated logic. So, a mechanism is needed to ensure the box can be "booted", in order to install the necessary handlers and such. Other housekeeping techniques, like cleanly closing down a box, are needed as well. I'll be happy to elaborate if anyone's interested.

Okay... enough postponing... it's back to work... [:-)]

Ashwin.

Re: extended dual threading [was: Re: Lua in Parallel System]

Mark Hamburg-4
Cool.

The one "issue" I could see with your approach is that reading the signal
tested in the line hook actually depends on memory synchronization and doing
that properly is almost as expensive as a mutex lock/unlock. Is this not an
issue because you potentially just churn through enough memory that
eventually the value gets propagated?

Mark



Re: extended dual threading [was: Re: Lua in Parallel System]

Ashwin Hirschi-2

> The one "issue" I could see with your approach is that reading the signal
> tested in the line hook actually depends on memory synchronization and doing
> that properly is almost as expensive as a mutex lock/unlock. Is this not an
> issue because you potentially just churn through enough memory that
> eventually the value gets propagated?

The approach I've implemented simply has the line hook code test a (zero/non-zero) flag variable.

At first I wanted to use a more atomic set/test mechanism, but to make a long story short: the flag variable just worked. So I kept it there [:-)]. It's been in production code for about 2 years now and we haven't had any problems with it.

As far as performance is concerned, I did quite a bit of benchmarking at the time. Running under the line hook does slow things down a bit, but to be honest I've forgotten the actual numbers. And because everything performed more than satisfactorily, I haven't looked at it since.

Of course, if ever speed does become an issue, you simply have the controller thread do most of the processing. You still keep the parallel processing and the messaging (between boxes).

Ashwin.

Re: extended dual threading [was: Re: Lua in Parallel System]

David Jones-2

On Feb 12, 2005, at 01:58, Ashwin Hirschi wrote:


>> The one "issue" I could see with your approach is that reading the signal
>> tested in the line hook actually depends on memory synchronization and doing
>> that properly is almost as expensive as a mutex lock/unlock. Is this not an
>> issue because you potentially just churn through enough memory that
>> eventually the value gets propagated?
>
> The approach I've implemented simply has the line hook code test a (zero/non-zero) flag variable.

The issue that Mark is talking about (I think) comes up on machines with more than one processor. The thread that sets the flag to 1 may be on a different processor to the thread that is reading it. It can take arbitrarily long for the effects of a write to a memory location executed on one processor to be seen on another processor. In actual practice it can be anything from almost instantly to basically forever.

On relaxed memory order architectures (Alpha, Sparc v9 in certain modes, maybe others), where a processor writes to memory in a different order than the instructions get executed, you may need a barrier. Basically if you are relying on the flag being set to 1 indicating that some other piece of memory is in a certain state, then you'll need a write barrier before setting it to 1.
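The barrier described above, expressed with C11 atomics. C11 arrived years after this 2005 thread, when one would have used compiler- or platform-specific barrier intrinsics instead; the variable and function names here are hypothetical. The writer publishes `payload` and then sets the flag with a release store; the reader's acquire load guarantees that once it sees the flag set, it also sees the payload.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;             /* the data the flag is meant to announce */
static atomic_int ready_flag;   /* 0 = not yet published, 1 = published */

static void *producer(void *unused) {
    (void)unused;
    payload = 42;                                  /* ordinary write */
    /* Release store: acts as the write barrier before setting the flag. */
    atomic_store_explicit(&ready_flag, 1, memory_order_release);
    return NULL;
}

/* Reader: spin until the flag is set, then read the payload. */
int consume(void) {
    while (atomic_load_explicit(&ready_flag, memory_order_acquire) == 0)
        ;                                          /* busy-wait for the demo only */
    return payload;
}

int run_demo(void) {
    pthread_t t;
    payload = 0;
    atomic_store_explicit(&ready_flag, 0, memory_order_relaxed);
    pthread_create(&t, NULL, producer, NULL);
    int v = consume();
    pthread_join(t, NULL);
    return v;
}
```

On x86, release stores and acquire loads compile to ordinary moves, so this ordering costs little there; on relaxed-order machines the compiler emits the fences the architecture needs.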

David Jones



Re: extended dual threading [was: Re: Lua in Parallel System]

Ashwin Hirschi-2

>>> The one "issue" I could see with your approach is that reading the
>>> signal tested in the line hook actually depends on memory
>>> synchronization and doing that properly is almost as
>>> expensive as a mutex lock/unlock. Is this not an issue because
>>> you potentially just churn through enough memory that
>>> eventually the value gets propagated?
>>
>> The approach I've implemented simply has the line hook code test a
>> (zero/non-zero) flag variable.
>
> The issue that Mark is talking about (I think) comes up on machines
> with more than one processor.  The thread that sets the flag to 1 may be on
> a different processor to the thread that is reading it.  It can take
> arbitrarily long for effects of the write to the memory location that
> is executed on one processor to be seen on another processor.  In
> actual practice it can be anything from almost instantly to basically
> forever.

Okay, I get it now. If memory synchronization becomes an issue on these architectures it certainly seems necessary to get things in-sync before state access is handed from one party to another.

But, thinking about it: wouldn't such a mechanism then always have to be in-place?

I mean, indirect mutex locking through the Lua API seems inadequate in these situations as well. So, there would be an additional cost there too, right?

Anyway, there have been surprisingly few other reactions to Nelson's original question. Or did I miss something?

I would love to hear about other people's experiences with Lua in threaded/parallel/whatever environments!

Ashwin.

Re: extended dual threading [was: Re: Lua in Parallel System]

Adrian Sietsma
Ashwin Hirschi wrote:
> ...
> I would love to hear about other people's experiences with Lua in threaded/parallel/whatever environments!


I'm playing sound from a buffer refreshed every 1e6+ lines, using a line hook.

Adrian


Re: extended dual threading [was: Re: Lua in Parallel System]

Mark Hamburg-4
In reply to this post by Ashwin Hirschi-2
Mutex locks and unlocks always include a memory sync and/or barrier. This is
what makes them expensive even when there is no contention.

The issue at hand is that if thread A on processor A sets a piece of memory:

    sharedMemory->flag = 1

And thread B on processor B is testing this flag:

    if( sharedMemory->flag ) {
        /* yield time */
    }

And no memory synchronization/barrier exists, then thread B may not see
thread A's changes for an arbitrarily long period of time.

But the whole point is that we don't want to pay for a memory barrier.

I haven't tried it, but my reading of the documentation for pthread_kill
would suggest that one could use it to trigger a signal handler in thread B
from thread A. This would allow thread B to run all out until a signal from
thread A instructed thread B to install a hook proc which would yield a
mutex back to thread A.
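A minimal sketch of the pthread_kill idea, which is untested by its proposer. The point it illustrates: the signal handler runs on thread B itself, so the flag it writes is read back by the same thread and needs no cross-processor barrier. Assumptions: SIGUSR1 is free for this use, all names are hypothetical, and the `hook_requested` check in the loop stands in for Lua's own internal hook test; in a real box the handler would call `lua_sethook` instead. The stand-alone `lua.c` interpreter uses a similar trick for Ctrl-C, with a SIGINT handler that calls `lua_sethook`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Written only by the signal handler, which runs on thread B itself. */
static volatile sig_atomic_t hook_requested;

static void on_sigusr1(int sig) {
    (void)sig;
    hook_requested = 1;   /* a real box would call lua_sethook here */
}

/* Thread B: runs "all out" until the handler asks it to stop. */
static void *thread_b(void *out) {
    long iterations = 0;
    while (!hook_requested)
        iterations++;
    *(long *)out = iterations;
    return NULL;
}

/* Thread A (the caller): install the handler, start B, then signal B. */
int run_signal_demo(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    long iterations = -1;
    pthread_t b;
    hook_requested = 0;
    if (pthread_create(&b, NULL, thread_b, &iterations) != 0)
        return -1;
    pthread_kill(b, SIGUSR1);   /* delivered to thread B specifically */
    pthread_join(b, NULL);
    return iterations >= 0 ? 0 : -1;
}
```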

Mark

on 2/14/05 4:25 PM, Ashwin Hirschi at [hidden email] wrote:

> [...]
> I mean, indirect mutex locking through the LUA API seems inadequate in these
> situations as well. So, there would be an additional cost there too, right?