Algorithm help: passing strings between threads with minimal copying

classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Algorithm help: passing strings between threads with minimal copying

Rena
This isn't really specific to Lua, but since I'm doing it in Lua, and people on this list seem to know these things pretty well, I'll ask here.

I'm implementing a system which allows a Lua script to create a new OS thread. That thread creates its own, independent Lua script and runs some code given to it at creation time. The two threads need to be able to communicate with eachother. The new thread can create more new threads as well. This is a module that needs to work under standard Lua, so unfortunately I can't take advantage of lua_lock and lua_unlock macros.

What I'm having trouble with is passing messages between threads. The interface is simple enough: when you have a reference to a thread (because you created it or it was given to you as a parameter), you can send a message (a string) to it. The message goes into a queue and the thread can call a function to retrieve the next message from the queue.

Of course one thread can't touch another thread's Lua state, since a state can only be used by one thread at a time. So we need to store the received messages somewhere until the receiving thread asks for them. No problem, we have a struct for each thread already (holding its lua_State* and other nice things); we can copy the string into a list in that struct, using mutexes to ensure it's not being read and written at the same time.

The issue with that design, though, is excess copying. If thread A sends a message to thread B, that means A has a copy of the string, B's incoming message queue will have another copy, and B's Lua state will make a third copy when lua_pushlstring() is used to receive the message. I'd quite like to eliminate that extra copy.

Alright, so we don't copy the message into B's queue; we give it a pointer to the string in A's Lua state, and anchor it in the registry to keep it from being collected before B receives the message. Now B copies directly from A, no third copy, no problem?

Well, two problems actually. Those strings have to be removed from A's registry sometime so they aren't wasting space, and if A exits before B checks its messages, the string is still going to get collected and the pointer will be invalid.

So we'll have B notify A in some fashion when it receives a message, so that A can allow the message to be collected. Maybe we'll keep the ID from luaL_ref in the message struct as well, and then B can add that ID to a list in A's thread struct (the same way A adds the message to B's message queue) telling it that the message is now safe to collect. Then A can check that list periodically to collect messages, and can wait before exiting to ensure all messages are received.

But, what if B never checks its messages? Perhaps B is in a loop waiting for some other event that never comes? Then A will sit waiting forever before it's allowed to exit, waiting for those sent messages to become collectible, which will never happen.

The two solutions to this I came up with are:

1) When A exits, it can copy all unreceived messages into malloc()'d buffers and update the pointers to them in B's message queue (along with a flag telling it to free() them itself). Thus there's a third copy like in the original design, but only in the "worst case scenario" when A exits before B receives its message. However I found this complicated the design considerably, as now when A exits (or collects a reference to B) it has to look through B's message queue and copy any messages that came from A - which means the message has to identify which thread it came from - which means now the messages have to have pointers back to their senders, which could become invalid suddenly, and becomes difficult to keep track of.

2) Ensure B checks its messages periodically, by using a hook[1] which is called every so often. In the hook, B copies all received messages into its registry (so that A can then collect them), and when the script actually asks for messages, it retrieves them from there (and looks at the queue if the registry doesn't have any messages). This seems like it'd be effective except in the "worst case" when B gets stuck in an endless loop and the hook never gets called, which could potentially cause a cascading failure (A never exits because it's waiting for B which never calls its hook).

I feel like the second solution is ideal (I already have such a hook in place for other reasons, so overhead isn't a huge issue), but I wondered if anyone else had any feedback on this.

[1] The method I use isn't really a "hook" (as in debug hook). Rather, when creating the Lua state I leave an object on the stack (a 1-byte userdata) with a __gc metamethod attached to it. Since the object is left on the stack but not reference anywhere, it will be collected. Its __gc metamethod performs whatever periodic tasks need doing, and then leaves another such object with the same metamethod on the stack, so that the metamethod will fire again on the next GC cycle. (It has to create a new object, since each object's __gc is called only once.) This way the "hook" executes periodically as long as the garbage collector is running.

--
Sent from my Game Boy.
Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Tim Hill

On Apr 5, 2014, at 10:25 AM, Rena <[hidden email]> wrote:

>
> Of course one thread can't touch another thread's Lua state, since a state can only be used by one thread at a time. So we need to store the received messages somewhere until the receiving thread asks for them. No problem, we have a struct for each thread already (holding its lua_State* and other nice things); we can copy the string into a list in that struct, using mutexes to ensure it's not being read and written at the same time.
>
> The issue with that design, though, is excess copying. If thread A sends a message to thread B, that means A has a copy of the string, B's incoming message queue will have another copy, and B's Lua state will make a third copy when lua_pushlstring() is used to receive the message. I'd quite like to eliminate that extra copy.
>

Are you really sure the copying is a significant enough problem to justify the complexity of your solutions? I’ve designed several marshaling systems around Lua, and the “extra hop” really wasn’t significant in terms of performance. Depending on your message volume, you may even find that a fixed pool of buffers is very efficient, as the CPU caches will quickly “warm” and the extra copy overhead will get even smaller.

—Tim



Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

katlogic
In reply to this post by Rena
> The issue with that design, though, is excess copying. If thread A sends a

This is an issue only if you are using small CPUs which are not heavily
pipelined/superscalar (GPU cores, MIPS microcontroller..).

Your bottleneck will be most likely lock contention caused by threading
in the first place (mutex [sic]). Keep the shared state out of marshalling
code and you'll have to lock less, thus major performance win (but still
a lot of locking overhead).

If you are serious about this (and not just overoptimizing prematurely),
consider http://www.liblfds.org/

Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Sean Conner
In reply to this post by Tim Hill
It was thus said that the Great Tim Hill once stated:

>
> On Apr 5, 2014, at 10:25 AM, Rena <[hidden email]> wrote:
>
> > Of course one thread can't touch another thread's Lua state, since a
> > state can only be used by one thread at a time. So we need to store the
> > received messages somewhere until the receiving thread asks for them. No
> > problem, we have a struct for each thread already (holding its
> > lua_State* and other nice things); we can copy the string into a list in
> > that struct, using mutexes to ensure it's not being read and written at
> > the same time.
> >
> > The issue with that design, though, is excess copying. If thread A sends
> > a message to thread B, that means A has a copy of the string, B's
> > incoming message queue will have another copy, and B's Lua state will
> > make a third copy when lua_pushlstring() is used to receive the message.
> > I'd quite like to eliminate that extra copy.
>
> Are you really sure the copying is a significant enough problem to justify
> the complexity of your solutions? I’ve designed several marshaling systems
> around Lua, and the “extra hop” really wasn’t significant in terms of
> performance. Depending on your message volume, you may even find that a
> fixed pool of buffers is very efficient, as the CPU caches will quickly
> “warm” and the extra copy overhead will get even smaller.

  I'll second this.  At work, I'm currently writing a server [1] in Lua.
Yes, I do have some C in there, but that's to handle the sockets, the calls
to select() [2] and other "interfacing to the operating system" type calls.
The data that comes in is all text (SIP message, which follows the Internet
Text Message Format of RFC-822 for the most part) so there's quite a bit of
parsing.  I also have to deal with encoding and decoding packets for a
proprietary protocol (a mixture of binary and text).

  I've profiled the code under what I would call a moderate load and the
results have been quite interesting.  On the C side of things, the top *two
dozen* routines were all in the Lua core [3].  Profiling the code in Lua [4]
showed the main loop [5] was taking al the time; the parsing/marshalling
code was a few orders of magnitude less than the main loop.

  I also recorded the memory usage of Lua during all this (a few nearly 20
hour runs).  When I was first running tests (usually around 10 minutes,
maybe half an hour) I would get alarmed at the memory growth, and even spent
some time playing around with the garbage collection parameters.  I
shouldn't have bothered.  The extended runs, with the default settings, were
fine and memory usage was consistent (after a period of time---the graph is
somewhat wild for the first hour or two, then it settles down).  

  Don't be afraid of the copies.

  -spc (Obviously, I am unafraid of the footnotes ... )

[1] For lack of a better term.  It's actually a network component that
        receives messages, does some processing, may query another network
        component and then reply to the message.

[2] Actually, epoll_wait() under Linux, poll() everywhere else.  I do
        have code to support select() but I don't know of any modern Unix
        system (which is what we use) that only has select().

[3] The number one spot?  luaV_execute().  Not terribly surprising.  And
        yes, this server is using stock Lua [6].

[4] Every 1,000,000 Lua instructions, obtain the file, function and line
        of the Lua state.  Save, and when done, print it out, sorted first
        by count (highest first) then by file:function:line number.  It's
        about 40 lines of Lua code, with about a quarter devoted to sorting
        the results.

[5] Basically (simplified a bit, but not much)

                while not process.sig.caught() do
                  schedule_sleeping_coroutines()
                  local events = SOCKETS:events(0) -- select loop
                  for i = 1,#events do
                    events[i].obj(event[i]) -- handle packet from socket
                  end
                  execute_run_queue_of_coroutines()
                end

        Yes, I create coroutines like crazy.  Each coroutine handles a
        transaction.  Sure, I have to manually yield when making a blocking
        call, but it allows the actual processing code to be pretty much:

                read_packet()
                parse_packet()
                if need_more_info()
                  create_request()
                  send_request()
                  coroutine.yield()
                  read_request_packet()
                  parse_request_packet()
                  get_data()
                end
                make_reply()
                send_packet() -- reply to original packet

        instead of a mess of callback hell.

[6] I would love to use LuaJIT, but the target platform is Sparc, and
        LuaJIT doesn't support that architecture.  Sigh.

Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Rena
On Sat, Apr 5, 2014 at 4:14 PM, Sean Conner <[hidden email]> wrote:
It was thus said that the Great Tim Hill once stated:
>
> On Apr 5, 2014, at 10:25 AM, Rena <[hidden email]> wrote:
>
> > Of course one thread can't touch another thread's Lua state, since a
> > state can only be used by one thread at a time. So we need to store the
> > received messages somewhere until the receiving thread asks for them. No
> > problem, we have a struct for each thread already (holding its
> > lua_State* and other nice things); we can copy the string into a list in
> > that struct, using mutexes to ensure it's not being read and written at
> > the same time.
> >
> > The issue with that design, though, is excess copying. If thread A sends
> > a message to thread B, that means A has a copy of the string, B's
> > incoming message queue will have another copy, and B's Lua state will
> > make a third copy when lua_pushlstring() is used to receive the message.
> > I'd quite like to eliminate that extra copy.
>
> Are you really sure the copying is a significant enough problem to justify
> the complexity of your solutions? I’ve designed several marshaling systems
> around Lua, and the “extra hop” really wasn’t significant in terms of
> performance. Depending on your message volume, you may even find that a
> fixed pool of buffers is very efficient, as the CPU caches will quickly
> “warm” and the extra copy overhead will get even smaller.

  I'll second this.  At work, I'm currently writing a server [1] in Lua.
Yes, I do have some C in there, but that's to handle the sockets, the calls
to select() [2] and other "interfacing to the operating system" type calls.
The data that comes in is all text (SIP message, which follows the Internet
Text Message Format of RFC-822 for the most part) so there's quite a bit of
parsing.  I also have to deal with encoding and decoding packets for a
proprietary protocol (a mixture of binary and text).

  I've profiled the code under what I would call a moderate load and the
results have been quite interesting.  On the C side of things, the top *two
dozen* routines were all in the Lua core [3].  Profiling the code in Lua [4]
showed the main loop [5] was taking al the time; the parsing/marshalling
code was a few orders of magnitude less than the main loop.

  I also recorded the memory usage of Lua during all this (a few nearly 20
hour runs).  When I was first running tests (usually around 10 minutes,
maybe half an hour) I would get alarmed at the memory growth, and even spent
some time playing around with the garbage collection parameters.  I
shouldn't have bothered.  The extended runs, with the default settings, were
fine and memory usage was consistent (after a period of time---the graph is
somewhat wild for the first hour or two, then it settles down).

  Don't be afraid of the copies.

  -spc (Obviously, I am unafraid of the footnotes ... )

[1]     For lack of a better term.  It's actually a network component that
        receives messages, does some processing, may query another network
        component and then reply to the message.

[2]     Actually, epoll_wait() under Linux, poll() everywhere else.  I do
        have code to support select() but I don't know of any modern Unix
        system (which is what we use) that only has select().

[3]     The number one spot?  luaV_execute().  Not terribly surprising.  And
        yes, this server is using stock Lua [6].

[4]     Every 1,000,000 Lua instructions, obtain the file, function and line
        of the Lua state.  Save, and when done, print it out, sorted first
        by count (highest first) then by file:function:line number.  It's
        about 40 lines of Lua code, with about a quarter devoted to sorting
        the results.

[5]     Basically (simplified a bit, but not much)

                while not process.sig.caught() do
                  schedule_sleeping_coroutines()
                  local events = SOCKETS:events(0) -- select loop
                  for i = 1,#events do
                    events[i].obj(event[i]) -- handle packet from socket
                  end
                  execute_run_queue_of_coroutines()
                end

        Yes, I create coroutines like crazy.  Each coroutine handles a
        transaction.  Sure, I have to manually yield when making a blocking
        call, but it allows the actual processing code to be pretty much:

                read_packet()
                parse_packet()
                if need_more_info()
                  create_request()
                  send_request()
                  coroutine.yield()
                  read_request_packet()
                  parse_request_packet()
                  get_data()
                end
                make_reply()
                send_packet() -- reply to original packet

        instead of a mess of callback hell.

[6]     I would love to use LuaJIT, but the target platform is Sparc, and
        LuaJIT doesn't support that architecture.  Sigh.


Thanks for the feedback everyone. I'm just going to go ahead and accept the potential performance hit of an extra copy in exchange for how much simpler it makes the code.

--
Sent from my Game Boy.
Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Andrew Starks


On Saturday, April 5, 2014, Rena <[hidden email]> wrote:
On Sat, Apr 5, 2014 at 4:14 PM, Sean Conner <[hidden email]> wrote:
It was thus said that the Great Tim Hill once stated:
>
> On Apr 5, 2014, at 10:25 AM, Rena <[hidden email]> wrote:
>
> > Of course one thread can't touch another thread's Lua state, since a
> > state can only be used by one thread at a time. So we need to store the
> > received messages somewhere until the receiving thread asks for them. No
> > problem, we have a struct for each thread already (holding its
> > lua_State* and other nice things); we can copy the string into a list in
> > that struct, using mutexes to ensure it's not being read and written at
> > the same time.
> >
> > The issue with that design, though, is excess copying. If thread A sends
> > a message to thread B, that means A has a copy of the string, B's
> > incoming message queue will have another copy, and B's Lua state will
> > make a third copy when lua_pushlstring() is used to receive the message.
> > I'd quite like to eliminate that extra copy.
>
> Are you really sure the copying is a significant enough problem to justify
> the complexity of your solutions? I’ve designed several marshaling systems
> around Lua, and the “extra hop” really wasn’t significant in terms of
> performance. Depending on your message volume, you may even find that a
> fixed pool of buffers is very efficient, as the CPU caches will quickly
> “warm” and the extra copy overhead will get even smaller.

  I'll second this.  At work, I'm currently writing a server [1] in Lua.
Yes, I do have some C in there, but that's to handle the sockets, the calls
to select() [2] and other "interfacing to the operating system" type calls.
The data that comes in is all text (SIP message, which follows the Internet
Text Message Format of RFC-822 for the most part) so there's quite a bit of
parsing.  I also have to deal with encoding and decoding packets for a
proprietary protocol (a mixture of binary and text).

  I've profiled the code under what I would call a moderate load and the
results have been quite interesting.  On the C side of things, the top *two
dozen* routines were all in the Lua core [3].  Profiling the code in Lua [4]
showed the main loop [5] was taking al the time; the parsing/marshalling
code was a few orders of magnitude less than the main loop.

  I also recorded the memory usage of Lua during all this (a few nearly 20
hour runs).  When I was first running tests (usually around 10 minutes,
maybe half an hour) I would get alarmed at the memory growth, and even spent
some time playing around with the garbage collection parameters.  I
shouldn't have bothered.  The extended runs, with the default settings, were
fine and memory usage was consistent (after a period of time---the graph is
somewhat wild for the first hour or two, then it settles down).

  Don't be afraid of the copies.

  -spc (Obviously, I am unafraid of the footnotes ... )

[1]     For lack of a better term.  It's actually a network component that
        receives messages, does some processing, may query another network
        component and then reply to the message.

[2]     Actually, epoll_wait() under Linux, poll() everywhere else.  I do
        have code to support select() but I don't know of any modern Unix
        system (which is what we use) that only has select().

[3]     The number one spot?  luaV_execute().  Not terribly surprising.  And
        yes, this server is using stock Lua [6].

[4]     Every 1,000,000 Lua instructions, obtain the file, function and line
        of the Lua state.  Save, and when done, print it out, sorted first
        by count (highest first) then by file:function:line number.  It's
        about 40 lines of Lua code, with about a quarter devoted to sorting
        the results.

[5]     Basically (simplified a bit, but not much)

                while not process.sig.caught() do
                  schedule_sleeping_coroutines()
                  local events = SOCKETS:events(0) -- select loop
                  for i = 1,#events do
                    events[i].obj(event[i]) -- handle packet from socket
                  end
                  execute_run_queue_of_coroutines()
                end

        Yes,
Thanks for the feedback everyone. I'm just going to go ahead and accept the potential performance hit of an extra copy in exchange for how much simpler it makes the code.

--
Sent from my Game Boy.

Nanomsg supports zero copy messaging with ipc. It also uses iocompletion ports, epoll or whatever is native to the system. My binding exports the socket/file handle (windows) in a way that co-mingles with luasocket, if needed. In our testing, we don't see it, because our clock steps I. 240th of a second chunks and it's waaaay below that. 

It's alpha but it's been very stable. That plus llthreads2 sounds like a good first prototype to attempt. 

-Andrew
Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

William Ahern
In reply to this post by Sean Conner
On Sat, Apr 05, 2014 at 04:14:48PM -0400, Sean Conner wrote:
<snip>
> [2] Actually, epoll_wait() under Linux, poll() everywhere else.  I do
> have code to support select() but I don't know of any modern Unix
> system (which is what we use) that only has select().
<snip>
> [6] I would love to use LuaJIT, but the target platform is Sparc, and
> LuaJIT doesn't support that architecture.  Sigh.

Linux/Sparc or Solaris/Sparc?

I presume the former, but if not it's really easy to support Solaris' ports
API, which is the Solaris equivalent to epoll.

Adding/modifying:
https://github.com/wahern/cqueues/blob/master/src/cqueues.c#L532

Polling:
https://github.com/wahern/cqueues/blob/master/src/cqueues.c#L666


Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Sean Conner
It was thus said that the Great William Ahern once stated:

> On Sat, Apr 05, 2014 at 04:14:48PM -0400, Sean Conner wrote:
> <snip>
> > [2] Actually, epoll_wait() under Linux, poll() everywhere else.  I do
> > have code to support select() but I don't know of any modern Unix
> > system (which is what we use) that only has select().
> <snip>
> > [6] I would love to use LuaJIT, but the target platform is Sparc, and
> > LuaJIT doesn't support that architecture.  Sigh.
>
> Linux/Sparc or Solaris/Sparc?

  Solaris/Sparc.  Using the Sol, um, Oracle, provided compiler (not GCC).  

> I presume the former, but if not it's really easy to support Solaris' ports
> API, which is the Solaris equivalent to epoll.

  Hmm ... I'll have to look into these. Thanks.

  -spc


Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Ross Bencina
In reply to this post by Rena
Hi Rena,

> I wondered if anyone else had any feedback on this.

I like using atomic reference counting for this kind of thing.

Assuming that the string buffers are immutable, here's an idea:

Arrange for your allocator to allocate these string buffers specially.
Lua state A will still think that they are normal memory blocks, but
under the hood you will have made room for an extra reference count
that's modified using atomic increment/decrement.

State A gets a 1 count to the buffer when it allocates the block. That
will be decremented when the GC frees the memory.

When you send the string to State B, increment the count. When you've
finished with the buffer in state B, decrement the count. When the count
drops to zero (in whichever thread), free the buffer.

Make sure you use an appropriate library/api for doing the atomic inc/decs.

Ross.

Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Matthias Kluwe
In reply to this post by Rena
Hi!

2014-04-06 1:25 GMT+02:00 Rena <[hidden email]>:
> On Sat, Apr 5, 2014 at 4:14 PM, Sean Conner <[hidden email]> wrote:
>>
>> It was thus said that the Great Tim Hill once stated:
>> >
>> > On Apr 5, 2014, at 10:25 AM, Rena <[hidden email]> wrote:

>> > > Of course one thread can't touch another thread's Lua state, since a
>> > > state can only be used by one thread at a time. So we need to store
>> > > the
>> > > received messages somewhere until the receiving thread asks for them.

>> > > [...]

> Thanks for the feedback everyone. I'm just going to go ahead and accept the
> potential performance hit of an extra copy in exchange for how much simpler
> it makes the code.

Yes, your first thoughts on the problem sound quite complicated. If
you want to do message passing between threads, consider ready-to-copy
designs that just work, instead of fiddling yourself with mutexes and
the like. I've tried ZeroMQ recently for similar problems with
acceptable success
(nanomsg as its "successor" has already been mentioned).

Regards,
Matthias

Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Peng Zhicheng
In reply to this post by Ross Bencina
On 04/06/2014 08:38 PM, Ross Bencina wrote:

> Hi Rena,
>
>> I wondered if anyone else had any feedback on this.
>
> I like using atomic reference counting for this kind of thing.
>
> Assuming that the string buffers are immutable, here's an idea:
>
> Arrange for your allocator to allocate these string buffers specially. Lua state A will still think that they are normal memory blocks, but under the hood you will have made room for an extra reference count that's modified using atomic increment/decrement.
>
> State A gets a 1 count to the buffer when it allocates the block. That will be decremented when the GC frees the memory.
>
> When you send the string to State B, increment the count. When you've finished with the buffer in state B, decrement the count. When the count drops to zero (in whichever thread), free the buffer.
>
> Make sure you use an appropriate library/api for doing the atomic inc/decs.
>
> Ross.
>
I like this solution. Although the allocation is complicated (not that much I suppose),
you keep the Lua side natual and clean.

I actually had thought of using ref count as well, but with something wrapped in userdata;
not a good solution of course.

Reply | Threaded
Open this post in threaded view
|

RE: Algorithm help: passing strings between threads with minimal copying

Thijs Schreijer


> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On
> Behalf Of Peng Zhicheng
> Sent: dinsdag 8 april 2014 2:34
> To: [hidden email]
> Subject: Re: Algorithm help: passing strings between threads with minimal
> copying
>
> On 04/06/2014 08:38 PM, Ross Bencina wrote:
> > Hi Rena,
> >
> >> I wondered if anyone else had any feedback on this.
> >
> > I like using atomic reference counting for this kind of thing.
> >
> > Assuming that the string buffers are immutable, here's an idea:
> >
> > Arrange for your allocator to allocate these string buffers specially. Lua
> state A will still think that they are normal memory blocks, but under the
> hood you will have made room for an extra reference count that's modified
> using atomic increment/decrement.

How would you distinguish a string allocation from some other allocation?

> >
> > State A gets a 1 count to the buffer when it allocates the block. That
> will be decremented when the GC frees the memory.
> >
> > When you send the string to State B, increment the count. When you've
> finished with the buffer in state B, decrement the count. When the count
> drops to zero (in whichever thread), free the buffer.
> >
> > Make sure you use an appropriate library/api for doing the atomic
> inc/decs.
> >
> > Ross.
> >
> I like this solution. Although the allocation is complicated (not that much
> I suppose),
> you keep the Lua side natual and clean.
>
> I actually had thought of using ref count as well, but with something
> wrapped in userdata;
> not a good solution of course.

Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Rena
On Tue, Apr 8, 2014 at 3:22 AM, Thijs Schreijer <[hidden email]> wrote:


> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On
> Behalf Of Peng Zhicheng
> Sent: dinsdag 8 april 2014 2:34
> To: [hidden email]
> Subject: Re: Algorithm help: passing strings between threads with minimal
> copying
>
> On 04/06/2014 08:38 PM, Ross Bencina wrote:
> > Hi Rena,
> >
> >> I wondered if anyone else had any feedback on this.
> >
> > I like using atomic reference counting for this kind of thing.
> >
> > Assuming that the string buffers are immutable, here's an idea:
> >
> > Arrange for your allocator to allocate these string buffers specially. Lua
> state A will still think that they are normal memory blocks, but under the
> hood you will have made room for an extra reference count that's modified
> using atomic increment/decrement.

How would you distinguish a string allocation from some other allocation?

> >
> > State A gets a 1 count to the buffer when it allocates the block. That
> will be decremented when the GC frees the memory.
> >
> > When you send the string to State B, increment the count. When you've
> finished with the buffer in state B, decrement the count. When the count
> drops to zero (in whichever thread), free the buffer.
> >
> > Make sure you use an appropriate library/api for doing the atomic
> inc/decs.
> >
> > Ross.
> >
> I like this solution. Although the allocation is complicated (not that much
> I suppose),
> you keep the Lua side natual and clean.
>
> I actually had thought of using ref count as well, but with something
> wrapped in userdata;
> not a good solution of course.


The `osize` field of lua_Alloc() when `ptr=NULL` tells you the type of object being allocated. But I'm not sure how to detect that it's *that particular string* Lua is allocating memory for, other than setting a flag just before the lua_pushlstring() and having the allocator check for that flag and a string of that length. But that seems quite fragile.

--
Sent from my Game Boy.
Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Ross Bencina
On 9/04/2014 5:48 PM, Rena wrote:
> The `osize` field of lua_Alloc() when `ptr=NULL` tells you the type of
> object being allocated. But I'm not sure how to detect that it's *that
> particular string* Lua is allocating memory for, other than setting a
> flag just before the lua_pushlstring() and having the allocator check
> for that flag and a string of that length. But that seems quite fragile.

That was the kind of thing I had in mind.

Why do you think that it's fragile?

I was thinking you would define a C function to allocate these
particular "shared strings". That function sets the flag, calls
lua_pushlstring(), and then unsets the flag.

Alternatively, simply allocate all strings with the refcount (or all
strings above a certain size).

Ross.

Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Rena
On Wed, Apr 9, 2014 at 4:05 AM, Ross Bencina <[hidden email]> wrote:
On 9/04/2014 5:48 PM, Rena wrote:
The `osize` field of lua_Alloc() when `ptr=NULL` tells you the type of
object being allocated. But I'm not sure how to detect that it's *that
particular string* Lua is allocating memory for, other than setting a
flag just before the lua_pushlstring() and having the allocator check
for that flag and a string of that length. But that seems quite fragile.

That was the kind of thing I had in mind.

Why do you think that it's fragile?

I was thinking you would define a C function to allocate these particular "shared strings". That function sets the flag, calls lua_pushlstring(), and then unsets the flag.

Alternatively, simply allocate all strings with the refcount (or all strings above a certain size).

Ross.


The code path I'm imagining is:

-receiveMessage()
-find the next message in the queue
-set a flag saying "we're receiving a message of length n"
-lua_pushlstring(L, msg, n);
-the allocator sees the flag is set, sees Lua asking for a string of length n, and gives it the message's memory block (with refcount set appropriately)
-receiveMessage() returns

But I wonder if it's possible that for whatever internal reasons, Lua happens to allocate an unrelated string as well during this time, also of length n?

If not then this would definitely be an interesting solution.

--
Sent from my Game Boy.
Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Rena
On Wed, Apr 9, 2014 at 4:10 AM, Rena <[hidden email]> wrote:
On Wed, Apr 9, 2014 at 4:05 AM, Ross Bencina <[hidden email]> wrote:
On 9/04/2014 5:48 PM, Rena wrote:
The `osize` field of lua_Alloc() when `ptr=NULL` tells you the type of
object being allocated. But I'm not sure how to detect that it's *that
particular string* Lua is allocating memory for, other than setting a
flag just before the lua_pushlstring() and having the allocator check
for that flag and a string of that length. But that seems quite fragile.

That was the kind of thing I had in mind.

Why do you think that it's fragile?

I was thinking you would define a C function to allocate these particular "shared strings". That function sets the flag, calls lua_pushlstring(), and then unsets the flag.

Alternatively, simply allocate all strings with the refcount (or all strings above a certain size).

Ross.


The code path I'm imagining is:

-receiveMessage()
-find the next message in the queue
-set a flag saying "we're receiving a message of length n"
-lua_pushlstring(L, msg, n);
-the allocator sees the flag is set, sees Lua asking for a string of length n, and gives it the message's memory block (with refcount set appropriately)
-receiveMessage() returns

But I wonder if it's possible that for whatever internal reasons, Lua happens to allocate an unrelated string as well during this time, also of length n?

If not then this would definitely be an interesting solution.

--
Sent from my Game Boy.


On further consideration I don't think this can work anyway, as it requires that I control the allocators of all Lua states from the time of creation, including the one that first loads my library, which is a chicken-and-egg problem. Otherwise, when that first state tries to send a message, it could be using a memory block which it allocated with the standard allocator instead of mine, which means I won't have a refcount for that block. I can of course fix this by keeping a list of valid blocks, but that probably nullifies any optimizations over just using memcpy.

--
Sent from my Game Boy.
Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Tim Hill
In reply to this post by Rena

On Apr 9, 2014, at 1:10 AM, Rena <[hidden email]> wrote:



The code path I'm imagining is:

-receiveMessage()
-find the next message in the queue
-set a flag saying "we're receiving a message of length n"
-lua_pushlstring(L, msg, n);
-the allocator sees the flag is set, sees Lua asking for a string of length n, and gives it the message's memory block (with refcount set appropriately)
-receiveMessage() returns

But I wonder if it's possible that for whatever internal reasons, Lua happens to allocate an unrelated string as well during this time, also of length n?

If not then this would definitely be an interesting solution.

--
Sent from my Game Boy.

For the very reason you stated I think this is a fragile solution. You have no guarantee that Lua will directly call your allocator for the string push .. all sorts of other things could happen, and indeed change from release to release of Lua. It could intern the string. it could re-use an internal buffer cache. It could allocate some OTHER string internally before allocating space for your string.

—Tim

Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Rena
On Fri, Apr 11, 2014 at 7:47 PM, Tim Hill <[hidden email]> wrote:

On Apr 9, 2014, at 1:10 AM, Rena <[hidden email]> wrote:



The code path I'm imagining is:

-receiveMessage()
-find the next message in the queue
-set a flag saying "we're receiving a message of length n"
-lua_pushlstring(L, msg, n);
-the allocator sees the flag is set, sees Lua asking for a string of length n, and gives it the message's memory block (with refcount set appropriately)
-receiveMessage() returns

But I wonder if it's possible that for whatever internal reasons, Lua happens to allocate an unrelated string as well during this time, also of length n?

If not then this would definitely be an interesting solution.

--
Sent from my Game Boy.

For the very reason you stated I think this is a fragile solution. You have no guarantee that Lua will directly call your allocator for the string push .. all sorts of other things could happen, and indeed change from release to release of Lua. It could intern the string. it could re-use an internal buffer cache. It could allocate some OTHER string internally before allocating space for your string.

—Tim


I don't think interning strings or reusing existing ones would cause an issue. But unfortunately I think the other issues make this solution unworkable. Shame, I rather liked it.

--
Sent from my Game Boy.
Reply | Threaded
Open this post in threaded view
|

RE: Algorithm help: passing strings between threads with minimal copying

Thijs Schreijer
In reply to this post by Tim Hill


> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On
> Behalf Of Tim Hill
> Sent: zaterdag 12 april 2014 1:47
> To: Lua mailing list
> Subject: Re: Algorithm help: passing strings between threads with minimal
> copying
>
>
> On Apr 9, 2014, at 1:10 AM, Rena <[hidden email]> wrote:
>
>
>
>
> The code path I'm imagining is:
> -receiveMessage()
> -find the next message in the queue
> -set a flag saying "we're receiving a message of length n"
> -lua_pushlstring(L, msg, n);
> -the allocator sees the flag is set, sees Lua asking for a string of length
> n, and gives it the message's memory block (with refcount set appropriately)
> -receiveMessage() returns
> But I wonder if it's possible that for whatever internal reasons, Lua
> happens to allocate an unrelated string as well during this time, also of
> length n?
>
> If not then this would definitely be an interesting solution.
>
>
> --
> Sent from my Game Boy.
>
> For the very reason you stated I think this is a fragile solution. You have
> no guarantee that Lua will directly call your allocator for the string push
> .. all sorts of other things could happen, and indeed change from release to
> release of Lua. It could intern the string. it could re-use an internal
> buffer cache. It could allocate some OTHER string internally before
> allocating space for your string.
>
> -Tim

Slightly modified;
-receiveMessage()
-find the next message in the queue
-set a flag saying "we're receiving a message of length n"
-lua_pushlstring(L, msg, n);
-the allocator sees the flag is set, and allocates memory blocks (with refcount set appropriately), also keeps a list of elements allocated while flag set
-unset the flag
-check the list of allocated items (if more than 1), for the one containing the string. Remove all others from the refcount list.
-receiveMessage() returns

(still doesn't cover reusing internal buffer)

Question;
When Lua returns a string, it returns a 'const', is that already a copy of the original Lua string, or is it the actual Lua string? If so, could that returned pointer then be used to link it to the refcount?

Thijs



Reply | Threaded
Open this post in threaded view
|

Re: Algorithm help: passing strings between threads with minimal copying

Tim Hill

On Apr 12, 2014, at 10:32 AM, Thijs Schreijer <[hidden email]> wrote:

>
> Question;
> When Lua returns a string, it returns a 'const', is that already a copy of the original Lua string, or is it the actual Lua string? If so, could that returned pointer then be used to link it to the refcount?
>
> Thijs
>
>
>

lua_tolstring() and lua_tostring() both return a direct pointer to the string within the Lua state; you should treat it as read-only (hence the const) and valid only while the stack index referenced is unchanged. So yes, you could find the refcount from the pointer, but I still think this is a risky approach, as it depends on assumptions about undocumented behaviors of the Lua VM.

—Tim


12