The Lua 5.1(work) GC

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

The Lua 5.1(work) GC

Jasmin Patry
I've been trying Lua 5.1(work) with our game, and I love the new
incremental GC.  It doesn't appear to be aggressive enough for our use
in stock form, however.  For our application (a console game), we need
to keep the memory high-water mark as low as possible, without
sacrificing too much performance.

After some experimenting, I see a couple possible solutions.  One is to
increase the per-step limit in luaC_step:

void luaC_step (lua_State *L) {
  global_State *g = G(L);
  l_mem lim = (g->nblocks - (g->GCthreshold - GCSTEPSIZE)) * 2;
  do {
    lim = singlestep(L, lim);
    if (g->gcstate == GCSfinalize && g->tmudata == NULL)
      break;  /* do not start new collection */
  } while (lim > 0);
  g->GCthreshold = g->nblocks + GCSTEPSIZE - lim/2;
  lua_assert((long)g->nblocks + (long)GCSTEPSIZE >= lim/2);
}

>From my analysis, the "* 2" factor seems to be the controlling element.
Increasing GCSTEPSIZE also increases the limit, but decreases the
frequency at which the GC is run.  AFAICT, the factor of 2 has been
chosen more or less arbitrarily; increasing it to 8 (while also changing
the "/2" factors to "/8" in the last two lines) has a very noticeable
impact on the high-water mark in our game.  My question to the Lua
authors is: is the factor of 2 "special" in any way?

[My speculative reasoning why the value might be 2: if you assume that
singlestep() collects lim bytes (it doesn't, but let's assume it does),
then each run through luaC_step() decreases the GCthreshold by
GCSTEPSIZE bytes (and decreases nblocks by 2*GCSTEPSIZE bytes).  If
instead of 2, we have a factor alpha, then each step would decrease the
threshold by (alpha-1)*GCSTEPSIZE bytes (and decrease nblocks by
alpha*GCSTEPSIZE bytes).]

So that's one way to increase the aggressiveness of the GC.  The other
way would be to run the GC a fixed number of steps each frame.  This is
attractive because the cost is predictable, and based on some quick
benchmarking, it looks like we can run the GC at an average consumption
rate that well exceeds our average generation rate without incurring a
significant performance hit (this is great news!).

Unfortunately the current API doesn't support this, but I made a stab at
adding this functionality to lua_gc():

    case LUA_GCSTEP: {
      int i;
      lua_lock(L);
      for( i=0; i<data; i++ ) {
        if( g->nblocks < g->GCthreshold )
          g->GCthreshold = g->nblocks;
        luaC_step(L);
      }
      lua_unlock(L);
      return 0;
    }

I'd appreciate any other suggestions at how we might tweak the GC to
suit our purposes.

Thanks,
Jasmin



Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

Roberto Ierusalimschy
> My question to the Lua authors is: is the factor of 2 "special" in any
> way?

Not at all. Quite the opposite: we also have little idea of what this
factor (and the other constants as well) should be. We need real users
with real uses (like you!!) to give us feedback about these tunnings.
(One concern is that, if the collector is too much agressive, it may
slow down regular applications. Maybe this factor should be variable??)


> Unfortunately the current API doesn't support this, but I made a stab
> at adding this functionality to lua_gc():

This is an interesting addition. My only question is: why the loop? Why
not a single call to luaC_step? Each program can make its own loop (with
its own criteria). After all, "data" has no clear meaning anyway.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

RE: The Lua 5.1(work) GC

Jasmin Patry
In reply to this post by Jasmin Patry
 
> Not at all. Quite the opposite: we also have little idea of 
> what this factor (and the other constants as well) should be. 
> We need real users with real uses (like you!!) to give us 
> feedback about these tunnings.
> (One concern is that, if the collector is too much agressive, 
> it may slow down regular applications. Maybe this factor 
> should be variable??)

I think making it a variable would make sense.  Some applications can
easily live with a less aggressive GC (e.g., relatively small footprints
apps running on PCs), while others would need a more aggressive version
(e.g., apps for embedded devices, console games).  Some apps might even
want to control the aggressiveness of the GC at runtime...?


> This is an interesting addition. My only question is: why the 
> loop? Why not a single call to luaC_step? Each program can 
> make its own loop (with its own criteria). After all, "data" 
> has no clear meaning anyway.

The only reason the loop is there is that it was convenient for my
purpose.  Also,

   lua_gc( L, LUA_GCSTEP, 8 ) /* run the GC for 8 steps */

seemed fairly intuitive.  But it's by no means essential, since as you
said the loop can be done in the user app.

Thanks,
Jasmin


Reply | Threaded
Open this post in threaded view
|

RE: The Lua 5.1(work) GC

Bilyk, Alex
In reply to this post by Jasmin Patry
I am very much in favor of as much control over GC as we can get with a reasonably good default behavior for those who don't need to fine-tune it. Having been using Lua 5.0 I have reached a point where all is needed for making GC real-time friendly is its ability to collect a given number of Kilobytes on demand in const time. If the GC is capable of doing this, one could implement any app-specific GC scheme desired. With Lua 5.0 the collection time is proportional to the number of objects in the system, it seems, that is collecting a given amount of memory (M) takes time in proportion to the number of Lua objects in existance.

Would incremental GC in Lua 5.1 allow to do this? Will we have/be able to impelent by standard means a function to the tune of 
	collect_given_amount_of_ram(mem_kb)	
whose execution time would only depend on 'mem_kb' and be virtually independent of the number of Lua objects? The ability to turn the GC off would be another desirable [debug] feature.

Thanks,
Alex 

>-----Original Message-----
>From: Jasmin Patry [[hidden email]]
>Sent: Monday, May 24, 2004 12:52 PM
>To: Lua list
>Subject: RE: The Lua 5.1(work) GC 
>
>
> 
>> Not at all. Quite the opposite: we also have little idea of 
>> what this factor (and the other constants as well) should be. 
>> We need real users with real uses (like you!!) to give us 
>> feedback about these tunnings.
>> (One concern is that, if the collector is too much agressive, 
>> it may slow down regular applications. Maybe this factor 
>> should be variable??)
>
>I think making it a variable would make sense.  Some applications can
>easily live with a less aggressive GC (e.g., relatively small 
>footprints
>apps running on PCs), while others would need a more aggressive version
>(e.g., apps for embedded devices, console games).  Some apps might even
>want to control the aggressiveness of the GC at runtime...?
>
>
>> This is an interesting addition. My only question is: why the 
>> loop? Why not a single call to luaC_step? Each program can 
>> make its own loop (with its own criteria). After all, "data" 
>> has no clear meaning anyway.
>
>The only reason the loop is there is that it was convenient for my
>purpose.  Also,
>
>   lua_gc( L, LUA_GCSTEP, 8 ) /* run the GC for 8 steps */
>
>seemed fairly intuitive.  But it's by no means essential, since as you
>said the loop can be done in the user app.
>
>Thanks,
>Jasmin
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

John Belmonte
In reply to this post by Roberto Ierusalimschy
Roberto wrote:
Not at all. Quite the opposite: we also have little idea of what this
factor (and the other constants as well) should be. We need real users
with real uses (like you!!) to give us feedback about these tunnings.
(One concern is that, if the collector is too much agressive, it may
slow down regular applications. Maybe this factor should be variable??)

I posted about my dream garbage collector for Lua over three years ago (http://lua-users.org/lists/lua-l/2001-02/msg00178.html). Implementing an incremental GC only covers half. The other half is having a good way to control it. Inspired by a paper referenced in that post, this is what I wanted: "just specify the amount of wasted memory to be tolerated as a percentage of in-use memory, with higher tolerance translating to lower GC overhead".

-John


--
http:// ift ile.org/

Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

Trevor Powell
> this is
> what I wanted: "just specify the amount of wasted memory to be tolerated
> as a percentage of in-use memory, with higher tolerance translating to
> lower GC overhead".

Whereas we, working on a very limited hardware platform, need to be able
to limit Lua's memory usage absolutely;  we pre-allocate a small block of
memory (currently about 60k), which we use exclusively for Lua.  We do
this so that Lua's frequent requests for very small blocks of memory don't
cause memory fragmentation out in the heap.

When you only have 32 megabytes of RAM available, and load files that are
often as much as 18 megabytes in size, you get paranoid about heap
fragmentation very quickly.  :)


We currently do this by explicitly resetting Lua's garbage collection
threshhold immediately after each call into Lua;  that's the only way I
could find to keep Lua's garbage collection threshhold constant, since Lua
multiplies its gc threshhold by two every time it performs garbage
collection.

Trevor Powell

Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

John Belmonte
Trevor Powell wrote:
this is
what I wanted: "just specify the amount of wasted memory to be tolerated
as a percentage of in-use memory, with higher tolerance translating to
lower GC overhead".

Whereas we, working on a very limited hardware platform, need to be able
to limit Lua's memory usage absolutely;  we pre-allocate a small block of
memory (currently about 60k), which we use exclusively for Lua.  We do
this so that Lua's frequent requests for very small blocks of memory don't
cause memory fragmentation out in the heap.

When you only have 32 megabytes of RAM available, and load files that are
often as much as 18 megabytes in size, you get paranoid about heap
fragmentation very quickly.  :)

We currently do this by explicitly resetting Lua's garbage collection
threshhold immediately after each call into Lua;  that's the only way I
could find to keep Lua's garbage collection threshhold constant, since Lua
multiplies its gc threshhold by two every time it performs garbage
collection.

I've already completed a project that uses Lua on a PS2, and in fact that's what I was working on when I wrote about the dream garbage collector.

Incremental GC (the subject of this thread) is there to reduce worst case memory and CPU usage. It's mostly orthogonal to the issue of memory fragmentation.

-John


--
http:// ift ile.org/

Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

Roberto Ierusalimschy
In reply to this post by John Belmonte
> Inspired by a paper referenced in that post, this is what I wanted:
> "just specify the amount of wasted memory to be tolerated as a
> percentage of in-use memory, with higher tolerance translating to lower
> GC overhead".

The problem here is how to compute the speed of an incremental
collector given those parameters. Suggestions are wellcome.


> Incremental GC (the subject of this thread) is there to reduce worst 
> case memory and CPU usage.

I'm not sure about that. It seems much easier to ensure worst-case
memory with a mark-and-sweep collector. CPU usage (average) also seems
to increase with an incremental GC, due to higher administrative
overhead.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

Roberto Ierusalimschy
In reply to this post by Bilyk, Alex
> Will we have/be able to impelent by standard means a function to the
> tune of
>         collect_given_amount_of_ram(mem_kb)     
> whose execution time would only depend on 'mem_kb' and be virtually
> independent of the number of Lua objects?

More or less. First, the meaning of `collect_given_amount_of_ram' is
far from obvious. When the collector is in the mark phase, that can
mean "traverse that amount of memory". But when it is in the sweep
phase, what would be an adequate meaning? (To sweep some amount of
ram is much faster than to traverse it, and is almost independent
of the amount of ram.)

Second, the execution time cannot depend *only* on mem_kb. There are
some tasks that must be atomic. Mainly, to finish a collection we must
traverse all stacks and all weak tables atomically. We assume that
programs will not have a very large number of stacks (threads) and weak
tables, but if it has then that final step may take some time. But it is
independent of the number of "regular Lua objects" (tables, closures,
strings).

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

Mark Hamburg-4
In reply to this post by Roberto Ierusalimschy
The benefit of incremental GC is eliminating (or significantly reducing)
pause times. It comes at the price of reduced overall efficiency because of
increased book keeping and possibly increased memory thrashing.

Reference counting by comparison is very good at avoiding pauses and very
good at keeping memory and resource usage down, but comes at the cost of
lots of book keeping and an inability to collect cycles. (If you worry about
the pause times for freeing large structures, you can do that
incrementally.)

I'm just getting ready to look at 5.1 (after I clean up some uses of .n).
One thing I'm curious about is whether it does anything to reduce the need
to lock the Lua state when manipulating the stack. Reducing the number of
cases that need to acquire the global lock would be useful when running Lua
in a multi-threaded environment with shared Lua data structures.

Mark

on 5/25/04 10:34 AM, Roberto Ierusalimschy at [hidden email] wrote:

>> Inspired by a paper referenced in that post, this is what I wanted:
>> "just specify the amount of wasted memory to be tolerated as a
>> percentage of in-use memory, with higher tolerance translating to lower
>> GC overhead".
> 
> The problem here is how to compute the speed of an incremental
> collector given those parameters. Suggestions are wellcome.
> 
> 
>> Incremental GC (the subject of this thread) is there to reduce worst
>> case memory and CPU usage.
> 
> I'm not sure about that. It seems much easier to ensure worst-case
> memory with a mark-and-sweep collector. CPU usage (average) also seems
> to increase with an incremental GC, due to higher administrative
> overhead.
> 
> -- Roberto


Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

John Belmonte
In reply to this post by Roberto Ierusalimschy
Roberto wrote:
Incremental GC (the subject of this thread) is there to reduce worst case memory and CPU usage.

I'm not sure about that. It seems much easier to ensure worst-case
memory with a mark-and-sweep collector. CPU usage (average) also seems
to increase with an incremental GC, due to higher administrative
overhead.

I think you are right about the memory usage, but the incremental GC benefits to worst case CPU usage are clear. I'm talking about peak GC CPU usage per unit of time of course (such as the duration of a graphics frame), not total GC CPU usage over the lifetime of the application.

-John


--
http:// ift ile.org/

Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

Roberto Ierusalimschy
> I'm talking about peak GC CPU usage [...]

Sorry. I parsed your statement incorrectly. Of course the whole point of
an incremental colector is to improve worst case CPU usage.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

Roberto Ierusalimschy
In reply to this post by Mark Hamburg-4
> I'm just getting ready to look at 5.1 (after I clean up some uses
> of .n).  One thing I'm curious about is whether it does anything to
> reduce the need to lock the Lua state when manipulating the stack.

No. We do not think that "real multithreading" in Lua (that is, several
preemptive threads sharing a single Lua state) is really a good idea and
so it is not high in our list of priorities. There are several options,
such as coroutines (i.e., non-preemptive multithreading) or threads with
independent Lua states ("Lua processes"). (One thing we are trying to
improve is the ability to yield inside metamethods and for iterators.)

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

Mark Hamburg-4
I can certainly sympathize with the view point that pre-emptive
multi-threading is more trouble than it's worth. On the other hand, on a
multi-processor machine, it would seem almost mandatory. If Lua is the glue
holding a system together, then presumably it also becomes the glue
coordinating the work being done by the processors. To do this would seem to
require a way to pass data structures across the processors -- i.e., across
threads -- and a shared Lua universe seems the simplest way to do so. (I say
Lua universe to distinguish from lua_State which actually is synonymous with
a Lua thread.) Do you have recommendations for some other scheme? How should
messaging between Lua processes work?

Mark

on 5/25/04 12:31 PM, Roberto Ierusalimschy at [hidden email] wrote:

>> I'm just getting ready to look at 5.1 (after I clean up some uses
>> of .n).  One thing I'm curious about is whether it does anything to
>> reduce the need to lock the Lua state when manipulating the stack.
> 
> No. We do not think that "real multithreading" in Lua (that is, several
> preemptive threads sharing a single Lua state) is really a good idea and
> so it is not high in our list of priorities. There are several options,
> such as coroutines (i.e., non-preemptive multithreading) or threads with
> independent Lua states ("Lua processes"). (One thing we are trying to
> improve is the ability to yield inside metamethods and for iterators.)
> 
> -- Roberto


Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

Daniel Quintela-2
If you like the message-passing paradigm, you can try :
http://www.soongsoft.com/LuaTask.zip

Daniel

----- Original Message -----
From: "Mark Hamburg" <[hidden email]>
To: "Lua list" <[hidden email]>
Sent: Tuesday, May 25, 2004 5:04 PM
Subject: Re: The Lua 5.1(work) GC


> I can certainly sympathize with the view point that pre-emptive
> multi-threading is more trouble than it's worth. On the other hand, on a
> multi-processor machine, it would seem almost mandatory. If Lua is the
glue
> holding a system together, then presumably it also becomes the glue
> coordinating the work being done by the processors. To do this would seem
to
> require a way to pass data structures across the processors -- i.e.,
across
> threads -- and a shared Lua universe seems the simplest way to do so. (I
say
> Lua universe to distinguish from lua_State which actually is synonymous
with
> a Lua thread.) Do you have recommendations for some other scheme? How
should
> messaging between Lua processes work?
>
> Mark
>
> on 5/25/04 12:31 PM, Roberto Ierusalimschy at [hidden email]
wrote:
>
> >> I'm just getting ready to look at 5.1 (after I clean up some uses
> >> of .n).  One thing I'm curious about is whether it does anything to
> >> reduce the need to lock the Lua state when manipulating the stack.
> >
> > No. We do not think that "real multithreading" in Lua (that is, several
> > preemptive threads sharing a single Lua state) is really a good idea and
> > so it is not high in our list of priorities. There are several options,
> > such as coroutines (i.e., non-preemptive multithreading) or threads with
> > independent Lua states ("Lua processes"). (One thing we are trying to
> > improve is the ability to yield inside metamethods and for iterators.)
> >
> > -- Roberto
>


Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

Mark Hamburg-4
Thanks. That's interesting as a model but based on the README it's pretty
narrow on how data gets passed amongst tasks. One might be able to work
around that with table pickling I guess.

Mark

on 5/25/04 1:23 PM, Daniel Quintela at [hidden email] wrote:

> If you like the message-passing paradigm, you can try :
> http://www.soongsoft.com/LuaTask.zip
> 
> Daniel
> 
> ----- Original Message -----
> From: "Mark Hamburg" <[hidden email]>
> To: "Lua list" <[hidden email]>
> Sent: Tuesday, May 25, 2004 5:04 PM
> Subject: Re: The Lua 5.1(work) GC
> 
> 
>> I can certainly sympathize with the view point that pre-emptive
>> multi-threading is more trouble than it's worth. On the other hand, on a
>> multi-processor machine, it would seem almost mandatory. If Lua is the
> glue
>> holding a system together, then presumably it also becomes the glue
>> coordinating the work being done by the processors. To do this would seem
> to
>> require a way to pass data structures across the processors -- i.e.,
> across
>> threads -- and a shared Lua universe seems the simplest way to do so. (I
> say
>> Lua universe to distinguish from lua_State which actually is synonymous
> with
>> a Lua thread.) Do you have recommendations for some other scheme? How
> should
>> messaging between Lua processes work?
>> 
>> Mark
>> 
>> on 5/25/04 12:31 PM, Roberto Ierusalimschy at [hidden email]
> wrote:
>> 
>>>> I'm just getting ready to look at 5.1 (after I clean up some uses
>>>> of .n).  One thing I'm curious about is whether it does anything to
>>>> reduce the need to lock the Lua state when manipulating the stack.
>>> 
>>> No. We do not think that "real multithreading" in Lua (that is, several
>>> preemptive threads sharing a single Lua state) is really a good idea and
>>> so it is not high in our list of priorities. There are several options,
>>> such as coroutines (i.e., non-preemptive multithreading) or threads with
>>> independent Lua states ("Lua processes"). (One thing we are trying to
>>> improve is the ability to yield inside metamethods and for iterators.)
>>> 
>>> -- Roberto
>> 
> 


Reply | Threaded
Open this post in threaded view
|

Re: The Lua 5.1(work) GC

Jamie Webb-3
In reply to this post by Mark Hamburg-4
On Tuesday 25 May 2004 21:04, Mark Hamburg wrote:
> I can certainly sympathize with the view point that pre-emptive
> multi-threading is more trouble than it's worth. On the other hand, on a
> multi-processor machine, it would seem almost mandatory. If Lua is the glue
> holding a system together, then presumably it also becomes the glue
> coordinating the work being done by the processors. To do this would seem
> to require a way to pass data structures across the processors -- i.e.,
> across threads -- and a shared Lua universe seems the simplest way to do
> so. (I say Lua universe to distinguish from lua_State which actually is
> synonymous with a Lua thread.) Do you have recommendations for some other
> scheme? How should messaging between Lua processes work?

I think I'd like to see something like this:

- A fork operation which creates a new, identical, independent state, with 
copy-on-write for tables. That is, the table would be 'owned' by both states 
as long as it wasn't written to, but would be duplicated by whichever state 
wrote to it first. Admittedly, that would mean an extra level of indirection 
for object references to deal with their addresses changing, but I'm sure 
that could be made reasonably efficient (and it makes a copying collector 
possible too, for people worried about memory fragmentation). Strings could 
be shared between states since they are constant, but each state would need 
its own strings table. GCed values would presumably need to store a reference 
count indicating how many states they are alive in.

- A message queue for each state, which can be written to by the others. The 
data passed could be any Lua value, with the same copy-on-write method used 
for tables (so message-passing of constant values would be very cheap). A 
'shared blackboard' would be nicer, but it doesn't seem possible without 
using critical sections or somesuch, and that certainly doesn't feel very 
Lua-ish. A possible implementation of that sort of thing on top of message 
passing might be to have a separate noticeboard thread to which objects could 
be posted and recovered.

- A new __clone metamethod for userdata, which would allow them to duplicate 
themselves in an orderly fashion. The default action would be for userdatas 
to be replaced with nils in the child state. This metamethod would probably 
have to be called on access (i.e. as soon as the userdata finds its way onto 
the stack) rather than on write in order to keep things consistent.

That should mean that locks are required only for operations which change 
object reference counts, and for the message queue.

-- Jamie Webb