Garbage collector issues

Garbage collector issues

Nicolas Devillard
Hello Lua-addicts:

I have defined a userdata type (an image) that holds large quantities of
memory. Obviously, this data type should be garbage collected as soon as
possible to release resources.
The structure looks like:

struct image {
  int lx;
  int ly;
  ...
  pixelvalue * buffer
}

The 'buffer' part can be potentially huge, but the only part seen by lua
is the 'image' structure, which is fairly small (in my case 32 bytes) and
thus not considered for garbage collection until it is too late. Do the
math: even with the GC threshold set to 1 Kbyte, I can still allocate 32
images (32 x 32 = 1024 bytes) before the GC starts acting. If each image
held 1 GB worth of pixels, you would run out of memory almost
immediately.
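The arithmetic above can be written down as a small C sketch: how many wrappers fit under a GC threshold, and how much memory they can pin outside the collector's view. The 32-byte wrapper, 1 Kbyte threshold and 1 GB buffer are the figures from the post; the function names are hypothetical.

```c
#include <stddef.h>

/* Number of wrappers the collector sees before the threshold is reached.
   Only the wrapper size counts; the pixel buffers are invisible to it. */
size_t wrappers_before_gc(size_t threshold_bytes, size_t wrapper_bytes)
{
    return threshold_bytes / wrapper_bytes;
}

/* Memory pinned by those wrappers without ever triggering a collection. */
unsigned long long hidden_bytes(size_t threshold_bytes, size_t wrapper_bytes,
                                unsigned long long buffer_bytes)
{
    return (unsigned long long)wrappers_before_gc(threshold_bytes, wrapper_bytes)
           * buffer_bytes;
}
```

With the post's numbers, `wrappers_before_gc(1024, 32)` gives 32, and `hidden_bytes(1024, 32, 1ULL << 30)` gives 32 GB that the GC never hears about.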

One solution would be making the gc aware of the "true" size allocated
within userdata types. I can write an equivalent of sizeof() that would
return the true size allocated below any given struct, but how do I make
this information available to the GC?

Another solution is of course to deallocate explicitly in my Lua scripts
(this is what I am currently doing), but that rather defeats the purpose
of having a garbage collector, doesn't it?

Are there other, simpler solutions I am not aware of?

Thanks for helping,
-- 
Nicolas



Re: Garbage collector issues

John Belmonte-2
If it is a graphics app why not just call collectgarbage() once per frame to
force collection?

Nicolas Devillard wrote:
> I have defined a userdata type (an image) that holds large quantities of
> memory. Obviously, this data type should be garbage collected as soon as
> possible to release resources.




Re: Garbage collector issues

Steve Dekorte-4
In reply to this post by Nicolas Devillard
John Belmonte wrote:
> If it is a graphics app why not just call collectgarbage() once per frame to 
> force collection? 

That works, but doing a gc run is an expensive operation. 
  
> Nicolas Devillard wrote: 
> > I have defined a userdata type (an image) that holds large quantities of
> > memory. Obviously, this data type should be garbage collected as soon as 
> > possible to release resources. 

Lua's GC has a memory limit (the GC threshold) that you can set with the
collectgarbage() function. When memory use reaches it, Lua runs a collection.
You might try setting it to the amount of memory you find acceptable for
your program to use.

Steve


Re: Garbage collector issues

Steve Dekorte-4
Steve Dekorte wrote:
> The lua gc has a memory limit that you can set with the collectgarbage() function.  
> When it reaches it, it does a gc run. You might try setting this to be at the amount  
> of memory that you find acceptable for your program to take.  

Oops, I forgot to read the beginning of this thread. You already know this;
what you need is a way to tell Lua how much memory your userdata is using so
that this works properly.

Steve


Re: Garbage collector issues

Luiz Henrique de Figueiredo
In reply to this post by Nicolas Devillard
>need a way to tell Lua  how much memory it's using on your userdata

Lua 4.0 contains an undocumented function that does this and more:

	LUA_API void *lua_newuserdata (lua_State *L, size_t size);

lua_newuserdata allocates a buffer of the given size and returns it to the
user untouched, much like malloc, except that it uses Lua's internal malloc
and the udata is added to Lua's pool and subject to GC; moreover, "size" is
taken into account when deciding when to trigger GC.

lua_newuserdata is in lua.h, so I think it will remain there...

Note that the udata created by lua_newuserdata has no references inside Lua
at birth, and so you have to set it to a global variable or a table field,
or something; otherwise it will be collected next time.
--lhf


Re: Garbage collector issues

Nicolas Devillard
> Lua 4.0 contains an undocumented function that does this and more:
>
> 	LUA_API void *lua_newuserdata (lua_State *L, size_t size);
>
> lua_newuserdata allocates a buffer of the given size and returns it to
> the user untouched, much like malloc, except that it uses Lua's
> internal malloc, and that the udata is added to the pool and GC is
> called on it; moreover it does take "size" into account for triggering
> GC.

Hum... that is an issue if your structure is a cascade of sub-structures
allocated by their own constructor (think of classes used in composition).
Is there any way of massaging lua_newuserdata to signal to the GC the
actual size of a userdata object? One solution could be something like:

LUA_API void lua_userdatasize(lua_State *L, size_t size)

which would assign the given size in bytes (to the GC) to the userdata
situated on top of the stack. You could then proceed with something like:

image * im = new_image(1024,1024);
which in turn calls a number of constructors like:
    im->history = new_image_history(0);
    im->data    = new_image_data(lx*ly);
    ...
which in turn might call some dedicated memory allocation routines.
Once this is done, you could push this new variable onto the stack, get
its size through an accessor you have written yourself, and signal it to
lua:

    lua_pushusertag(L, im, image_tag);
    lua_userdatasize(L, image_bytesize(im));

Another, simpler way might be to add an argument to lua_pushusertag to
declare the userdata's size when it is created.
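Nicolas's scheme relies on an accessor that adds up everything hanging off the wrapper. A minimal C sketch of such an `image_bytesize`, assuming hypothetical `image_history` and `image_data` layouts (the post elides the real fields) and a `float` pixel type:

```c
#include <stddef.h>

typedef float pixelvalue;   /* assumption: the post does not give the type */

/* Hypothetical sub-structures standing in for the elided fields. */
struct image_history { size_t len; char *text; };      /* processing log */
struct image_data    { size_t npix; pixelvalue *pix; };

struct image {
    int lx;
    int ly;
    struct image_history *history;
    struct image_data    *data;
};

/* Bytes owned by an image, including every sub-structure it points to.
   This is the figure that would be reported to the GC. */
size_t image_bytesize(const struct image *im)
{
    size_t n = sizeof *im;
    if (im->history != NULL)
        n += sizeof *im->history + im->history->len;
    if (im->data != NULL)
        n += sizeof *im->data + im->data->npix * sizeof(pixelvalue);
    return n;
}
```

The accessor stays in the library that owns the allocation scheme, so the existing constructors do not have to change; only the total is handed to Lua.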

I think that might be an interesting addition to the API, it would help
make Lua compatible with a number of existing libraries that already have
complicated allocation schemes you do not want to mess with.

Just my 5c...
Cheers
-- 
Nicolas



Re: Garbage collector issues

Reuben Thomas-3
In reply to this post by Luiz Henrique de Figueiredo
> Lua 4.0 contains an undocumented function...

This is getting worryingly familiar. Perhaps a good thing to do for 4.1
would be simply to "reconcile" the code and documentation, and nothing else
(apart from bug fixes). It seems that dust gets into even the cleanest
systems...

Of course, I understand that most undocumented features are undocumented
because they're not designed for external use, but this doesn't apply to
many of Lua's internal functions, well-designed as they are.

malloc seems to be a case in point: Lua has some useful malloc functions that
it uses internally and that could usefully be exposed. At the moment even
the built-in libraries don't use them (and rightly so), but I suspect they
would be neater and more reliable if they did.

-- 
http://sc3d.org/rrt/ | impatience, n.  the urge to do nothing



Re: Garbage collector issues

Stefano Lanzavecchia
In reply to this post by Luiz Henrique de Figueiredo
> I have defined a userdata type (an image) that holds large quantities of
> memory. Obviously, this data type should be garbage collected as soon as
> possible to release resources.

This problem looks familiar. There have been long threads on the .NET
mailing list and newsgroups about deterministic finalization.
Anyway, the one thing you can do is to provide two ways of deallocation: one
via GC, and one via a custom call which leaves the userdata floating around
but deallocates the large buffers. At GC time you simply check if the
buffers are still allocated or not.
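Stefano's suggestion amounts to making the finalizer idempotent. A minimal C sketch, assuming a simplified `struct image` with a single buffer; `image_new`, `image_release` and `image_gc` are hypothetical names, and the Lua 4.0 glue that would register `image_gc` as the type's GC tag method is omitted:

```c
#include <stdlib.h>

struct image {
    size_t nbytes;   /* size of the pixel buffer, 0 once released */
    void  *buffer;   /* large allocation, NULL once released */
};

struct image *image_new(size_t nbytes)
{
    struct image *im = malloc(sizeof *im);
    if (im == NULL)
        return NULL;
    im->buffer = malloc(nbytes);
    im->nbytes = (im->buffer != NULL) ? nbytes : 0;
    return im;
}

/* Explicit early release, callable from a script: frees only the big
   buffer and leaves the small wrapper for the GC. Returns bytes freed. */
size_t image_release(struct image *im)
{
    size_t freed = im->nbytes;
    free(im->buffer);            /* free(NULL) is a no-op */
    im->buffer = NULL;
    im->nbytes = 0;
    return freed;
}

/* Body of the GC tag method: frees whatever is still allocated, then the
   wrapper itself. Safe whether or not image_release() ran first. */
size_t image_gc(struct image *im)
{
    size_t freed = image_release(im);
    free(im);
    return freed;
}
```

Both paths go through the same release routine, so there is still only one piece of code that knows how to free an image.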
--
WildHeart'2k - [hidden email]
Homepage: http://come.to/wildheart/

<<<Borrow money from a pessimist: ---
   they don't expect it back.>>>



Re: Garbage collector issues

Luiz Henrique de Figueiredo
In reply to this post by Luiz Henrique de Figueiredo
>There have been long threads on the .NET
>maillist and newsgroups about deterministic finalization.

Lua supports this by calling GC tag methods in reverse order of tags.
--lhf


Re: Garbage collector issues

Nicolas Devillard
In reply to this post by Stefano Lanzavecchia
> Anyway, the one thing you can do is to provide two ways of
> deallocation: one via GC, and one via a custom call which leaves the
> userdata floating around but deallocates the large buffers. At GC time
> you simply check if the buffers are still allocated or not.

Yep, good point. But this requires more than one destructor for the same
object, which might confuse the users of your C/C++ parts.

Another solution is to explicitly run the garbage collector just before
you call a function that will allocate large amounts of memory. That way
you can be sure memory is in its best state before you start allocating
heavily.

FYI: I have just checked the Python/C API, there is nothing better. If you
allocate your own structs in a C extension without using the Python
dedicated functions (PyMem_*), your stuff goes unnoticed by the GC. There
is no way to tell the Python GC about the true allocated size of your
objects beneath the top-level wrappers.

-- 
Nicolas



Re: Garbage collector issues

Luiz Henrique de Figueiredo
In reply to this post by Luiz Henrique de Figueiredo
I wrote:

 Note that the udata created by lua_newuserdata has no references inside Lua
 at birth, and so you have to set it to a global variable or a table field,
 or something; otherwise it will be collected next time.

but I've misread the code: lua_newuserdata leaves the udata on the top of the
stack, so it won't be collected while you are in the C function that called
lua_newuserdata.
--lhf


Re: Garbage collector issues

Roberto Ierusalimschy
In reply to this post by Nicolas Devillard
> One solution could be something like:
> 
> LUA_API void lua_userdatasize(lua_State *L, size_t size)
> 
> which would assign the given size in bytes (to the GC) to the userdata
> situated on top of the stack.


You can implement this function as follows (I guess ;-):

/* in lapi.c */
LUA_API void lua_userdatasize (lua_State *L, size_t size) {
  /* only a userdata that has no size yet may be given one */
  if (ttype(L->top-1) != LUA_TUSERDATA ||
      tsvalue(L->top-1)->len != 0)
    lua_error(L, "...");
  tsvalue(L->top-1)->len = size;
  L->nblocks += size*sizeof(char);  /* count it towards the GC threshold */
}

Maybe something similar should go into the official API for 4.1...

-- Roberto


Re: Garbage collector issues

Edgar Toernig
Roberto Ierusalimschy wrote:
> 
> > One solution could be something like:
> >
> > LUA_API void lua_userdatasize(lua_State *L, size_t size)
> >
> > which would assign the given size in bytes (to the GC) to the userdata
> > situated on top of the stack.
> 
> You can implement this function as follows (I guess ;-):
> 
> /* in lapi.c */
> LUA_API void lua_userdatasize (lua_State *L, size_t size) {
>   if (ttype(L->top-1) != LUA_TUSERDATA ||
>       tsvalue(L->top-1)->len != 0)
>     lua_error("...");
>   tsvalue(L->top-1)->len = size;
>   L->nblocks += size*sizeof(char);
> }

Don't you think that this (below) will be much simpler and more
versatile?

LUA_API void
lua_gcaccount(lua_State *L, long amount)
{
    L->nblocks += amount;
}

That way, C code may tell the GC how many bytes to consider exported
by the C part.  amount will be negative when freeing space.
Coroutines for example may want to notify about the C stack allocated
to each thread.
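To see how a signed `lua_gcaccount` lets a C library report both allocation and release, here is a toy model in plain C. `toy_state` stands in for the two `lua_State` fields involved, `lib_alloc`/`lib_free` are hypothetical library wrappers, and the threshold rule (collect when the count reaches the threshold, then set it to twice the count) is a simplification of the behaviour described in this thread:

```c
#include <stdlib.h>

/* Toy stand-in for lua_State: only the fields GC accounting uses. */
typedef struct {
    long nblocks;       /* bytes Lua believes are in use */
    long GCthreshold;   /* collect when nblocks reaches this */
    int  ncollections;  /* how many collections were triggered */
} toy_state;

/* Edgar's proposal: adjust the count by a signed amount. */
void toy_gcaccount(toy_state *L, long amount)
{
    L->nblocks += amount;
}

/* Simplified version of the check Lua makes after allocating. */
void toy_checkGC(toy_state *L)
{
    if (L->nblocks >= L->GCthreshold) {
        L->ncollections++;          /* a real collector frees garbage here */
        L->GCthreshold = 2 * L->nblocks;
    }
}

/* A C library allocating outside Lua reports its buffers both ways. */
void *lib_alloc(toy_state *L, size_t n)
{
    void *p = malloc(n);
    if (p != NULL) {
        toy_gcaccount(L, (long)n);
        toy_checkGC(L);
    }
    return p;
}

void lib_free(toy_state *L, void *p, size_t n)
{
    free(p);
    toy_gcaccount(L, -(long)n);     /* negative amount when freeing */
}
```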

There are a lot of places where L->nblocks is modified.  Wouldn't
it be easier to let luaM_malloc do that?  Of course, luaM_free
would require a size argument, too.  But this value is always known.
(Well, your patch above will break it *g*)  Aside from that, there
are architectures that could benefit from a free with a second (size)
argument.
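The idea of moving the counting into `luaM_malloc` and giving `luaM_free` a size argument can be sketched as a pair of allocation wrappers. The names `sized_malloc`, `sized_free` and `allocated_bytes` are hypothetical, not Lua's:

```c
#include <stdlib.h>

static unsigned long nblocks = 0;   /* bytes currently allocated through here */

void *sized_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        nblocks += n;               /* counting happens in one place only */
    return p;
}

/* The caller passes the size back, so no per-block size header is needed. */
void sized_free(void *p, size_t n)
{
    if (p != NULL) {
        free(p);
        nblocks -= n;
    }
}

unsigned long allocated_bytes(void)
{
    return nblocks;
}
```

Since every caller already knows the size it is freeing, no call site has to touch the counter directly.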

Ciao, ET.


PS: Btw, lua_newuserdata and co are IMHO slightly broken:

> print(dostring"")  -- gives lua_pushuserdata(L, NULL)
userdata(0): 0x80632b0
> print(dostring"")
userdata(0): 0x8063448 -- and another one?!?

Maybe this change

-  ts->u.d.value = (udata == NULL) ? uts+1 : udata;
+  ts->u.d.value = s ? uts+1 : udata;

would make a little bit more sense.  (There are other
problems too.  Still thinking about a solution.)



Re: Garbage collector issues

Stefano Lanzavecchia
In reply to this post by Luiz Henrique de Figueiredo
> Subject: Re: Garbage collector issues
>
> >There have been long threads on the .NET
> >maillist and newsgroups about deterministic finalization.
>
> Lua suports this by calling GC tag methods in reverse order of tags.
> --lhf

Er... yes and no. The problem is that you can force GC, but if you want
something more fine-grained so that small objects are GC'ed when required
whereas big objects or objects holding on to precious resources (database
connections) are GC'ed immediately (i.e.: when they "go out of scope") you
need some special method.

>    From: Nicolas Devillard <[hidden email]>
> Subject: Re: Garbage collector issues

> > Anyway, the one thing you can do is to provide two ways of
> > deallocation: one via GC, and one via a custom call which leaves the
> > userdata floating around but deallocates the large buffers. At GC time
> > you simply check if the buffers are still allocated or not.
>
> Yep, good point. But this requires more than one destructor for the same
> object, which might confuse the users of your C/C++ parts.

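I was thinking of something like this (C++):

#include <cstdlib>

const std::size_t BIGNUM = 1 << 20;  // size of the large buffer (illustrative)

class MyClass {
	void *hugeBuffer;
public:
	MyClass() : hugeBuffer(std::malloc(BIGNUM)) {}

	// Called (e.g. from the GC tag method) when the userdata is collected.
	~MyClass() {
		if (hugeBuffer != NULL)   // dispose() may already have run
			std::free(hugeBuffer);
	}

	// Early release: frees the buffer but keeps the small object alive.
	void dispose() {
		std::free(hugeBuffer);
		hugeBuffer = NULL;
	}

private:
	MyClass(const MyClass &);             // not copyable: the buffer
	MyClass &operator=(const MyClass &);  // must be freed exactly once
};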

Now you can call dispose when you don't need the object any more and wait
for GC to make the userdata disappear. Or you can forget to call dispose and
wait for GC to call the destructor.

--
WildHeart'2k - [hidden email]
Homepage: http://come.to/wildheart/

<<<Katte ni kimeru na yo! ---
   I didn't agree to this!>>>



Re: Garbage collector issues

Roberto Ierusalimschy
In reply to this post by Luiz Henrique de Figueiredo
> Don't you think that this (below) will be much simpler and more
> versatile?
> 
> LUA_API void
> lua_gcaccount(lua_State *L, long amount)
> {
>     L->nblocks += amount;
> }

You can achieve the same effect with the current API: instead of adding to
`nblocks', you subtract from the `threshold'.

Because Lua recomputes the threshold whenever it does a GC cycle, you need
a GC tag method for nil to set it back to the value you want. The final
code would be something like this:

int extracount = 0;  /* extra memory to be counted in Lua (in Kbytes) */

/* function to be set as the GC tag method for nil
   (called at the end of every GC cycle) */
int gc_tm (lua_State *L) {
  lua_setgcthreshold(L, lua_getgcthreshold(L) + extracount);
  return 0;  /* no results */
}

void lua_gcaccount (lua_State *L, int amount) {
  int newt;
  extracount += amount;
  newt = lua_getgcthreshold(L) - amount;
  if (newt < 0) newt = 0;
  lua_setgcthreshold(L, newt);
}
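Why `gc_tm` adds `extracount` rather than subtracting it: after a cycle Lua sets the threshold to twice the live data, so adding `extracount` makes a collection fire when `nblocks + extracount` reaches `2 x (live + extracount)`, which is Lua's own doubling rule applied to the total; `lua_gcaccount` then lowers the threshold for any further outside memory. A toy model of that arithmetic in plain C (not Lua's code; `toy_gc` and its functions are hypothetical, and all units are Kbytes as in the code above):

```c
/* Toy model of the threshold workaround; all units are Kbytes. */
typedef struct {
    int nblocks;     /* memory Lua itself has allocated */
    int threshold;   /* Lua collects when nblocks >= threshold */
    int extracount;  /* memory held by C code outside Lua */
} toy_gc;

int toy_should_collect(const toy_gc *g)
{
    return g->nblocks >= g->threshold;
}

/* Plays the role of gc_tm, the GC tag method for nil. */
static void toy_gc_tm(toy_gc *g)
{
    g->threshold += g->extracount;
}

/* One collection: garbage is freed, the threshold is reset to twice the
   live data, then the end-of-cycle tag method runs. */
void toy_collect(toy_gc *g, int live)
{
    g->nblocks = live;
    g->threshold = 2 * g->nblocks;
    toy_gc_tm(g);
}

/* lua_gcaccount from the post, expressed against the toy state. */
void toy_gcaccount(toy_gc *g, int amount)
{
    int newt;
    g->extracount += amount;
    newt = g->threshold - amount;
    if (newt < 0) newt = 0;
    g->threshold = newt;
}
```

For example, with 100 Kbytes live and 150 Kbytes held outside Lua, the threshold after a cycle is 350, so the next collection fires when Lua's own memory reaches 350, i.e. when the total reaches 500 = 2 x 250.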

-- Roberto


Idea for supporting better regexps

Reuben Thomas-3
In reply to this post by Edgar Toernig
I have an idea for pattern matching:

After I wrote my regex library for Lua, it occurred to me to replace strfind
and gsub with versions using POSIX regexps. This seemed to uphold the
principle of not doing in Lua what can be done perfectly well outside it (I
think Henry Spencer's regex package is (or could easily be made) pure ANSI
C, so there's no reliance on non-ANSI stuff). On the other hand, the Lua
implementation of regexps is very small.

I think I might still do this for the Luas that I use, but then I had
another idea: my regex library can already be used with PCRE if you prefer
Perl regex syntax, because PCRE supports the POSIX calling API. It then
struck me that the Lua regex package could be made to support the POSIX API
as well. regcomp would be a function that simply returned the pattern as a
string, and regexec would be a call to the matching function.
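The proposal hinges on the POSIX calling convention: `regcomp` to compile, `regexec` to match, `regfree` to release. As a sketch of how a strfind-like search could sit on top of that API, here is a hypothetical `posix_find` that returns 1-based, inclusive positions the way strfind does; choosing `REG_EXTENDED` syntax is an assumption:

```c
#include <regex.h>

/* strfind-style search on top of the POSIX API: returns 1 and sets
   *start/*end (1-based, inclusive) on a match, 0 if there is no match,
   and -1 if the pattern does not compile. */
int posix_find(const char *s, const char *pattern, int *start, int *end)
{
    regex_t re;
    regmatch_t m;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, s, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *start = (int)m.rm_so + 1;   /* rm_so is a 0-based offset */
    *end = (int)m.rm_eo;         /* rm_eo is one past the match */
    return 1;
}
```

Swapping in PCRE's POSIX wrapper, or a wrapper around Lua's own matcher, would only change what sits behind `regcomp` and `regexec`; code written against `posix_find` would not notice.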

So if the Lua string matching API were recast like this, then:

1. You could use the existing way of working with no change.

2. You could plug in your favourite POSIX-compatible regex library without
altering any code (just replacing one file with another in the Lua source).

This seems to increase flexibility without hurting anyone. The only problem
I can think of is that you might want to be able to use Lua and POSIX
matching in the same Lua system. This can be handled as follows: have a
#defined symbol that determines whether gsub and strfind are defined in
terms of the Lua pattern-matching functions, or those of the supplied regex
library.

This gives you three configuration options:

1. As now.

2. Use your favourite regex library to provide the "regex" and "match"
functions, while leaving strfind and gsub working as before.

3. Use your preferred regex library to provide regex and match, and
reimplement strfind and gsub in terms of them.

To reassure those interested in backwards compatibility: with option 1,
there is no change from the current state. With option 2, current programs
work as at present (unless they rely on "match" or "regex" being undefined);
new programs can take advantage of better regexes. Option 3 gives the best
solution for scripts that want to take advantage of richer regexps.

The changes needed are mostly in the build system, plus a little tweaking of
the Lua regex code (to support option 2), and will have no impact on
efficiency.

-- 
http://sc3d.org/rrt/ | egrep, n.  a bird that debugs bison



Re: Idea for supporting better regexps

Reuben Thomas-3
Since I've seen no replies to this suggestion I presume it's been treated
with the contempt it deserves ;-)

In a pathetic attempt to attract attention and drum up a response (even if
it's only a flame), I've come up with an even more ridiculous and grandiose
idea:

Componentise Lua.

I'm trying to achieve two goals here (apart from earning the enduring
opprobrium of this group!):

1. Make Lua's internal mechanisms available in their own right.

2. Enable them to be replaced.

Essentially, this is a matter of designing APIs for Lua's insides.

There's one such component already, ZIOLib, though in the Lua distribution
it's not actually separated out (though you can get it on its own, I
believe). Other parts of Lua that would also be very nice to use outside Lua
itself are the hash table package and pattern matcher (a nice lightweight
alternative to full-on regexps). Conversely, one bit of Lua it would be good
to be able to replace is the garbage collector (so, for example, if you're
embedding Lua in a system that already has a garbage collector, you can use
that instead).
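One way to read "designing APIs for Lua's insides" for the collector is a table of function pointers that the core calls instead of a fixed implementation. A hypothetical sketch (none of these names exist in Lua 4.0), with a trivial malloc/free-based default; an embedder with its own collector would fill the table with its own functions:

```c
#include <stdlib.h>

/* Hypothetical interface for a replaceable memory/GC component: the rest
   of the interpreter would reach the collector only through this table. */
typedef struct gc_component {
    void *(*alloc)(void *ud, size_t n);
    void  (*release)(void *ud, void *p, size_t n);
    void  (*collect)(void *ud);
    void  *ud;                      /* component-private state */
} gc_component;

/* A trivial default component: malloc/free plus a byte counter. */
typedef struct { size_t inuse; int collections; } default_gc_state;

static void *default_alloc(void *ud, size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        ((default_gc_state *)ud)->inuse += n;
    return p;
}

static void default_release(void *ud, void *p, size_t n)
{
    free(p);
    ((default_gc_state *)ud)->inuse -= n;
}

static void default_collect(void *ud)
{
    /* a real collector would mark and sweep here; this one only counts */
    ((default_gc_state *)ud)->collections++;
}

gc_component default_gc(default_gc_state *st)
{
    gc_component c;
    st->inuse = 0;
    st->collections = 0;
    c.alloc = default_alloc;
    c.release = default_release;
    c.collect = default_collect;
    c.ud = st;
    return c;
}
```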

Since Lua doesn't look as though it's going to grow into a bigger language
or system (which is, IMO, excellent), it's possible to improve its internal
structure in a way that isn't doable in other systems, and really polish it,
while at the same time making it more flexible (bits can be replaced) and
more useful (rather than just an embeddable language, it's a suite of
ANSI C libraries (a rare beast!) comprising lightweight pattern matching,
heavy-duty hash tables, stream IO and garbage collection).

Like my last proposal, I don't think this componentisation has any
performance or compatibility penalties: Lua is pretty much built as a series
of components internally in any case. It's just a question of making those
APIs public. Will that tie down future changes? Of course it will, but
that's not a problem, as a) Lua is a maturing language, and b) its designers
have an extraordinary gift for making excellent choices, so I don't think
that (in these areas at least) they'll want to change them radically.

How about it?

-- 
http://sc3d.org/rrt/ | Caution Children At Play Drive Slowly