Using Lua and C with a Garbage Collector

SevenThunders
I have two favorite ways to improve productivity when coding in C. The first is the use of a scripting language for the high level logic and the second is the use of a garbage collector to free one from the drudgery and bug generation of manual memory management.  

Unfortunately it is not trivial to use both techniques at the same time.  For example, I have a C library that uses garbage collection (Hans Boehm et al.) to manage a complex tree-like data structure.  I wish to expose the C library to Lua so that I can script a lot of my test routines and perform debugging in Lua.  The problem is that the Hans Boehm gc utilities cannot see any of the C pointers that are being held in Lua, and thus all my C objects are eventually collected even though I am still 'using' them.

For this debugging application it is probably sufficient to expose some of the garbage collection utilities to Lua and then to declare some pointers inside Lua as part of the root set.  This should be OK since the root of the tree eventually points to all the interesting objects I am looking at, but it is not a good solution in general.

Is there any clean way to do this?  The solutions I can think of all have issues.

1)  Recompile Lua so that all base memory allocations are done through the garbage collector.  One can redefine malloc, calloc, realloc, free, etc. to point to the gc routines.  I am not sure, however, how the Lua gc performs its low-level memory access or whether this would work.

2)  Attempt to add all high level Lua pointers to the root set of the garbage collector.  I think the Hans Boehm collector will only follow pointers that it knows have been allocated by the garbage collector.  Thus finding the correct root set would be problematic.

3)  Make a 'protected' array as a global variable inside the C code, so that it lives in the data segment and hence in the root set of the Boehm garbage collector.  Temporarily 'protect' C pointers by saving them in that array.  This seems like a lot of program overhead and hassle to implement.  However, is it possible to force Lua to use memory whose pointers have been saved in the C code data segment?

4) Don't use the Hans Boehm collector, but rather use Lua's gc.  Is this possible?  Can memory allocated in Lua be used in C?  I'm not sure how Lua's garbage collector works.
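
For what it's worth, option 3 can be sketched in a few lines. This is a minimal sketch only: the names gc_protect, gc_unprotect and MAX_PROTECTED are made up for illustration, and it assumes the collector scans static data as roots, which is what the data-segment argument in option 3 relies on.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROTECTED 256  /* hypothetical capacity, pick what fits */

/* Static storage sits in the data segment, which the collector is
   assumed to scan as part of its root set. */
static void *protected_ptrs[MAX_PROTECTED];

/* Save p in a free slot. Returns the slot index, or -1 if p is NULL
   or the table is full. */
int gc_protect(void *p) {
    if (p == NULL)
        return -1;
    for (int i = 0; i < MAX_PROTECTED; i++) {
        if (protected_ptrs[i] == NULL) {
            protected_ptrs[i] = p;
            return i;
        }
    }
    return -1;
}

/* Clear a slot so the collector may reclaim the object again. */
void gc_unprotect(int slot) {
    assert(slot >= 0 && slot < MAX_PROTECTED);
    protected_ptrs[slot] = NULL;
}
```

A binding would call gc_protect() just before handing a pointer to Lua and gc_unprotect() once Lua no longer needs it. The linear scan is part of the overhead mentioned in option 3; a free list would make it cheaper.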

Re: Using Lua and C with a Garbage Collector

Chris-41
On 3/14/06, SevenThunders <[hidden email]> wrote:

> I have two favorite ways to improve productivity when coding in C. The first
> is the use of a scripting language for the high level logic and the second
> is the use of a garbage collector to free one from the drudgery and bug
> generation of manual memory management.
>
> Unfortunately it is not trivial to use both techniques at the same time.
> For example, I have a C library that uses garbage collection
> (Hans Boehm et al.) to manage a complex tree-like data structure.  I wish
> to expose the C library to Lua so that I can script a lot of my test
> routines and perform debugging in Lua.  The problem is that the Hans Boehm
> gc utilities cannot see any of the C pointers that are being held in Lua,
> and thus all my C objects are eventually collected even though I am still
> 'using' them.

This sounds like all you need to do is create a Lua user data type and hook the __gc metamethod to your C garbage collector.  That way the objects persist because they exist in Lua as a user data type and when the Lua garbage collector collects the object it will be released in both places.

--
// Chris

Re: Using Lua and C with a Garbage Collector

SevenThunders
That is really interesting.  I've used metatables for extending Lua syntax to C datatypes, but I didn't realize you could attach garbage collector actions to them.  This approach is quite interesting.  It doesn't directly solve my problem, in that it doesn't expose the C pointers stored in Lua data structures to the gc in C.  Thus the C garbage collector will still see the C objects stored in Lua as unreachable.  However, what your solution does do is allow the Lua gc to automatically deallocate objects from C using, say, the standard malloc and free approach.  That's pretty close to having a garbage collector, though all the destructors will have to be written explicitly, and on the C side of things the C garbage collector will have to be dropped or disabled.

Re: Using Lua and C with a Garbage Collector

Todor Totev
In reply to this post by SevenThunders

In Lua 5.1 it is trivial to change the memory management.
According to the manual, the function that creates a Lua state
receives as an argument a function that manages the memory:

   lua_State *lua_newstate (lua_Alloc f, void *ud);

In manual paragraph 3.7:

   typedef void * (*lua_Alloc) (void *ud,
                                void *ptr,
                                size_t osize,
                                size_t nsize);

Perhaps yours will be something like:

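#include <gc.h>   /* Boehm collector: GC_REALLOC, GC_FREE */
#include <lua.h>  /* lua_Alloc signature */

static void *l_alloc (void *ud, void *ptr, size_t osize, size_t nsize) {
   (void)ud;     /* not used */
   (void)osize;  /* not used */
   if (nsize == 0) {
     GC_FREE(ptr);  /* nsize == 0 asks for a free; ptr may be NULL */
     return NULL;
   }
   else
     return GC_REALLOC(ptr, nsize);  /* ptr == NULL gives a new block */
}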

though I'm not an expert in using Hans's GC.
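
To see what Lua expects from such a function without linking Lua or the collector, here is a small sketch with the same signature that just counts live bytes. The names alloc_fn, alloc_stats and counting_alloc are hypothetical, and it uses plain realloc/free where the version above uses GC_REALLOC/GC_FREE. The rules it follows are the ones from the 5.1 manual: nsize == 0 means free the block and return NULL, a NULL ptr with a nonzero nsize is a fresh allocation, and anything else is a resize.

```c
#include <assert.h>
#include <stdlib.h>

/* Same shape as the lua_Alloc typedef quoted above, declared locally
   so that the sketch builds without the Lua headers. */
typedef void *(*alloc_fn)(void *ud, void *ptr, size_t osize, size_t nsize);

/* Hypothetical bookkeeping, handed to the allocator through ud. */
typedef struct {
    size_t live_bytes;
} alloc_stats;

void *counting_alloc(void *ud, void *ptr, size_t osize, size_t nsize) {
    alloc_stats *st = ud;
    assert(st != NULL);
    if (ptr == NULL)
        osize = 0;                 /* no old block, so nothing to subtract */
    if (nsize == 0) {              /* request to free the block */
        free(ptr);                 /* free(NULL) is a no-op */
        st->live_bytes -= osize;
        return NULL;
    }
    void *p = realloc(ptr, nsize); /* fresh allocation, grow or shrink */
    if (p != NULL)
        st->live_bytes = st->live_bytes - osize + nsize;
    return p;
}
```

Passing a pointer to an alloc_stats as the ud argument is how per-state data reaches the allocator; with Lua linked in, that would be the second argument of lua_newstate.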

If you are using 5.0 - you must re-define some preprocessor macros but
I have never done this before.

Regards,
Todor

On Wed, 15 Mar 2006 01:07:07 +0200, SevenThunders <[hidden email]> wrote:

> Unfortunately it is not trivial to use both techniques at the same time.
> For example, I have a C library that uses garbage collection
> (Hans Boehm et al.) to manage a complex tree-like data structure.  I wish
> to expose the C library to Lua so that I can script a lot of my test
> routines and perform debugging in Lua.  The problem is that the Hans Boehm
> gc utilities cannot see any of the C pointers that are being held in Lua,
> and thus all my C objects are eventually collected even though I am still
> 'using' them.
>
> Is there any clean way to do this?  The solutions I can think of all have
> issues.


Re: Using Lua and C with a Garbage Collector

Chris-41
In reply to this post by SevenThunders
On 3/14/06, SevenThunders <[hidden email]> wrote:

> That is really interesting.  I've used metatables for extending Lua syntax to
> C datatypes, but I didn't realize you could attach garbage collector actions
> to them.  This approach is quite interesting.  It doesn't directly solve my
> problem, in that it doesn't expose the C pointers stored in Lua data
> structures to the gc in C.  Thus the C garbage collector will still see the
> C objects stored in Lua as unreachable.  However, what your solution does do
> is allow the Lua gc to automatically deallocate objects from C using, say,
> the standard malloc and free approach.  That's pretty close to having a
> garbage collector, though all the destructors will have to be written
> explicitly, and on the C side of things the C garbage collector will have to
> be dropped or disabled.

Actually, I wasn't suggesting dropping the C garbage collector.  I was suggesting making the Lua __gc metamethod call the C garbage collector's "release()" or whatever on the C pointer that exists in the userdata.  I don't think I can explain it very well but hopefully you get the idea.

The reverse should also work if your C garbage collector has a hook when items are collected.  You attach the C garbage collector's equivalent to __gc to a Lua variable and have it release the Lua objects as needed.

--
// Chris

Re: Using Lua and C with a Garbage Collector

SevenThunders
In reply to this post by Todor Totev
Provided that all allocation is done through lua_newstate() you are quite right.  It looks like it uses realloc in one place, the static function l_alloc() in lauxlib.c.  I am puzzled however in that a search through the source for the phrase _newstate  shows very few actual uses of the allocator.  

Wait, actually it seems that lua_open() is defined as luaL_newstate(), which in turn calls lua_newstate() with l_alloc in lauxlib.c.  That's probably how the allocator gets propagated, since the resulting lua_State is passed all over the place.

Thus I can either redefine realloc or, even easier, include gc.h in lauxlib.c and replace realloc with GC_REALLOC().  The same is probably true for free() in the same function, which can be replaced with GC_FREE().

Re: Using Lua and C with a Garbage Collector

SevenThunders
In reply to this post by Chris-41
Maybe I'm not understanding something in your suggestion.  As I understand it, it wouldn't matter if I used the metatable mechanism to automatically call the Boehm garbage collector inside Lua.  The Boehm gc has to search through memory to verify that all allocated buffers are still reachable via some variable/buffer in the workspace.  To do that it has to do a tree walk through all the pointers in its root set and then all the pointers that they point to, etc.  If I allocate a buffer that immediately disappears into Lua, it is completely unreachable by the Boehm garbage collector.

What buffer in the C data segment holds a pointer (or a pointer to a pointer, and so on), allocated by the Boehm gc, that leads to my C buffer now stored in Lua?  The Boehm gc has no idea what malloc/realloc is doing or what is in the buffers created by these functions, thus it must eventually mark the C buffer in Lua as unreachable and garbage collect it, causing memory faults etc.  Perhaps if Lua had key root sets on the stack or in some register, it could avoid getting collected for a while.  But my experience with this suggests that it's not long before disaster strikes.

Re: Using Lua and C with a Garbage Collector

Chris-41
On 3/16/06, SevenThunders <[hidden email]> wrote:

> Maybe I'm not understanding something in your suggestion.  As I understand it,
> it wouldn't matter if I used the metatable mechanism to automatically call
> the Boehm garbage collector inside Lua.  The Boehm gc has to search through
> memory to verify that all allocated buffers are still reachable via some
> variable/buffer in the workspace.  To do that it has to do a tree walk
> through all the pointers in its root set and then all the pointers that
> they point to, etc.  If I allocate a buffer that immediately disappears into
> Lua, it is completely unreachable by the Boehm garbage collector.
>
> What buffer in the C data segment holds a pointer (or a pointer to a pointer,
> and so on), allocated by the Boehm gc, that leads to my C buffer now stored in
> Lua?  The Boehm gc has no idea what malloc/realloc is doing or what is in
> the buffers created by these functions, thus it must eventually mark the C
> buffer in Lua as unreachable and garbage collect it, causing memory faults
> etc.  Perhaps if Lua had key root sets on the stack or in some register, it
> could avoid getting collected for a while.  But my experience with this
> suggests that it's not long before disaster strikes.

At some level in the garbage collector there is a flag that says that memory/pointer/whatever is in use.  This could be a reference count or any number of things.  When the pointer is sent off to Lua it doesn't disappear, as that reference still exists in the garbage collector; having it exist in Lua counts as a reference to it (i.e. it should not be marked as free, because Lua has a reference to it).  Only when Lua collects the object would the reference that exists in the C garbage collector be decremented/freed via the __gc metamethod, since Lua is no longer referencing it.

But I have never used a GC in C so I don't know specifically how your C garbage collector works and therefore can't provide a more concrete example.
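
A hedged sketch of how this could look with a tracing collector such as Boehm's, which does not keep reference counts itself: the counts are kept in an explicit table that the collector can see. The names pin, unpin and MAX_PINNED are invented for illustration, and the sketch assumes the static table is scanned as part of the root set. pin() would be called when a pointer is wrapped in a userdata, unpin() from its __gc metamethod.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PINNED 128  /* hypothetical capacity */

/* Static, so the table lies in memory that a collector scanning the
   data segment is assumed to treat as roots. */
static struct { void *obj; unsigned refs; } pinned[MAX_PINNED];

/* Record one more Lua-side reference to obj. Returns 0 on success,
   -1 if the table is full. */
int pin(void *obj) {
    int free_slot = -1;
    assert(obj != NULL);
    for (int i = 0; i < MAX_PINNED; i++) {
        if (pinned[i].obj == obj) {
            pinned[i].refs++;
            return 0;
        }
        if (pinned[i].obj == NULL && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    pinned[free_slot].obj = obj;
    pinned[free_slot].refs = 1;
    return 0;
}

/* Drop one reference. Returns the number of references left; at 0 the
   pointer is erased from the table and the collector may reclaim it. */
unsigned unpin(void *obj) {
    for (int i = 0; i < MAX_PINNED; i++) {
        if (pinned[i].obj == obj) {
            if (--pinned[i].refs == 0)
                pinned[i].obj = NULL;
            return pinned[i].refs;
        }
    }
    return 0;  /* obj was not pinned */
}
```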

--
// Chris

Re: Using Lua and C with a Garbage Collector

SevenThunders
A conservative garbage collector like the Boehm collector has to do its tree walk thing to flag all buffers that have references somewhere in memory.  That is the memory it can 'see'.  So as it's processing memory it has to check to see if a particular word is in fact a pointer to perhaps some other buffer.  How does it do this?  The only way it can check this is by seeing if a particular word is in its current list of allocated buffers.  In theory one could also test a word directly, at least on Windows, by calling the Windows API function IsBadReadPtr; however, this wouldn't be very portable.  I don't even know if Linux has an equivalent.

Thus any buffer allocated by malloc is not in the Boehm gc's list, since it didn't call GC_MALLOC().  It can't tree walk those regular malloc buffers.  Currently the default allocator for Lua is whatever realloc and free give you on your system.  The Boehm garbage collector will be 'blind' to these buffers, and anything put in them would be found to be unreachable.  The solution, as suggested earlier, is to overload realloc and free with GC_REALLOC() and GC_FREE().
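
A toy model makes the blindness concrete. This is not how the Boehm collector is implemented; it is a deliberately tiny sketch in which every block holds four words, only base addresses count as pointers, and the names toy_gc_malloc, toy_mark and toy_is_marked are invented. The point is only that the mark phase follows a word when find_block() recognizes it, so a pointer parked in a plain malloc/calloc buffer is never followed.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_BLOCKS 16
#define WORDS 4   /* every toy block holds four pointer-sized words */

/* Blocks handed out by the toy collector: its list of allocated buffers. */
typedef struct { void **mem; int marked; } block;
static block heap[MAX_BLOCKS];
static int nblocks;

/* Allocate a zeroed block and record it in the collector's list. */
void **toy_gc_malloc(void) {
    assert(nblocks < MAX_BLOCKS);
    void **mem = calloc(WORDS, sizeof *mem);
    assert(mem != NULL);
    heap[nblocks].mem = mem;
    heap[nblocks].marked = 0;
    nblocks++;
    return mem;
}

/* Pointer test: is this word the address of a block we allocated? */
static block *find_block(void *word) {
    for (int i = 0; i < nblocks; i++)
        if ((void *)heap[i].mem == word)
            return &heap[i];
    return NULL;
}

/* Mark everything reachable from word, following only known blocks. */
static void mark_from(void *word) {
    block *b = find_block(word);
    if (b == NULL || b->marked)
        return;                      /* not ours, or already visited */
    b->marked = 1;
    for (int i = 0; i < WORDS; i++)
        mark_from(b->mem[i]);
}

/* Clear old marks, then trace from the given root words. */
void toy_mark(void **roots, int nroots) {
    for (int i = 0; i < nblocks; i++)
        heap[i].marked = 0;
    for (int i = 0; i < nroots; i++)
        mark_from(roots[i]);
}

/* Did the last toy_mark() reach the block starting at p? */
int toy_is_marked(void *p) {
    block *b = find_block(p);
    return b != NULL && b->marked;
}
```

If a toy_gc_malloc() block is referenced only from a calloc buffer, and that buffer is the only root passed to toy_mark(), toy_is_marked() reports 0 for the block. Allocate the holding buffer with toy_gc_malloc() instead and the same block gets marked, which is the effect of routing Lua's allocator through GC_REALLOC().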

I'm not sure what kind of performance hit this would give.  One could, I suppose, actually replace free with a no-op function instead of GC_FREE.  The result would be that both Lua and C would be using the Boehm garbage collector.  This would allow Lua objects to persist in C if so desired.  Perhaps this would be considered dangerous, I don't know.

Re: Using Lua and C with a Garbage Collector

Chris-41
On 3/16/06, SevenThunders <[hidden email]> wrote:

> A conservative garbage collector like the Boehm collector has to do its tree
> walk thing to flag all buffers that have references somewhere in memory.
> That is the memory it can 'see'.  So as it's processing memory it has to
> check to see if a particular word is in fact a pointer to perhaps some other
> buffer.  How does it do this?  The only way it can check this is by seeing
> if a particular word is in its current list of allocated buffers.  In theory
> one could also test a word directly, at least on Windows, by calling the
> Windows API function IsBadReadPtr; however, this wouldn't be very portable.
> I don't even know if Linux has an equivalent.
>
> Thus any buffer allocated by malloc is not in the Boehm gc's list, since it
> didn't call GC_MALLOC().  It can't tree walk those regular malloc buffers.
> Currently the default allocator for Lua is whatever realloc and free give
> you on your system.  The Boehm garbage collector will be 'blind' to these
> buffers, and anything put in them would be found to be unreachable.  The
> solution, as suggested earlier, is to overload realloc and free with
> GC_REALLOC() and GC_FREE().
>
> I'm not sure what kind of performance hit this would give.  One could, I
> suppose, actually replace free with a no-op function instead of GC_FREE.  The
> result would be that both Lua and C would be using the Boehm garbage
> collector.  This would allow Lua objects to persist in C if so desired.
> Perhaps this would be considered dangerous, I don't know.
> --
> View this message in context: http://www.nabble.com/Using-Lua-and-C-with-a-Garbage-Collector-t1281804.html#a3444624
> Sent from the Lua - General forum at Nabble.com.

OK, we're missing each other here and I grow weary.  ;)

The Lua user data _would_ in fact have a pointer to memory allocated with GC_MALLOC, so your C garbage collector would be able to see it just fine.  That's the reference I was talking about.

--
// Chris

Re: Using Lua and C with a Garbage Collector

SevenThunders
The requirement is that ALL Lua data, not just the user data, would have to be allocated by GC_REALLOC().  If that's what you meant, then I apologize in advance for the wasted space I generated on the forum.

:)

Re: Using Lua and C with a Garbage Collector

Chris-41
On 3/16/06, SevenThunders <[hidden email]> wrote:

> The requirement is that ALL Lua data, not just the user data, would have to be
> allocated by GC_REALLOC().  If that's what you meant, then I apologize in
> advance for the wasted space I generated on the forum.
>
> :)

Well, at least any data that crosses both Lua and the C garbage collector.

--
// Chris