Userdata finalization order


Petri Häkkinen
Hi,

The reference manual says:

2.10.1
"At the end of each garbage-collection cycle, the finalizers for userdata are called in reverse order of their creation, among those collected in that cycle."

Does this mean that sometimes userdata may not be collected in reverse order, i.e. when objects are collected in different cycles?

If this is the case, writing Lua bindings for complex C++ libraries just became more difficult in my mind. For example, there are typically all sorts of managers, and dependencies between objects that need to be torn down in proper order (reverse order of creation).

I'm asking this because I just hit a case that causes a crash in a C++ physics library when shutting down Lua. Basically there are a few manager objects which get collected before some other objects that have been created through the managers, and the physics library doesn't like that at all.

What would be the recommended practice to deal with these situations?

I'm using LuaJit-2.0-beta4 if that matters.

Best regards,

Petri Häkkinen


Re: Userdata finalization order

Wesley Smith
On Fri, Oct 15, 2010 at 10:48 PM, Petri Häkkinen <[hidden email]> wrote:

> Hi,
>
> The reference manual says:
>
> 2.10.1
> "At the end of each garbage-collection cycle, the finalizers for userdata are called in reverse order of their creation, among those collected in that cycle."
>
> Does this mean that sometimes userdata may not be collected in reverse order, i.e. when objects are collected in different cycles?
>
> If this is the case, writing Lua bindings for complex C++ libraries just become more difficult in my mind. For example, there are typically all sorts of managers, and dependencies between objects that need to be teared down in proper order (reverse order of creation).
>
> I'm asking this because I just hit a case that causes a crash in a C++ physics library when shutting down Lua. Basically there are a few manager objects which get collected before some other objects that have been created through the managers, and the physics library doesn't like that at all.
>
> What would be the recommended practice to deal with these situations?

You can't rely on userdata being collected in a particular order.
You'll have to implement some kind of notification system or improve
your collection logic when closing down a lua_State.  If a manager
gets deleted, it should notify any clients.  If a client gets freed,
it should detach itself from a manager.  If you implement this kind of
logic, there shouldn't be any crashes.
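
A minimal sketch of that notify/detach logic in plain C++, with no Lua involved. The `Manager` and `Client` classes and their method names are hypothetical stand-ins for whatever the binding wraps; the only point is that either side can be destroyed first without leaving a dangling pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

class Manager;

// Hypothetical client object that was created through a manager.
class Client {
public:
    explicit Client(Manager* m);
    ~Client();                                   // detaches from the manager, if it still exists
    void manager_destroyed() { manager_ = nullptr; }
    bool has_manager() const { return manager_ != nullptr; }
private:
    Manager* manager_;
};

// Hypothetical manager that keeps track of its live clients.
class Manager {
public:
    ~Manager() {
        // Notify every remaining client so none keeps a dangling pointer.
        for (Client* c : clients_) c->manager_destroyed();
    }
    void attach(Client* c) { clients_.push_back(c); }
    void detach(Client* c) {
        clients_.erase(std::remove(clients_.begin(), clients_.end(), c),
                       clients_.end());
    }
    std::size_t client_count() const { return clients_.size(); }
private:
    std::vector<Client*> clients_;
};

Client::Client(Manager* m) : manager_(m) { manager_->attach(this); }
Client::~Client() { if (manager_) manager_->detach(this); }
```

With something like this in place, the finalizers for manager and client userdata could delete their C++ objects in either order; a client whose manager is already gone simply sees a null manager.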

wes

Re: Userdata finalization order

Javier Guerra Giraldez
In reply to this post by Petri Häkkinen
On Fri, Oct 15, 2010 at 3:48 PM, Petri Häkkinen <[hidden email]> wrote:
> Basically there are a few manager objects which get collected before some other objects that have been created through the managers, and the physics library doesn't like that at all.

if the manager must be destroyed after all its surrogate objects,
then each of those objects should hold a reference to its manager.
usually it's easiest with an environment table on the userdata

--
Javier

Re: Userdata finalization order

Petri Häkkinen
In reply to this post by Wesley Smith
On 15.10.2010, at 23.58, Wesley Smith <[hidden email]> wrote:

> On Fri, Oct 15, 2010 at 10:48 PM, Petri Häkkinen <[hidden email]>
>  wrote:
>> Hi,
>>
>> The reference manual says:
>>
>> 2.10.1
>> "At the end of each garbage-collection cycle, the finalizers for  
>> userdata are called in reverse order of their creation, among those  
>> collected in that cycle."
>>
>> Does this mean that sometimes userdata may not be collected in  
>> reverse order, i.e. when objects are collected in different cycles?
>>
>> If this is the case, writing Lua bindings for complex C++ libraries  
>> just become more difficult in my mind. For example, there are  
>> typically all sorts of managers, and dependencies between objects  
>> that need to be teared down in proper order (reverse order of  
>> creation).
>>
>> I'm asking this because I just hit a case that causes a crash in a C
>> ++ physics library when shutting down Lua. Basically there are a  
>> few manager objects which get collected before some other objects  
>> that have been created through the managers, and the physics  
>> library doesn't like that at all.
>>
>> What would be the recommended practice to deal with these situations?
>
> You can't rely on userdata being collected in a particular order.
> You'll have to implement some kind of notification system or improve
> your collection logic when closing down a lua_State.  If a manager
> gets deleted, it should notify any clients.  If a client gets freed,
> it should detach itself from a manager.  If you implement this kind of
> logic, there shouldn't be any crashes.
>
> wes

Yep, that sounds sensible. It's just that tracking all those  
dependencies is a lot of work (there are dozens of types/classes) and  
the binding is already several thousand lines of C++ code so I was  
hoping for an easier solution.

Cheers,

Petri

Re: Userdata finalization order

Ico Doornekamp


* On 2010-10-17 Petri Häkkinen <[hidden email]> wrote  :

> On 15.10.2010, at 23.58, Wesley Smith <[hidden email]> wrote:
>
>> On Fri, Oct 15, 2010 at 10:48 PM, Petri Häkkinen <[hidden email]>
>> wrote:
>>> Hi,
>>>
>>> The reference manual says:
>>>
>>> 2.10.1
>>> "At the end of each garbage-collection cycle, the finalizers for  
>>> userdata are called in reverse order of their creation, among those  
>>> collected in that cycle."
>>>
>>> Does this mean that sometimes userdata may not be collected in  
>>> reverse order, i.e. when objects are collected in different cycles?
>>>
>>> If this is the case, writing Lua bindings for complex C++ libraries  
>>> just become more difficult in my mind. For example, there are  
>>> typically all sorts of managers, and dependencies between objects  
>>> that need to be teared down in proper order (reverse order of  
>>> creation).
>>>
>>> I'm asking this because I just hit a case that causes a crash in a C
>>> ++ physics library when shutting down Lua. Basically there are a few
>>> manager objects which get collected before some other objects that
>>> have been created through the managers, and the physics library
>>> doesn't like that at all.
>>>
>>> What would be the recommended practice to deal with these situations?
>>
>> You can't rely on userdata being collected in a particular order.
>> You'll have to implement some kind of notification system or improve
>> your collection logic when closing down a lua_State.  If a manager
>> gets deleted, it should notify any clients.  If a client gets freed,
>> it should detach itself from a manager.  If you implement this kind of
>> logic, there shouldn't be any crashes.
>
> Yep, that sounds sensible. It's just that tracking all those dependencies
> is a lot of work (there are dozens of types/classes) and the binding is
> already several thousand lines of C++ code so I was hoping for an easier
> solution.

Facing a similar problem some time ago, we chose to go for light
userdata for object representation in Lua, avoiding garbage collection
altogether. This considerably simplified the Lua wrapper for the
library, and we take the extra work of keeping references and explicitly
destroying the objects for granted. Not a very Lua-ish solution, but it
seemed to be the right choice for our situation.
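
One way the explicit destruction could be organised, sketched in plain C++ under the assumption that the host application (not Lua) owns every object and Lua only sees plain pointers to them. `TeardownRegistry` and its methods are hypothetical names, not part of any library mentioned here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical registry owned by the host application.
// Each object registers how it is destroyed at the moment it is created.
class TeardownRegistry {
public:
    // Record a destroy action; actions are stored in creation order.
    void track(std::function<void()> destroy) {
        destroyers_.push_back(std::move(destroy));
    }

    // Run all destroy actions in reverse order of creation.
    void destroy_all() {
        while (!destroyers_.empty()) {
            std::function<void()> destroy = std::move(destroyers_.back());
            destroyers_.pop_back();
            destroy();
        }
    }

private:
    std::vector<std::function<void()>> destroyers_;
};
```

The host would call `destroy_all()` once at shutdown, so teardown order no longer depends on when the collector happens to run.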

--
:wq
^X^Cy^K^X^C^C^C^C

Re: Userdata finalization order

Francesco Abbate
In reply to this post by Petri Häkkinen
2010/10/17 Petri Häkkinen <[hidden email]>:
> Yep, that sounds sensible. It's just that tracking all those dependencies is
> a lot of work (there are dozens of types/classes) and the binding is already
> several thousand lines of C++ code so I was hoping for an easier solution.

Hi,

I've got exactly this kind of problem with Lua and C++. Lua is not
able to ensure a finalisation order of the userdata objects. The only
solutions I know of are:
- you use light userdata objects and leave the memory management to
  C++ by using reference counting => almost everyone does it this way
- you use weak tables to mirror the relations between C++ objects so
  that the Lua GC can respect references and finalisation orders

Please note that using an environment table for userdata does not
work; I will explain why. Suppose that an object "a" depends on a
second object "b". In order to ensure the finalisation order you set a
reference to "b" in the env table of "a". This seems to work, but
suppose now that both "a" and "b" become unreachable. What happens is
that the Lua GC will free both of them regardless of the environment table
of "a" and so the finalisation order can be violated.

Please note that weak tables also have a major problem, because when
the program terminates Lua will do something very coarse with
userdata: it will free all the userdata together regardless of
anything else. Evidently this will screw up the finalisation
order that you were struggling to ensure with the weak tables.

The only solution that I've found to this problem was an ugly hack
applied just before Lua closes. The other solution would be to call "exit"
before Lua has terminated, but this is equivalent to admitting defeat
:-)

For me it is evident that this is an area where Lua should be
improved. We are talking about *real* problems and not syntax sugar or
more convenient syntax or anything fancy like that.

Some time ago I proposed a very simple modification to Lua to fix
this problem: add a function lua_addref to add a reference from a
userdata to another userdata. The idea is that every userdata will
keep a list (possibly empty) of other userdata that it depends on.
Then the GC will ensure the proper finalisation order just by
inspecting the list of references of every userdata. Unfortunately
this suggestion was dropped without much discussion; people told
me that env tables for userdata were just enough.

Unfortunately the Lua authors don't seem to be interested in fixing this
problem. As far as I understand, all the people who are working with
complex C++ code use light userdata and C++ reference counting.

I have also reported a bug related to userdata used as keys in weak
tables that are finalised but still kept in the weak table. See this
thread:

http://lua-users.org/lists/lua-l/2010-08/msg00605.html

It was clear that it is a bug and Mark Hamburg helped me to fix the
problem easily, but the problem was not fixed in Lua, nor did the Lua
authors make any comment about it.

--
Francesco

Re: Userdata finalization order

Miles Bader-2
Francesco Abbate <[hidden email]> writes:
> As far as I understood all the people that are working with
> complex C++ code use light userdata and C++ reference counting.

For allocated-in/owned-by-C++ objects I generally use _normal_
userdata + C++ reference counting:  the userdata holds a pointer to the
C++ obj.

The userdata gets GCed as normal by Lua, and if Lua decides to delete
it, that ends up decrementing the C++ reference count, which may delete
the pointed-to C++ object (or not, if there are remaining references
from other C++ objects).
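
A rough model of this scheme in plain C++, with no Lua calls. It assumes `std::shared_ptr` as the reference-counting mechanism, which is only one possible choice; `World`, `Body`, `Box` and `finalize` are hypothetical names, where `Box` stands in for the userdata payload and `finalize` for its __gc handler.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct World {};  // hypothetical manager object

// Hypothetical object created through the manager; it keeps the manager alive.
struct Body {
    std::shared_ptr<World> world;
    explicit Body(std::shared_ptr<World> w) : world(std::move(w)) {}
};

// Stand-in for the payload of a full userdata: it only holds a counted reference.
template <typename T>
struct Box {
    std::shared_ptr<T> ptr;
};

// Stand-in for the __gc handler: drop this box's reference and nothing more.
// The C++ object is deleted only when the last reference goes away.
template <typename T>
void finalize(Box<T>& box) {
    box.ptr.reset();
}
```

Even if the box for the world is finalized before the box for a body, the world survives until the body releases it, so the finalization order chosen by the collector stops mattering for the C++ side.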

This works very well for me.  [Whereas light userdata never seems
particularly useful...]

-Miles

--
Selfish, adj. Devoid of consideration for the selfishness of others.

Re: Userdata finalization order

Javier Guerra Giraldez
In reply to this post by Francesco Abbate
On Sun, Oct 17, 2010 at 6:24 AM, Francesco Abbate
<[hidden email]> wrote:
> suppose now that both "a" and "b" becomes unreachable. What happens is
> that Lua GC will free both of them regardless of the environment table
> of "a" and so the finalisation order can be violated.

is that true?

what i would expect is that "a" is unreachable, but GC considers "b"
reachable by "a", even if "a" isn't, so "b" would only be collected
after "a" (in a different GC cycle)

--
Javier

Re: Userdata finalization order

Wesley Smith
On Sun, Oct 17, 2010 at 3:11 PM, Javier Guerra Giraldez
<[hidden email]> wrote:

> On Sun, Oct 17, 2010 at 6:24 AM, Francesco Abbate
> <[hidden email]> wrote:
>> suppose now that both "a" and "b" becomes unreachable. What happens is
>> that Lua GC will free both of them regardless of the environment table
>> of "a" and so the finalisation order can be violated.
>
> is that true?
>
> what i would expect is that "a" is unreachable, but GC considers "b"
> reachable by "a", even if "a" isn't, so "b" would only be collected
> after "a" (in a different GC cycle)

Except when you close the state and everything is suddenly collectable.

Re: Userdata finalization order

Francesco Abbate
In reply to this post by Javier Guerra Giraldez
2010/10/17 Javier Guerra Giraldez <[hidden email]>:
> is that true?
>
> what i would expect is that "a" is unreachable, but GC considers "b"
> reachable by "a", even if "a" isn't, so "b" would only be collected
> after "a" (in a different GC cycle)

I was thinking like you before discovering that it is not actually the
case. I've actually experienced this problem and it took me some time
to find it.

See this thread http://lua-users.org/lists/lua-l/2010-08/msg00024.html
and the answer of Benoit Germain.

--
Francesco

Re: Userdata finalization order

Andre Leiradella-2
In reply to this post by Petri Häkkinen
  On 17/10/2010 07:17, Petri Häkkinen wrote:

> On 15.10.2010, at 23.58, Wesley Smith <[hidden email]> wrote:
>
>> On Fri, Oct 15, 2010 at 10:48 PM, Petri Häkkinen
>> <[hidden email]> wrote:
>>> Hi,
>>>
>>> The reference manual says:
>>>
>>> 2.10.1
>>> "At the end of each garbage-collection cycle, the finalizers for
>>> userdata are called in reverse order of their creation, among those
>>> collected in that cycle."
>>>
>>> Does this mean that sometimes userdata may not be collected in
>>> reverse order, i.e. when objects are collected in different cycles?
>>>
>>> If this is the case, writing Lua bindings for complex C++ libraries
>>> just become more difficult in my mind. For example, there are
>>> typically all sorts of managers, and dependencies between objects
>>> that need to be teared down in proper order (reverse order of
>>> creation).
>>>
>>> I'm asking this because I just hit a case that causes a crash in a
>>> C++ physics library when shutting down Lua. Basically there are a
>>> few manager objects which get collected before some other objects
>>> that have been created through the managers, and the physics library
>>> doesn't like that at all.
>>>
>>> What would be the recommended practice to deal with these situations?
>>
>> You can't rely on userdata being collected in a particular order.
>> You'll have to implement some kind of notification system or improve
>> your collection logic when closing down a lua_State.  If a manager
>> gets deleted, it should notify any clients.  If a client gets freed,
>> it should detach itself from a manager.  If you implement this kind of
>> logic, there shouldn't be any crashes.
>>
>> wes
>
> Yep, that sounds sensible. It's just that tracking all those
> dependencies is a lot of work (there are dozens of types/classes) and
> the binding is already several thousand lines of C++ code so I was
> hoping for an easier solution.
In a much simpler project I had images and sub-images exposed to Lua
scripts. Sub-images were just a pointer to an image and information
about the viewport of the sub-image into the image. Of course the
collection of images when sub-images were still alive would cause a crash.

I solved this by adding the pair sub-image=image to the registry
whenever a sub-image was created, and setting sub-image=nil whenever a
sub-image was collected. This ensured the correct collection order for
these data as an image would only be collected after all sub-images were
collected.

Cheers,

Andre


Re: Userdata finalization order

Edgar Toernig
In reply to this post by Francesco Abbate
Francesco Abbate wrote:
>
> Please note that using an environment table for userdata does not
> work, I will explain why. Suppose that an object "a" depends on a
> second object "b". In order to ensure the finalisation order you set a
> reference to "b" in the env table of "a". This seems to work but
> suppose now that both "a" and "b" becomes unreachable. What happens is
> that Lua GC will free both of them regardless of the environment table
> of "a" and so the finalisation order can be violated.

As both, "a" and "b", become unreachable, Lua is right to collect both
of them.  And it will call the gc-method in reverse order of creation:
if "b" was created before "a" (should be, as "a" needs "b") it will
call "a.gc" and then "b.gc".

If Lua calls "a.gc" before "b.gc" even if "a" was created earlier than
"b" then it's a bug.

Ciao, ET.

Re: Userdata finalization order

Francesco Abbate
2010/10/17 E. Toernig <[hidden email]>:
> As both, "a" and "b", become unreachable, Lua is right to collect both
> of them.  And it will call the gc-method in reverse order of creation:
> if "b" was created before "a" (should be, as "a" needs "b") it will
> call "a.gc" and then "b.gc".
>
> If Lua calls "a.gc" before "b.gc" even if "a" was created earlier than
> "b" then it's a bug.

Thank you for the clarification. Please note that I didn't say that this
is a bug; I just said that using an env table for
userdata is not suitable to ensure a proper finalization order for the
objects, and your remark actually confirms that.

--
Francesco

Re: Userdata finalization order

Roberto Ierusalimschy
In reply to this post by Francesco Abbate
> 2010/10/17 Javier Guerra Giraldez <[hidden email]>:
> > is that true?
> >
> > what i would expect is that "a" is unreachable, but GC considers "b"
> > reachable by "a", even if "a" isn't, so "b" would only be collected
> > after "a" (in a different GC cycle)
>
> I was thinking like you before discovering that it is not actually the
> case. I've actually experienced this problem and it took me some time
> to find it.
>
> See this thread http://lua-users.org/lists/lua-l/2010-08/msg00024.html
> and the answer of Benoit Germain.

There is a difference between "finalization" (calling the __gc
metamethod) and "disposal" or "collecting" (actually freeing the object).

Any object that is going to be finalized, plus any object accessible
from it, is never collected before the finalization, no matter the
finalization order. In particular, when Lua closes a state, it calls all
finalizers before disposing any object. There is no way to crash Lua by
accessing Lua objects during finalization (bar bugs in Lua).

C/C++ objects are a different matter. They may be (and often are)
disposed by the finalizer of their proxy in Lua. So, if the finalizer for
object A tries to access C/C++ data associated with object B, and the
finalizer for B disposes that data, then you should ensure that B is
finalized after A. No manipulation of references or weak tables will
solve your problem here. You have basically two solutions:

- If A cannot live without B, then it should be created after B
(otherwise it would have to live without B until B is created).
If it is created after B, Lua ensures that its finalizer will be
called before B's (as pointed out by E.T.)

- If A can live without B, then it could avoid accessing B's data during
its finalization. For instance, B finalization could mark in B's Lua
part that the object is already finalized, by changing its pointer to
C/C++ data to NULL. The finalizer for A then checks whether the pointer
is NULL before accessing that data.  (Remember that the pointer itself,
stored in the userdata memory, is always accessible, even after B's
finalization.)
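
The second option can be modelled in plain C++ roughly as follows. This is only a sketch without any Lua calls: `BoxB` and `BoxA` stand in for the userdata memory of B and A, `finalize_b` and `finalize_a` for their __gc handlers, and all of these names (as well as the `users` counter) are made up for illustration. It assumes A can reach B's userdata block during finalization.

```cpp
#include <cassert>

// Hypothetical C++ payload that B's finalizer disposes of.
struct Data {
    int users = 0;
};

// Stand-in for B's userdata block: just a boxed pointer to the C++ data.
struct BoxB {
    Data* data;
};

// Stand-in for A's userdata block: it refers to B's box, not to B's data.
struct BoxA {
    BoxB* b;
    bool released;
};

// B's finalizer: dispose of the C++ data and mark the box as finalized.
void finalize_b(BoxB& b) {
    delete b.data;
    b.data = nullptr;
}

// A's finalizer: touch B's data only if B has not been finalized yet.
void finalize_a(BoxA& a) {
    if (a.b->data != nullptr) {
        a.b->data->users -= 1;
    }
    a.released = true;
}
```

Either finalization order is then safe: if B goes first, A sees the null pointer and skips the access.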

-- Roberto

Re: Userdata finalization order

Francesco Abbate
2010/10/18 Roberto Ierusalimschy <[hidden email]>:
> There is a difference between "finalization" (calling the __gc
> metamethod) and "disposal" or "collecting" (actually freeing the object).
>
> Any object that is going to be finalized, plus any object accessible
> from it, is never collected before the finalization, no matter the
> finalization order. In particular, when Lua closes a state, it calls all
> finalizers before disposing any object. There is no way to crash Lua by
> accessing Lua objects during finalization (bar bugs in Lua).

Hi Roberto,

thank you for taking the time to discuss this subject.

Regarding your comment, the bug I was talking about arises because you
*can* actually access userdata that have been finalized if they are in a
weak table as keys. As you know, Lua will not immediately remove
finalized objects from the table if they are weak keys; it waits for
the next GC cycle. I will still think that this is a bug unless you
explain why it is not. Note also that this "behaviour" is absolutely
undocumented.

> C/C++ objects is a different matter. They may be (and often are)
> disposed by the finalizer of its proxy in Lua. So, if the finalizer for
> object A tries to access C/C++ data associated with object B, and the
> finalizer for B disposes that data, then you should ensure that B is
> finalized after A. No manipulation of references or weak tables will
> solve your problem here. You have basically two solutions:
>
> - If A cannot live without B, then it should be created after B
> (otherwise it would have to live without B until B is created).
> If it is created after B, Lua ensures that its finalizer will be
> called before B's (as pointed out by E.T.)
>
> - If A can live without B, then it could avoid accessing B's data during
> its finalization. For instance, B finalization could mark in B's Lua
> part that the object is already finalized, by changing its pointer to
> C/C++ data to NULL. The finalizer for A then checks whether the pointer
> is NULL before accessing that data.  (Remember that the pointer itself,
> stored in the userdata memory, is always accessible, even after B's
> finalization.)

Well, Roberto, here you are proposing workarounds for a Lua
problem. The first proposition is in many cases not acceptable: in most
applications the objects can be created in any order and the links
between them are established at run time. Always imposing a specific
order of creation is simply too restrictive or just impossible in
many applications.

Talking about the second proposition, you are saying that we should
avoid accessing referenced objects during finalization because these
could have been already finalized. I'm sure you understand that this
principle is not part of normal C++ programming practice. Normally
when an object is finalised it also releases all of its resources and
in some cases the resources can be other objects. The problem here is
"how do you know that the secondary object has already been finalized?"
There is no obvious way to know that. I believe that what you
propose is technically possible in many cases but it does impose some
extra constraints on the C++ design of the code and adds
complexity to the problem.

I believe that in this area Lua could be *really* improved and the
proof is that many C++ programmers bypass the normal Lua memory
management by creating a shallow userdata (just a boxed pointer) and
using reference counting. This is obviously an extra mechanism on top of
Lua's GC that is needed to ensure correct memory management. I believe
that a *native* Lua mechanism to handle these cases could really
improve and ease the integration of C++ applications.

--
Francesco

Re: Userdata finalization order

Javier Guerra Giraldez
On Mon, Oct 18, 2010 at 2:20 PM, Francesco Abbate
<[hidden email]> wrote:
> To impose always a specific
> order of creation is simply too much restrictive or just impossible in
> many applications.

i can't think of an example where:

- A can be created before B
- but A depends on B  (how did it exist before B, then?)
- you have to destroy A before B

--
Javier

Re: Userdata finalization order

Francesco Abbate
2010/10/18 Javier Guerra Giraldez <[hidden email]>:
> i can't think of an example where:
>
> - A can be created before B
> - but A depends on B  (how did it exist before B, then?)
> - you have to destroy A before B

For example A is a plot and B is a line or another graphical element. You
first create the plot, which is initially empty, and then you add
graphical elements like lines or circles or something else. Once you
have added an element to the plot, the plot stores a *reference* to
the element and it is therefore important to ensure that B (the
element) is not collected as long as A (the plot) exists.

But I can give you a simple Lua example:

A = {}
B = { 'hello world' }
A[1] = B

A was created before B but now A contains a reference to B, and Lua
will ensure that as long as A is still alive B will not be collected.
And now just imagine the same
situation with C++ userdata instead of Lua objects... as you can see
it is really simple :-)

--
Francesco

Re: Userdata finalization order

Wesley Smith
On Mon, Oct 18, 2010 at 1:56 PM, Francesco Abbate
<[hidden email]> wrote:

> 2010/10/18 Javier Guerra Giraldez <[hidden email]>:
>> i can't think of an example where:
>>
>> - A can be created before B
>> - but A depends on B  (how did it exist before B, then?)
>> - you have to destroy A before B
>
> For example A is a plot and B is a line or a graphical elements. You
> create first the plot wichh is initially empty and then you add
> graphical elements like lines or circles or something else. Once you
> have added an element to the plot this latter store a *reference* to
> the element and it is therefore important to ensure that B (the
> element) is not collected as far as A (the plot) exists.
>
> But I can make to you a simple Lua example:
>
> A = {}
> B = { 'hello world' }
> A[1] = B

So, what happens in this case in C++?


Plot *p = new Plot();
Line *l = new Line();
p->add_line(l);
delete p;

Is Line l invalid?  When it's added to the Plot does Plot take
ownership?  Increment a reference count?  Seems like you have the same
problems in C++ as you would with Lua.

wes

Re: Userdata finalization order

Javier Guerra Giraldez
In reply to this post by Francesco Abbate
On Mon, Oct 18, 2010 at 3:56 PM, Francesco Abbate
<[hidden email]> wrote:
> But I can make to you a simple Lua example:
>
> A = {}
> B = { 'hello world' }
> A[1] = B

in this case, i wouldn't say that "A depends on B"; but that "A owns
B" (the same with the plot, which owns the lines).  thus, finalizing a
Lua reference to B shouldn't destroy it; instead, let A's finalizer
deal with all its owned subobjects.

--
Javier

Re: Userdata finalization order

James Rhodes
In reply to this post by Javier Guerra Giraldez
On Tue, Oct 19, 2010 at 6:48 AM, Javier Guerra Giraldez <[hidden email]> wrote:
> On Mon, Oct 18, 2010 at 2:20 PM, Francesco Abbate
> <[hidden email]> wrote:
>> To impose always a specific
>> order of creation is simply too much restrictive or just impossible in
>> many applications.
>
> i can't think of an example where:
>
> - A can be created before B
> - but A depends on B  (how did it exist before B, then?)
> - you have to destroy A before B

objA = ClassA();
objB = ClassB();
objA:Bind(objB);
-- Now A depends on B.

Regards, James.