Weak tables and userdata finalization

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Weak tables and userdata finalization

Mark Hamburg
I believe I just worked through a bug in my project caused by weak caches together with userdata finalization, but I wanted to confirm that my analysis was correct and not just the result of sleep deprivation.

* Lua 5.1.4
* Two userdatas each with __gc metamethods. Let's call them UD1 and UD2.
* We also have a table T.
* T references UD2 -- e.g., T.x == UD2
* T is accessible in a weak cache C (getmetatable( C ).__mode == 'kv') and C[ 'key' ] == T
* UD1 has an environment that also references T -- e.g., getfenv( UD1 ).y == T

So, here is what I think is happening to me that I want to confirm is possible.

* The GC finds that neither UD1 nor UD2 (nor T) are accessible via strong links, so it queues UD1 and UD2 for finalization.
* As part of queueing UD1 for finalization, it marks its environment so that that remains available for the __gc metamethod and this in turn marks T.
* T then survives in the cache.
* We fetch T out of the cache.
* This doesn't stop UD2 from being finalized which then makes us unhappy when we go to use use UD2 via T.x.

Except that writing this down, it doesn't jive with my recollection of what the GC does. In particular, I would have expected the reference to T in C to have been cleared even if T is kept alive by the finalization logic.

The symptom, however, is clearly that we go to use UD2 and it has already been finalized. UD1, on the other hand, was just a theory.

Any other insights into how we could end up running through weak caches and reaching an object that has been or will be finalized? I know about the dangers with keys on weak tables since those are identified as potentially being needed by the __gc metamethod, but I am not iterating weak tables anywhere in a way that would discover such finalized keys.

Mark


Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Mark Hamburg
On Sep 2, 2011, at 8:44 AM, Mark Hamburg wrote:

> I believe I just worked through a bug in my project caused by weak caches together with userdata finalization, but I wanted to confirm that my analysis was correct and not just the result of sleep deprivation.
>
> * Lua 5.1.4
> * Two userdatas each with __gc metamethods. Let's call them UD1 and UD2.
> * We also have a table T.
> * T references UD2 -- e.g., T.x == UD2
> * T is accessible in a weak cache C (getmetatable( C ).__mode == 'kv') and C[ 'key' ] == T
> * UD1 has an environment that also references T -- e.g., getfenv( UD1 ).y == T
>
> So, here is what I think is happening to me that I want to confirm is possible.
>
> * The GC finds that neither UD1 nor UD2 (nor T) are accessible via strong links, so it queues UD1 and UD2 for finalization.
> * As part of queueing UD1 for finalization, it marks its environment so that that remains available for the __gc metamethod and this in turn marks T.
> * T then survives in the cache.
> * We fetch T out of the cache.
> * This doesn't stop UD2 from being finalized which then makes us unhappy when we go to use use UD2 via T.x.
>
> Except that writing this down, it doesn't jive with my recollection of what the GC does. In particular, I would have expected the reference to T in C to have been cleared even if T is kept alive by the finalization logic.
>
> The symptom, however, is clearly that we go to use UD2 and it has already been finalized. UD1, on the other hand, was just a theory.
>
> Any other insights into how we could end up running through weak caches and reaching an object that has been or will be finalized? I know about the dangers with keys on weak tables since those are identified as potentially being needed by the __gc metamethod, but I am not iterating weak tables anywhere in a way that would discover such finalized keys.

Maybe I just confirmed my original theory and demonstrated that my recollection was wrong:

  udsize = luaC_separateudata(L, 0);  /* separate userdata to be finalized */
  marktmu(g);  /* mark `preserved' userdata */
  udsize += propagateall(g);  /* remark, to propagate `preserveness' */
  cleartable(g->weak);  /* remove collected objects from weak tables */

Basic lesson: Weak tables plus finalization are a dangerous mix?

Mark


Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Mark Hamburg
On Sep 2, 2011, at 8:51 AM, Mark Hamburg wrote:

> On Sep 2, 2011, at 8:44 AM, Mark Hamburg wrote:
>
>> I believe I just worked through a bug in my project caused by weak caches together with userdata finalization, but I wanted to confirm that my analysis was correct and not just the result of sleep deprivation.
>>
>> * Lua 5.1.4
>> * Two userdatas each with __gc metamethods. Let's call them UD1 and UD2.
>> * We also have a table T.
>> * T references UD2 -- e.g., T.x == UD2
>> * T is accessible in a weak cache C (getmetatable( C ).__mode == 'kv') and C[ 'key' ] == T
>> * UD1 has an environment that also references T -- e.g., getfenv( UD1 ).y == T
>>
>> So, here is what I think is happening to me that I want to confirm is possible.
>>
>> * The GC finds that neither UD1 nor UD2 (nor T) are accessible via strong links, so it queues UD1 and UD2 for finalization.
>> * As part of queueing UD1 for finalization, it marks its environment so that that remains available for the __gc metamethod and this in turn marks T.
>> * T then survives in the cache.
>> * We fetch T out of the cache.
>> * This doesn't stop UD2 from being finalized which then makes us unhappy when we go to use use UD2 via T.x.
>>
>> Except that writing this down, it doesn't jive with my recollection of what the GC does. In particular, I would have expected the reference to T in C to have been cleared even if T is kept alive by the finalization logic.
>>
>> The symptom, however, is clearly that we go to use UD2 and it has already been finalized. UD1, on the other hand, was just a theory.
>>
>> Any other insights into how we could end up running through weak caches and reaching an object that has been or will be finalized? I know about the dangers with keys on weak tables since those are identified as potentially being needed by the __gc metamethod, but I am not iterating weak tables anywhere in a way that would discover such finalized keys.
>
> Maybe I just confirmed my original theory and demonstrated that my recollection was wrong:
>
>  udsize = luaC_separateudata(L, 0);  /* separate userdata to be finalized */
>  marktmu(g);  /* mark `preserved' userdata */
>  udsize += propagateall(g);  /* remark, to propagate `preserveness' */
>  cleartable(g->weak);  /* remove collected objects from weak tables */
>
> Basic lesson: Weak tables plus finalization are a dangerous mix?

Though does this then get worse? I had assumed that if I had an object that could be queued for finalization on the value side of a weak table, that entry would get cleared in the GC pass so one could recognize such objects by testing an appropriate mapping. For example, our proxy system for Objective-C objects uses a weak table mapping light userdata Objective-C object addresses to Lua proxies so that we generate the same proxy for a userdata object if we need to publish it to Lua multiple times. It would be really bad if we could end up using a proxy that had been scheduled for finalization. Now, I haven't seen this happening, but the above code snippet certainly suggests that it's possible. Please tell me I've missed something as opposed to just being extremely lucky...

Mark


Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Mark Hamburg

On Sep 2, 2011, at 9:06 AM, Mark Hamburg wrote:

> Though does this then get worse? I had assumed that if I had an object that could be queued for finalization on the value side of a weak table, that entry would get cleared in the GC pass so one could recognize such objects by testing an appropriate mapping. For example, our proxy system for Objective-C objects uses a weak table mapping light userdata Objective-C object addresses to Lua proxies so that we generate the same proxy for a userdata object if we need to publish it to Lua multiple times. It would be really bad if we could end up using a proxy that had been scheduled for finalization. Now, I haven't seen this happening, but the above code snippet certainly suggests that it's possible. Please tell me I've missed something as opposed to just being extremely lucky...

I feel better about this case. iscleared does allows for the collection of finalized keys. Whew.

Mark


Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Gregory Bonik
In reply to this post by Mark Hamburg
Hello Mark,

It's a great coincidence: I had exactly the same problem yesterday. My
program which uses similar cache just crashed randomly. The bug was very
hard to reproduce and I spent the whole day tracking it down.

> Now, I haven't seen this happening, but the above code snippet
> certainly suggests that it's possible. Please tell me I've missed
> something as opposed to just being extremely lucky...

So yes, you were lucky :)

The solution I ended up with is just manually removing T from the cache
in the U2's finalizer.

Let me guess: I think U2 is a dummy userdata only used because one
cannot set a finalizer for a table in Lua 5.1. AFAIK, it's possible in
Lua 5.2, so this probably would solve the problem.

Concerning the explaination of behaviour, this excerpt from the manual
may be relevant (note the last sentence):

> At the end of each garbage-collection cycle, the finalizers for
> userdata are called in reverse order of their creation, among those
> collected in that cycle. That is, the first finalizer to be called is
> the one associated with the userdata created last in the program. The
> userdata itself is freed only in the next garbage-collection cycle.

-- Gregory



Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Mark Hamburg
On Sep 2, 2011, at 12:11 PM, Gregory Bonik wrote:

> Hello Mark,
>
> It's a great coincidence: I had exactly the same problem yesterday. My
> program which uses similar cache just crashed randomly. The bug was very
> hard to reproduce and I spent the whole day tracking it down.
>
>> Now, I haven't seen this happening, but the above code snippet
>> certainly suggests that it's possible. Please tell me I've missed
>> something as opposed to just being extremely lucky...
>
> So yes, you were lucky :)

No, actually, as I noted in a follow up, it turns out that the case that really made me paranoid is taken care of. Direct caches of userdata get cleared when marked for finalization. It's just indirect caches that have problems and those need something else to keep the values alive.

The actual case we were dealing with:

View system userdata objects have Lua environments to couple them into the Lua logic. These point to other objects supplying values through a binding mechanism. In particular, we can have value suppliers supplying images which are themselves represented with userdata objects.

Collecting the view data preserves its environment which preserves the value supplier but doesn't keep the image being supplied from being finalized. Code comes along and constructs a new view which wants to find a supplier for a particular image and gets the cached which leads to it try to use a finalized image.

Mark


Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Gregory Bonik
Mark Hamburg wrote:

> The actual case we were dealing with:
>
> View system userdata objects have Lua environments to couple them into
> the Lua logic. These point to other objects supplying values through a
> binding mechanism. In particular, we can have value suppliers
> supplying images which are themselves represented with userdata
> objects.
>
> Collecting the view data preserves its environment which preserves the
> value supplier but doesn't keep the image being supplied from being
> finalized. Code comes along and constructs a new view which wants to
> find a supplier for a particular image and gets the cached which leads
> to it try to use a finalized image.

If I understand correctly the case, there is another problem here. It
can happen so that the new view is allocated exactly at the same address
as the old one. As I can conclude from you original post, you have a
mapping (via a weak table) from the lightuserdata addresses to the env
tables of the views.

Suppose that GC finalizes old view's userdata but keeps its env table,
then a new view with the same address is constructed and mapped to the
env table of the old defunct view.

I my case, I have a sort of "indirect" cache: C++ object addresses are
mapped to env tables via a special weak table. Env tables are used to
store Lua callbacks connected to objects. Multiple userdata can
represent the same object, so they share the same env table. These
userdata do not have a finalizer. Instead, a special userdata is
referenced from each env table. Its sole purpose is to decrement C++
object's reference count in the finalizer. Everything went fine until
the memory allocator started to reuse disposed memory for new objects.

-- Gregory


Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Mark Hamburg
On Sep 3, 2011, at 3:13 PM, Gregory Bonik wrote:

> Mark Hamburg wrote:
>> The actual case we were dealing with:
>>
>> View system userdata objects have Lua environments to couple them into
>> the Lua logic. These point to other objects supplying values through a
>> binding mechanism. In particular, we can have value suppliers
>> supplying images which are themselves represented with userdata
>> objects.
>>
>> Collecting the view data preserves its environment which preserves the
>> value supplier but doesn't keep the image being supplied from being
>> finalized. Code comes along and constructs a new view which wants to
>> find a supplier for a particular image and gets the cached which leads
>> to it try to use a finalized image.
>
> If I understand correctly the case, there is another problem here. It
> can happen so that the new view is allocated exactly at the same address
> as the old one. As I can conclude from you original post, you have a
> mapping (via a weak table) from the lightuserdata addresses to the env
> tables of the views.
>
> Suppose that GC finalizes old view's userdata but keeps its env table,
> then a new view with the same address is constructed and mapped to the
> env table of the old defunct view.

Not a problem, the table goes from light userdata (the view) to full userdata (the Lua-side proxy) and from that we go to the environment.
>
> I my case, I have a sort of "indirect" cache: C++ object addresses are
> mapped to env tables via a special weak table. Env tables are used to
> store Lua callbacks connected to objects. Multiple userdata can
> represent the same object, so they share the same env table. These
> userdata do not have a finalizer. Instead, a special userdata is
> referenced from each env table. Its sole purpose is to decrement C++
> object's reference count in the finalizer. Everything went fine until
> the memory allocator started to reuse disposed memory for new objects.

Yeah. We're safe from that because the existence of the non-finalized Lua proxy will keep the native object alive and the only thing we map to from the native object is that non-finalized Lua proxy.

Mark


Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

fatfatson
hi, mark & Gregory ~
maybe i have the similar problem with you..
i also use a weak table to save the map between c++ object address to its lua table,
every c++ object is represented as a lua table, whose array part element[1] is a userdata for saving the c++ object's address,
and the __gc method for the userdata is set for calling the object's ref-counter decrease
at most time it goes well, but if i adjust the gc parameter "setpause" to a little value, some strange crash would happen, and the dump looks like a memory corrupt.
i have no much knownodge about lua gc, so what's your conclusion for the problem? is my c++ bind solution correct ?
thanks~
 
 
2011/9/4 Mark Hamburg <[hidden email]>
On Sep 3, 2011, at 3:13 PM, Gregory Bonik wrote:

> Mark Hamburg wrote:
>> The actual case we were dealing with:
>>
>> View system userdata objects have Lua environments to couple them into
>> the Lua logic. These point to other objects supplying values through a
>> binding mechanism. In particular, we can have value suppliers
>> supplying images which are themselves represented with userdata
>> objects.
>>
>> Collecting the view data preserves its environment which preserves the
>> value supplier but doesn't keep the image being supplied from being
>> finalized. Code comes along and constructs a new view which wants to
>> find a supplier for a particular image and gets the cached which leads
>> to it try to use a finalized image.
>
> If I understand correctly the case, there is another problem here. It
> can happen so that the new view is allocated exactly at the same address
> as the old one. As I can conclude from you original post, you have a
> mapping (via a weak table) from the lightuserdata addresses to the env
> tables of the views.
>
> Suppose that GC finalizes old view's userdata but keeps its env table,
> then a new view with the same address is constructed and mapped to the
> env table of the old defunct view.

Not a problem, the table goes from light userdata (the view) to full userdata (the Lua-side proxy) and from that we go to the environment.
>
> I my case, I have a sort of "indirect" cache: C++ object addresses are
> mapped to env tables via a special weak table. Env tables are used to
> store Lua callbacks connected to objects. Multiple userdata can
> represent the same object, so they share the same env table. These
> userdata do not have a finalizer. Instead, a special userdata is
> referenced from each env table. Its sole purpose is to decrement C++
> object's reference count in the finalizer. Everything went fine until
> the memory allocator started to reuse disposed memory for new objects.

Yeah. We're safe from that because the existence of the non-finalized Lua proxy will keep the native object alive and the only thing we map to from the native object is that non-finalized Lua proxy.

Mark



Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Gregory Bonik
冶晶邓 wrote:

> hi, mark & Gregory ~
> maybe i have the similar problem with you..
> i also use a weak table to save the map between c++ object address to
> its lua table,
> every c++ object is represented as a lua table, whose array part
> element[1] is a userdata for saving the c++ object's address,
> and the __gc method for the userdata is set for calling the object's
> ref-counter decrease
> at most time it goes well, but if i adjust the gc parameter "setpause"
> to a little value, some strange crash would happen, and the dump looks
> like a memory corrupt.
> i have no much knownodge about lua gc, so what's your conclusion for
> the problem? is my c++ bind solution correct ?
> thanks~

The solution is to clear the corresponding entry in the weak table
manually in the __gc metamethod right before you decrease the reference
count, i. e. your __gc should look like that:

int gc(lua_State* L)
{
    Object* obj = *(Object**)lua_touserdata(L, 1);
    lua_getfield(L, LUA_REGISTRYINDEX, "map");
    lua_pushlightuserdata(L, obj);
    lua_pushnil(L);
    lua_rawset(L, -3);
    lua_pop(L, 1);

    obj->unref();
    return 0;
}

--Gregory



Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Mark Hamburg
In reply to this post by Mark Hamburg
On Sep 2, 2011, at 8:44 AM, Mark Hamburg wrote:

> I believe I just worked through a bug in my project caused by weak caches together with userdata finalization, but I wanted to confirm that my analysis was correct and not just the result of sleep deprivation.
>
> * Lua 5.1.4
> * Two userdatas each with __gc metamethods. Let's call them UD1 and UD2.
> * We also have a table T.
> * T references UD2 -- e.g., T.x == UD2
> * T is accessible in a weak cache C (getmetatable( C ).__mode == 'kv') and C[ 'key' ] == T
> * UD1 has an environment that also references T -- e.g., getfenv( UD1 ).y == T
>
> So, here is what I think is happening to me that I want to confirm is possible.
>
> * The GC finds that neither UD1 nor UD2 (nor T) are accessible via strong links, so it queues UD1 and UD2 for finalization.
> * As part of queueing UD1 for finalization, it marks its environment so that that remains available for the __gc metamethod and this in turn marks T.
> * T then survives in the cache.
> * We fetch T out of the cache.
> * This doesn't stop UD2 from being finalized which then makes us unhappy when we go to use use UD2 via T.x.

The fix for my code (not yet implemented) and possibly the fix for "your" code too for those of you falling victim to similar patterns: There's no problem that can't be solved with an extra level of indirection...

In this case, we add a fully-weak table to the mix. When finalizing UD1 as it turns out, I need access to the contents of the userdata, but I don't need access to the contents of the environment. So, we change the environment for UD1 to be a fully-weak table leading to the table that was the environment in the old design:

        getfenv( UD1 )[ 1 ].y = T
        getmetatable( getfenv( UD1 ) ).__mode == 'kv'


Now, when the garbage collector decides to finalize UD1, it will keep getfenv( UD1 ) alive for one more cycle, but this won't keep the old environment table alive and hence won't keep T alive.

So, weak tables look to be part of the problem and part of the solution. Note, however, that this does not work if we need the actual environment table while running the finalizer for UD1.

Mark


Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Roberto Ierusalimschy
In reply to this post by Mark Hamburg
> Maybe I just confirmed my original theory and demonstrated that my
> recollection was wrong:
>
>   udsize = luaC_separateudata(L, 0);  /* separate userdata to be finalized */
>   marktmu(g);  /* mark `preserved' userdata */
>   udsize += propagateall(g);  /* remark, to propagate `preserveness' */
>   cleartable(g->weak);  /* remove collected objects from weak tables */
>
> Basic lesson: Weak tables plus finalization are a dangerous mix?

Certainly this basic lesson is correct.

Nevertheless, if you did not find a bug, at least it seems an
inconsistency. Anything anchored through finalizing userdata should have
the same behavior (related to weak tables) of those userdata. Maybe
Lua should clear tables with weak values before marking userdata to be
finalized. (With that change, it seems that function 'iscleared' would
not need to treat finalized userdata in keys as a special case.)

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Mark Hamburg
On Sep 15, 2011, at 6:57 AM, Roberto Ierusalimschy wrote:

>> Maybe I just confirmed my original theory and demonstrated that my
>> recollection was wrong:
>>
>>  udsize = luaC_separateudata(L, 0);  /* separate userdata to be finalized */
>>  marktmu(g);  /* mark `preserved' userdata */
>>  udsize += propagateall(g);  /* remark, to propagate `preserveness' */
>>  cleartable(g->weak);  /* remove collected objects from weak tables */
>>
>> Basic lesson: Weak tables plus finalization are a dangerous mix?
>
> Certainly this basic lesson is correct.
>
> Nevertheless, if you did not find a bug, at least it seems an
> inconsistency. Anything anchored through finalizing userdata should have
> the same behavior (related to weak tables) of those userdata. Maybe
> Lua should clear tables with weak values before marking userdata to be
> finalized. (With that change, it seems that function 'iscleared' would
> not need to treat finalized userdata in keys as a special case.)

That would be more consistent in the behavior. As you might guess from this thread, it was a bit of a puzzle figuring out how we got burned since direct references in weak tables did get cleared.

That said, doing so would mean that userdata that needed to access other data would have to use the userdata environment rather than weak tables, so it's going to break some code out there that failed to adopt environments as the mechanism of choice. On the other hand, since semi-weak tables in Lua 5.1 are prone to GC cycles, they were a bad way to handle attaching Lua data to userdata anyway so breaking that code is probably for the best assuming it leads to it getting fixed. What's more, breaking it in this way won't introduce subtle errors since data one expected to be there suddenly won't be.

If I don't care about that change in semantics, is the minimal change to Lua 5.1 to move the cleartable call above the marktmu call? I've got my two-stage, weak-reference solution but a more robust -- i.e., less dependent on code doing "the right thing" -- solution seems desirable.

Mark


Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Roberto Ierusalimschy
> That said, doing so would mean that userdata that needed to access
> other data would have to use the userdata environment rather than
> weak tables, so it's going to break some code out there that failed
> to adopt environments as the mechanism of choice.

My suggestion was to move the clearing only of weak values. Weak keys
would continue to be cleared after the separation of userdata. (Lua 5.2
has different 'cleartable' calls for different degrees of weaknesses.)
So, this technique (weak userdata -> strong value used in finalization)
would continue to work as before.


> If I don't care about that change in semantics, is the minimal change
> to Lua 5.1 to move the cleartable call above the marktmu call?

I think so, but I would have to test.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Weak tables and userdata finalization

Roberto Ierusalimschy
In reply to this post by Mark Hamburg
> If I don't care about that change in semantics, is the minimal change
> to Lua 5.1 to move the cleartable call above the marktmu call? I've
> got my two-stage, weak-reference solution but a more robust -- i.e.,
> less dependent on code doing "the right thing" -- solution seems
> desirable.

Probably you can move the cleartable call above the marktmu call, but
you also need to keep another call in the original position. The marktmu
phase (and following propagateall) may ressurect some weak table that
will have to be cleared too. See next example:

local u = newproxy(true)
-- __gc metamethod should be collected before running
setmetatable(getmetatable(u), {__mode = "v"})
getmetatable(u).__gc = function (o) os.exit(1) end  -- cannot happen
u = nil
collectgarbage()


-- Roberto