Multiple Lua Threads, __gc crash accessing "dead" thread

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Multiple Lua Threads, __gc crash accessing "dead" thread

ThePhD
Observe the following complete, compile-able (and mostly C) code, which stores a simple "luaL_ref" wrapper in a struct named "co_test" and links up a simple creation method ("new") and a simple deletion method ("__gc"):

#include <lua.h>
#include <lualib.h>
#include <lauxlib.h>

#include <assert.h>

typedef struct co_test {
    lua_State* L;
    int ref;
} co_test;

void co_test_make(co_test* p, lua_State* L, int stack_target) {
    lua_pushvalue(L, stack_target); // copy value   
    p->L = L;
    p->ref = luaL_ref(L, LUA_REGISTRYINDEX); // save in registry
}

void co_test_destroy(co_test* p) {
    luaL_unref(p->L, LUA_REGISTRYINDEX, p->ref);
    p->ref = 0;
    p->L = NULL;
}

int co_test_new(lua_State* L) {
    co_test* p = (co_test*)lua_newuserdata(L, sizeof(co_test));
    luaL_setmetatable(L, "co_test"); // bless userdata with metatable

    co_test_make(p, L, 1);

    return 1;
}

int co_test___gc(lua_State* L) {
    co_test* p = (co_test*)lua_touserdata(L, 1);

    co_test_destroy(p);

    return 0;
}

int main(int, char*[]) {

    const char code[] = R"(
local co = coroutine.wrap(
    function()
        local t = co_test.new({ "SOME_TABLE" })
        -- STUFF
    end
)

co() -- call
co = nil -- destroy coroutine/underlying thread
collectgarbage() -- collect garbage
)";

    lua_State* L = luaL_newstate();
    luaL_Reg registration[3] = { 0 };
    int pre_stack_check = 0;
    int post_stack_check = 0;

    luaL_requiref(L, "base", luaopen_base, 1);
    lua_pop(L, 1);
   
    luaL_requiref(L, "coroutine", luaopen_coroutine, 1);
    lua_pop(L, 1);

    pre_stack_check = lua_gettop(L);
    if (luaL_newmetatable(L, "co_test") == 1) {
        registration[0] = luaL_Reg{ "new", &co_test_new };
        registration[1] = luaL_Reg{ "__gc", &co_test___gc };
        luaL_setfuncs(L, registration, 0);
        lua_pushvalue(L, -1); // duplicate metatable
        lua_setfield(L, -1, "__index"); // loop-back lookup
    }
    lua_setglobal(L, "co_test");
    post_stack_check = lua_gettop(L);
    // sanity check
    assert(pre_stack_check == post_stack_check);

    if (luaL_dostring(L, code)) {
        assert(false);
    }

    return 0;
}


In "co_test_destroy", the code segfaults (g++, clang++) / "Read Access Violation" (VC++) when doing "luaL_unref" when this code executes. I found that the reason is that the "co" variable, which holds the coroutine and thread represented by the "L" that the "co_test" was created with, dies. If you do not explicitly "nil" the "co" variable in the Lua code, the thread/state that "co_test" is created with stays alive.

This behavior is observable in Lua 5.3, 5.2 and 5.1 (but not with LuaJIT on VC++ and Windows, for some reason).

I am unsure of how to get around this issue properly. I know in 5.2 and 5.3 I can forcefully obtain the main thread's state and thusly avoid this problem altogether by just adjusting the "L" pointer in the "co_test_new" function. But I seem to be on an island for Lua 5.1.

Does anyone have any thoughts about this? Is this behavior desirable? Should I just ask my users to collect garbage before setting any coroutine/thread wrappers to "nil"?

Sincerely,
ThePhD
Reply | Threaded
Open this post in threaded view
|

Re: Multiple Lua Threads, __gc crash accessing "dead" thread

Francisco Olarte
On Wed, Sep 13, 2017 at 4:02 PM, ThePhD <[hidden email]> wrote:
> Observe the following complete, compile-able (and mostly C) code, which
> stores a simple "luaL_ref" wrapper in a struct named "co_test" and links up
> a simple creation method ("new") and a simple deletion method ("__gc"):
...
> void co_test_destroy(co_test* p) {
>     luaL_unref(p->L, LUA_REGISTRYINDEX, p->ref);
>     p->ref = 0;
>     p->L = NULL;
> }
...
> int co_test___gc(lua_State* L) {
>     co_test* p = (co_test*)lua_touserdata(L, 1);
>
>     co_test_destroy(p);
>
>     return 0;
> }
...

> In "co_test_destroy", the code segfaults (g++, clang++) / "Read Access
> Violation" (VC++) when doing "luaL_unref" when this code executes. I found
> that the reason is that the "co" variable, which holds the coroutine and
> thread represented by the "L" that the "co_test" was created with, dies. If
> you do not explicitly "nil" the "co" variable in the Lua code, the
> thread/state that "co_test" is created with stays alive.
>
> This behavior is observable in Lua 5.3, 5.2 and 5.1 (but not with LuaJIT on
> VC++ and Windows, for some reason).
>
> I am unsure of how to get around this issue properly. I know in 5.2 and 5.3
> I can forcefully obtain the main thread's state and thusly avoid this
> problem altogether by just adjusting the "L" pointer in the "co_test_new"
> function. But I seem to be on an island for Lua 5.1.

Have you tried passing the lua_state from _gc to _destroy and using
that state to unref?

I mean:

void co_test_destroy(lua_State * L, co_test* p) { *** Extra parameter.
    luaL_unref(L, LUA_REGISTRYINDEX, p->ref); *** Using parameter
instead of the data stored in p.
     p->ref = 0;
     p->L = NULL;
 }
...
int co_test___gc(lua_State* L) {
     co_test* p = (co_test*)lua_touserdata(L, 1);
     co_test_destroy(L, p); *** Passing calling state, which must be ok.
    return 0;
 }


> Does anyone have any thoughts about this? Is this behavior desirable? Should
> I just ask my users to collect garbage before setting any coroutine/thread
> wrappers to "nil"?

Well, you are using a dead state, behaviour may be undesirable, but I
would call it expected.

Francisco Olarte

Reply | Threaded
Open this post in threaded view
|

Re: Multiple Lua Threads, __gc crash accessing "dead" thread

Daurnimator
In reply to this post by ThePhD
On 14 September 2017 at 00:02, ThePhD <[hidden email]> wrote:

> Observe the following complete, compile-able (and mostly C) code, which
> stores a simple "luaL_ref" wrapper in a struct named "co_test" and links up
> a simple creation method ("new") and a simple deletion method ("__gc"):
>
> #include <lua.h>
> #include <lualib.h>
> #include <lauxlib.h>
>
> #include <assert.h>
>
> typedef struct co_test {
>     lua_State* L;
>     int ref;
> } co_test;
>
> void co_test_make(co_test* p, lua_State* L, int stack_target) {
>     lua_pushvalue(L, stack_target); // copy value
>     p->L = L;
>     p->ref = luaL_ref(L, LUA_REGISTRYINDEX); // save in registry
> }
>
> void co_test_destroy(co_test* p) {
>     luaL_unref(p->L, LUA_REGISTRYINDEX, p->ref);
>     p->ref = 0;
>     p->L = NULL;
> }
>
> int co_test_new(lua_State* L) {
>     co_test* p = (co_test*)lua_newuserdata(L, sizeof(co_test));
>     luaL_setmetatable(L, "co_test"); // bless userdata with metatable
>
>     co_test_make(p, L, 1);
>
>     return 1;
> }
>
> int co_test___gc(lua_State* L) {
>     co_test* p = (co_test*)lua_touserdata(L, 1);
>
>     co_test_destroy(p);
>
>     return 0;
> }
>
> int main(int, char*[]) {
>
>     const char code[] = R"(
> local co = coroutine.wrap(
>     function()
>         local t = co_test.new({ "SOME_TABLE" })
>         -- STUFF
>     end
> )
>
> co() -- call
> co = nil -- destroy coroutine/underlying thread
> collectgarbage() -- collect garbage
> )";
>
>     lua_State* L = luaL_newstate();
>     luaL_Reg registration[3] = { 0 };
>     int pre_stack_check = 0;
>     int post_stack_check = 0;
>
>     luaL_requiref(L, "base", luaopen_base, 1);
>     lua_pop(L, 1);
>
>     luaL_requiref(L, "coroutine", luaopen_coroutine, 1);
>     lua_pop(L, 1);
>
>     pre_stack_check = lua_gettop(L);
>     if (luaL_newmetatable(L, "co_test") == 1) {
>         registration[0] = luaL_Reg{ "new", &co_test_new };
>         registration[1] = luaL_Reg{ "__gc", &co_test___gc };
>         luaL_setfuncs(L, registration, 0);
>         lua_pushvalue(L, -1); // duplicate metatable
>         lua_setfield(L, -1, "__index"); // loop-back lookup
>     }
>     lua_setglobal(L, "co_test");
>     post_stack_check = lua_gettop(L);
>     // sanity check
>     assert(pre_stack_check == post_stack_check);
>
>     if (luaL_dostring(L, code)) {
>         assert(false);
>     }
>
>     return 0;
> }
>
>
> In "co_test_destroy", the code segfaults (g++, clang++) / "Read Access
> Violation" (VC++) when doing "luaL_unref" when this code executes. I found
> that the reason is that the "co" variable, which holds the coroutine and
> thread represented by the "L" that the "co_test" was created with, dies. If
> you do not explicitly "nil" the "co" variable in the Lua code, the
> thread/state that "co_test" is created with stays alive.
>
> This behavior is observable in Lua 5.3, 5.2 and 5.1 (but not with LuaJIT on
> VC++ and Windows, for some reason).
>
> I am unsure of how to get around this issue properly. I know in 5.2 and 5.3
> I can forcefully obtain the main thread's state and thusly avoid this
> problem altogether by just adjusting the "L" pointer in the "co_test_new"
> function. But I seem to be on an island for Lua 5.1.
>
> Does anyone have any thoughts about this? Is this behavior desirable? Should
> I just ask my users to collect garbage before setting any coroutine/thread
> wrappers to "nil"?
>
> Sincerely,
> ThePhD

I was able to run your testcase without issue in 5.2 and 5.3. (5.1
failed due to missing functions e.g. luaL_setmetatable)

Patch to make it compile in normal gcc + some debugging output:

--- phd-orig.c    2017-09-14 00:32:14.094458804 +1000
+++ phd.c    2017-09-14 00:33:00.193862671 +1000
@@ -17,4 +17,5 @@

 void co_test_destroy(co_test* p) {
+    printf("COLLECTING\n");
     luaL_unref(p->L, LUA_REGISTRYINDEX, p->ref);
     p->ref = 0;
@@ -39,5 +40,5 @@
 }

-int main(int, char*[]) {
+int main() {

     const char code[] = R"(
@@ -52,8 +53,13 @@
 co = nil -- destroy coroutine/underlying thread
 collectgarbage() -- collect garbage
+print('DONE')
 )";

     lua_State* L = luaL_newstate();
-    luaL_Reg registration[3] = { 0 };
+    luaL_Reg registration[] = {
+        { "new", &co_test_new },
+        { "__gc", &co_test___gc },
+        { NULL, NULL }
+    };
     int pre_stack_check = 0;
     int post_stack_check = 0;
@@ -67,6 +73,4 @@
     pre_stack_check = lua_gettop(L);
     if (luaL_newmetatable(L, "co_test") == 1) {
-        registration[0] = luaL_Reg{ "new", &co_test_new };
-        registration[1] = luaL_Reg{ "__gc", &co_test___gc };
         luaL_setfuncs(L, registration, 0);
         lua_pushvalue(L, -1); // duplicate metatable
@@ -79,5 +83,5 @@

     if (luaL_dostring(L, code)) {
-        assert(false);
+        assert(0);
     }

Reply | Threaded
Open this post in threaded view
|

Re: Multiple Lua Threads, __gc crash accessing "dead" thread

彭 书呆
In reply to this post by ThePhD
On 09/13/2017 10:02 PM, ThePhD wrote:

> Observe the following complete, compile-able (and mostly C) code, which stores a simple "luaL_ref" wrapper in a struct named "co_test" and links up a simple creation method ("new") and a simple deletion method ("__gc"):
>
> #include <lua.h>
> #include <lualib.h>
> #include <lauxlib.h>
>
> #include <assert.h>
>
> typedef struct co_test {
>     lua_State* L;
>     int ref;
> } co_test;
>
> void co_test_make(co_test* p, lua_State* L, int stack_target) {
>     lua_pushvalue(L, stack_target); // copy value   
>     p->L = L;
>     p->ref = luaL_ref(L, LUA_REGISTRYINDEX); // save in registry
> }
>
> void co_test_destroy(co_test* p) {
>     luaL_unref(p->L, LUA_REGISTRYINDEX, p->ref);
>     p->ref = 0;
>     p->L = NULL;
> }
>
> int co_test_new(lua_State* L) {
>     co_test* p = (co_test*)lua_newuserdata(L, sizeof(co_test));
>     luaL_setmetatable(L, "co_test"); // bless userdata with metatable
>
>     co_test_make(p, L, 1);
>
>     return 1;
> }
>
> int co_test___gc(lua_State* L) {
>     co_test* p = (co_test*)lua_touserdata(L, 1);
>
>     co_test_destroy(p);
>
>     return 0;
> }
>
> int main(int, char*[]) {
>
>     const char code[] = R"(
> local co = coroutine.wrap(
>     function()
>         local t = co_test.new({ "SOME_TABLE" })
>         -- STUFF
>     end
> )
>
> co() -- call
> co = nil -- destroy coroutine/underlying thread
> collectgarbage() -- collect garbage
> )";
>
>     lua_State* L = luaL_newstate();
>     luaL_Reg registration[3] = { 0 };
>     int pre_stack_check = 0;
>     int post_stack_check = 0;
>
>     luaL_requiref(L, "base", luaopen_base, 1);
>     lua_pop(L, 1);
>    
>     luaL_requiref(L, "coroutine", luaopen_coroutine, 1);
>     lua_pop(L, 1);
>
>     pre_stack_check = lua_gettop(L);
>     if (luaL_newmetatable(L, "co_test") == 1) {
>         registration[0] = luaL_Reg{ "new", &co_test_new };
>         registration[1] = luaL_Reg{ "__gc", &co_test___gc };
>         luaL_setfuncs(L, registration, 0);
>         lua_pushvalue(L, -1); // duplicate metatable
>         lua_setfield(L, -1, "__index"); // loop-back lookup
>     }
>     lua_setglobal(L, "co_test");
>     post_stack_check = lua_gettop(L);
>     // sanity check
>     assert(pre_stack_check == post_stack_check);
>
>     if (luaL_dostring(L, code)) {
>         assert(false);
>     }
>
>     return 0;
> }
>
>
> In "co_test_destroy", the code segfaults (g++, clang++) / "Read Access Violation" (VC++) when doing "luaL_unref" when this code executes. I found that the reason is that the "co" variable, which holds the coroutine and thread represented by the "L" that the "co_test" was created with, dies. If you do not explicitly "nil" the "co" variable in the Lua code, the thread/state that "co_test" is created with stays alive.
>
> This behavior is observable in Lua 5.3, 5.2 and 5.1 (but not with LuaJIT on VC++ and Windows, for some reason).
>
> I am unsure of how to get around this issue properly. I know in 5.2 and 5.3 I can forcefully obtain the main thread's state and thusly avoid this problem altogether by just adjusting the "L" pointer in the "co_test_new" function. But I seem to be on an island for Lua 5.1.
>
> Does anyone have any thoughts about this? Is this behavior desirable? Should I just ask my users to collect garbage before setting any coroutine/thread wrappers to "nil"?
>
> Sincerely,
> ThePhD

this is a tricky situation. basically you store a reference to a
gc managed object (the thread) outside Lua (well, by ouside Lua
I really mean beyond the gc's knowledge), and you need to deref
the pointer in the finalizer, but the collector knows nothing about
this reference relation, so it can't re-mark the referenced object
when calling the finalizer. so when your finializer got called, the
referenced thread object could already got swept.

one obvious solution is you store the referenced thread as the uservalue
or in a field of the metatable of your userdata, this way the gc will
notice the correct reference relation and make sure it not be collected
when calling the finalizer.

if for some reason you can't use the uservalue or metatable, another
possible solution I could think of is to also store the Lua thread
object in the registry (think another lua_ref), which effectively
maintains the reference relation, though in an weird indirect way.


--
the nerdy Peng / 书呆彭 /

Reply | Threaded
Open this post in threaded view
|

Re: Multiple Lua Threads, __gc crash accessing "dead" thread

ThePhD
Daurnimator: The segfault (read-acccess violation) happens with more regularity in Visual C++ builds because in debug mode the compiler forcefully sets freed and deallocated memory to a known "nothing is here" pattern like "\xCC" or "\xDD", causing it to crash more reliably. I'm sure if the code was run in a more complicated or long-running environment as I had been, it would likely exhibit undefined behavior or segfault sooner!

Francisco: Thank you for the suggestion. Unfortunately, I don't have that choice. For C code I can make that change, but my example was a stripping down of some C++ code. The thing that calls "co_test_destroy" is really a C++ destructor, and that destructor can't receive extra parameters. So while I know I can fix it if it was just plain C code, it seems like I am more or less stuck in the C++ world.

彭 书呆:

I will most likely use these suggestions to ensure the thread really doesn't die. It's probably the most surefire way to ensure the garbage collector doesn't snatch the thread out from under me. I could also go about forcefully making "L" of my "co_test" point to the main lua_State (using LUA_RIDX_MAINTHREAD and friends), in which case the registry reference will still be plenty valid. I'll have to try both and explore each route.

Thank you all for the suggestions and advice; I was really stuck on trying to figure this out for a good long while.

Sincerely,

ThePhD


On Wed, Sep 13, 2017 at 10:50 AM, 彭 书呆 <[hidden email]> wrote:
On 09/13/2017 10:02 PM, ThePhD wrote:
> Observe the following complete, compile-able (and mostly C) code, which stores a simple "luaL_ref" wrapper in a struct named "co_test" and links up a simple creation method ("new") and a simple deletion method ("__gc"):
>
> #include <lua.h>
> #include <lualib.h>
> #include <lauxlib.h>
>
> #include <assert.h>
>
> typedef struct co_test {
>     lua_State* L;
>     int ref;
> } co_test;
>
> void co_test_make(co_test* p, lua_State* L, int stack_target) {
>     lua_pushvalue(L, stack_target); // copy value   
>     p->L = L;
>     p->ref = luaL_ref(L, LUA_REGISTRYINDEX); // save in registry
> }
>
> void co_test_destroy(co_test* p) {
>     luaL_unref(p->L, LUA_REGISTRYINDEX, p->ref);
>     p->ref = 0;
>     p->L = NULL;
> }
>
> int co_test_new(lua_State* L) {
>     co_test* p = (co_test*)lua_newuserdata(L, sizeof(co_test));
>     luaL_setmetatable(L, "co_test"); // bless userdata with metatable
>
>     co_test_make(p, L, 1);
>
>     return 1;
> }
>
> int co_test___gc(lua_State* L) {
>     co_test* p = (co_test*)lua_touserdata(L, 1);
>
>     co_test_destroy(p);
>
>     return 0;
> }
>
> int main(int, char*[]) {
>
>     const char code[] = R"(
> local co = coroutine.wrap(
>     function()
>         local t = co_test.new({ "SOME_TABLE" })
>         -- STUFF
>     end
> )
>
> co() -- call
> co = nil -- destroy coroutine/underlying thread
> collectgarbage() -- collect garbage
> )";
>
>     lua_State* L = luaL_newstate();
>     luaL_Reg registration[3] = { 0 };
>     int pre_stack_check = 0;
>     int post_stack_check = 0;
>
>     luaL_requiref(L, "base", luaopen_base, 1);
>     lua_pop(L, 1);
>    
>     luaL_requiref(L, "coroutine", luaopen_coroutine, 1);
>     lua_pop(L, 1);
>
>     pre_stack_check = lua_gettop(L);
>     if (luaL_newmetatable(L, "co_test") == 1) {
>         registration[0] = luaL_Reg{ "new", &co_test_new };
>         registration[1] = luaL_Reg{ "__gc", &co_test___gc };
>         luaL_setfuncs(L, registration, 0);
>         lua_pushvalue(L, -1); // duplicate metatable
>         lua_setfield(L, -1, "__index"); // loop-back lookup
>     }
>     lua_setglobal(L, "co_test");
>     post_stack_check = lua_gettop(L);
>     // sanity check
>     assert(pre_stack_check == post_stack_check);
>
>     if (luaL_dostring(L, code)) {
>         assert(false);
>     }
>
>     return 0;
> }
>
>
> In "co_test_destroy", the code segfaults (g++, clang++) / "Read Access Violation" (VC++) when doing "luaL_unref" when this code executes. I found that the reason is that the "co" variable, which holds the coroutine and thread represented by the "L" that the "co_test" was created with, dies. If you do not explicitly "nil" the "co" variable in the Lua code, the thread/state that "co_test" is created with stays alive.
>
> This behavior is observable in Lua 5.3, 5.2 and 5.1 (but not with LuaJIT on VC++ and Windows, for some reason).
>
> I am unsure of how to get around this issue properly. I know in 5.2 and 5.3 I can forcefully obtain the main thread's state and thusly avoid this problem altogether by just adjusting the "L" pointer in the "co_test_new" function. But I seem to be on an island for Lua 5.1.
>
> Does anyone have any thoughts about this? Is this behavior desirable? Should I just ask my users to collect garbage before setting any coroutine/thread wrappers to "nil"?
>
> Sincerely,
> ThePhD

this is a tricky situation. basically you store a reference to a
gc managed object (the thread) outside Lua (well, by ouside Lua
I really mean beyond the gc's knowledge), and you need to deref
the pointer in the finalizer, but the collector knows nothing about
this reference relation, so it can't re-mark the referenced object
when calling the finalizer. so when your finializer got called, the
referenced thread object could already got swept.

one obvious solution is you store the referenced thread as the uservalue
or in a field of the metatable of your userdata, this way the gc will
notice the correct reference relation and make sure it not be collected
when calling the finalizer.

if for some reason you can't use the uservalue or metatable, another
possible solution I could think of is to also store the Lua thread
object in the registry (think another lua_ref), which effectively
maintains the reference relation, though in an weird indirect way.


--
the nerdy Peng / 书呆彭 /


Reply | Threaded
Open this post in threaded view
|

Re: Multiple Lua Threads, __gc crash accessing "dead" thread

Francisco Olarte
On Wed, Sep 13, 2017 at 6:09 PM, ThePhD <[hidden email]> wrote:

> Francisco: Thank you for the suggestion. Unfortunately, I don't have that
> choice. For C code I can make that change, but my example was a stripping
> down of some C++ code. The thing that calls "co_test_destroy" is really a
> C++ destructor, and that destructor can't receive extra parameters. So while
> I know I can fix it if it was just plain C code, it seems like I am more or
> less stuck in the C++ world.

Well, that destructor must have been called from the collectgarbage
loop in the interpreter, so you can always store the passed state
somewhere.

Another simple stuff is to just stuff the pointer to the main state in
the registry, using a lightuserdata as a key. We do this with a couple
of functions, using the getter function address as a lightuserdata as
a key, and works like a charm.

FOS