Possible bug in Lua's Garbage Collector

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Possible bug in Lua's Garbage Collector

Patrick Donnelly-3
I've been desperately trying to track down a problem that arose while
working on Nmap's scripting engine. I had a userdata that I pushed
onto the stack, and later (while it was still on the stack), it was
finalized unexpectedly. I've tried hard to reproduce the problem into
a small piece of code, but was unsuccessful. Instead I found something
that may be related, it appears to be a memory leak. First I'd like to
show what I'm doing in my Nmap code (long but possibly important) to
show what's going on. The function below is called via cpcall (so one
light user data argument); please just look at what's happening to the
stack and try not to pay attention to the logic.:

<code>
static int preparethreads (lua_State *L)
{
  Target *target = (Target *) lua_touserdata(L, 1);
  Target **utarget;
  PortList* plist = &(target->ports);
  Port* port = NULL;
  size_t rules_count;
  int i;

  lua_settop(L, 0); // clear stack

  utarget = (Target **) lua_newuserdata(L, sizeof(Target *)); // index 1
  *utarget = target; // set the target
  luaL_getmetatable(L, TARGET_CLASS);
  lua_setmetatable(L, -2); // set target metatable
  lua_rawgeti(L, LUA_REGISTRYINDEX, nse_threads); // index 2
  lua_rawgeti(L, LUA_REGISTRYINDEX, nse_runlevels); // index 3

  fprintf(stderr, "top = %d\n", lua_gettop(L));
  fprintf(stderr, "%s = %p\n", lua_typename(L, lua_type(L, 1)),
lua_topointer(L, 1));
  // Find all matching hostrules
  lua_getfield(L, LUA_REGISTRYINDEX, HOSTTESTS);
  rules_count = lua_objlen(L, -1);

  for (i = 1; i <= (int) rules_count; i++)
  {
    lua_rawgeti(L, -1, i);
    lua_getfield(L, -1, "hostrule");

    lua_newtable(L);
    set_hostinfo(L, target);
    lua_call(L, 1, 1);

  fprintf(stderr, "here1\n");

    if (checkboolean(L, -1, "Hostrule did not return a boolean\n"))
    {
      lua_State *thread = lua_newthread(L);
  fprintf(stderr, "here2\n");
      lua_newtable(L);
      set_hostinfo(L, target);
      lua_getfield(L, -4, ACTION); // get the action for the script
      lua_pushvalue(L, -2); // hostinfo
      lua_xmove(L, thread, 2);
      // On stack: Thread and Hostrule table

      lua_pushvalue(L, -2); // thread
      lua_createtable(L, 0, 5); // thread info
      lua_pushvalue(L, -3); // hostinfo
      lua_setfield(L, -2, HOST);
      lua_getfield(L, -6, RUNLEVEL); // get runlevel
      lua_setfield(L, -2, RUNLEVEL); // set runlevel in thread info
      lua_pushinteger(L, 0); // hostrule type
      lua_setfield(L, -2, TYPE);
  fprintf(stderr, "here3\n");
      lua_pushvalue(L, 1); // Target userdata
  fprintf(stderr, "top = %d\n", lua_gettop(L));
  fprintf(stderr, "%s = %p\n", lua_typename(L, lua_type(L, -1)),
lua_topointer(L, -1));
      lua_setfield(L, -2, TARGET); // set target
  fprintf(stderr, "here4\n");
      lua_pushliteral(L, "id");
  fprintf(stderr, "here5\n");
      lua_pushvalue(L, -1);
  fprintf(stderr, "here6\n");
      lua_gettable(L, -7); // id for script environment
  fprintf(stderr, "here7\n");
      ...
</code>

The output is:

<output>
top = 3
userdata = 0x81c4204
here1
here2
here3
top = 11
userdata = 0x81c4204
here4
here5
here6
finalizing userdata
userdata = 0x81c4204
</output>

Which doesn't make sense because the userdata is at the top of the
stack (for this function).

This lead me to try, as I said earlier, to produce a minimal example
showing this problem. I haven't been able to yet, however I did get
some sort of memory leak while try to reproduce the above (and I hope
it's connected). It's attached. It just needs statically linked to
Lua.

The output I get on my system:

<output>
Entering go
making userdata:0x805df2c
making trash
Leaving go
Memory used = 68
Entering go
making userdata:0x8070894
making trash
Leaving go
Memory used = 88
Entering go
making userdata:0x8075914
making trash
Leaving go
Memory used = 107
Entering go
making userdata:0x807b14c
making trash
Leaving go
Memory used = 162
Entering go
making userdata:0x808a11c
making trash
Leaving go
Memory used = 170
Entering go
making userdata:0x808f0bc
making trash
Leaving go
Memory used = 224
calling gc                         <--- Memory should now be near 0
Memory used = 123
</output>

You can see, even after the final full collect, it's still using 123
KiloBytes of memory. The _really_ odd thing is, if you uncomment the
second lua_settop(L) in "go", all the userdata are finalized:

<output>
Entering go
making userdata:0x805df2c
making trash
Leaving go
Memory used = 67
Entering go
making userdata:0x807049c
finalizing userdata 0x805df2c
making trash
Leaving go
Memory used = 93
Entering go
making userdata:0x8079ae4
making trash
Leaving go
Memory used = 113
Entering go
making userdata:0x807e91c
making trash
Leaving go
Memory used = 168
Entering go
making userdata:0x808ce4c
finalizing userdata 0x8079ae4
finalizing userdata 0x807049c
making trash
Leaving go
Memory used = 159
Entering go
making userdata:0x8091f2c
finalizing userdata 0x808ce4c
finalizing userdata 0x807e91c
making trash
Leaving go
Memory used = 74
calling gc
finalizing userdata 0x8091f2c
Memory used = 27
</output>

Hope this post was clear enough; It's late and I'm pretty tired from
trying to track this down.

Regards,

-- 
-Patrick Donnelly

"One of the lessons of history is that nothing is often a good thing
to do and always a clever thing to say."

-Will Durant
#include <lua.h>
#include <lauxlib.h>
#include <stdio.h>

static int finalize (lua_State *L)
{
  fprintf(stderr, "finalizing userdata %p\n", lua_topointer(L, 1));
  fflush(stderr);
  return 0;
}

static int make_class (lua_State *L)
{
  luaL_newmetatable(L, "myclass");
  lua_pushliteral(L, "__gc");
  lua_pushcclosure(L, finalize, 0);
  lua_settable(L, -3);
  lua_pop(L, 1);
}

static int go (lua_State *L)
{
  int i; lua_State *thread;
  printf("Entering go\n");
  lua_settop(L, 0);
  thread = lua_newthread(L); // index 1
  lua_newtable(L);
  lua_newtable(L);
  lua_xmove(L, thread, 2);

  //lua_settop(L, 0);

  lua_newuserdata(L, sizeof(int)*5000); // index 2
  luaL_getmetatable(L, "myclass");
  lua_setmetatable(L, -2);
  fprintf(stderr, "making %s:%p\n", lua_typename(L, lua_type(L, -1)), lua_topointer(L, -1));

  luaL_checkstack(L, 1000, "Stack overflow");
  for (i = 1; i <= 1000; i++)
    lua_newtable(L);

  lua_pushvalue(L, 2);
  lua_setfield(L, -2, "user");
  lua_pushliteral(L, "hi");
  lua_newtable(L);
  lua_settable(L, -3);
  lua_settable(L, LUA_REGISTRYINDEX);

  fprintf(stderr, "making trash\n");
  fflush(stderr);
  lua_newtable(L);
  for (i = 1; i <= 50; i++)
  {
    lua_newtable(L);
    lua_rawseti(L, -2, i);
  }
  fprintf(stderr, "Leaving go\n");
  return 0;
}

int main()
{
  int i;
  lua_State *L = luaL_newstate();
  make_class(L);

  /* lua_pushcfunction(L, go);

  lua_newtable(L);
  lua_newtable(L);
  lua_pushliteral(L, "__mode");
  lua_pushliteral(L, "k");
  lua_settable(L, -3);
  lua_setmetatable(L, -2); */

  
  lua_cpcall(L, go, NULL);
  fprintf(stderr, "Memory used = %d\n", lua_gc(L, LUA_GCCOUNT, 0));
  lua_cpcall(L, go, NULL);
  fprintf(stderr, "Memory used = %d\n", lua_gc(L, LUA_GCCOUNT, 0));
  lua_cpcall(L, go, NULL);
  fprintf(stderr, "Memory used = %d\n", lua_gc(L, LUA_GCCOUNT, 0));
  lua_cpcall(L, go, NULL);
  fprintf(stderr, "Memory used = %d\n", lua_gc(L, LUA_GCCOUNT, 0));
  lua_cpcall(L, go, NULL);
  fprintf(stderr, "Memory used = %d\n", lua_gc(L, LUA_GCCOUNT, 0));
  lua_cpcall(L, go, NULL);
  fprintf(stderr, "Memory used = %d\n", lua_gc(L, LUA_GCCOUNT, 0));
  printf("calling gc\n");
  lua_gc(L, LUA_GCCOLLECT, 0);
  fprintf(stderr, "Memory used = %d\n", lua_gc(L, LUA_GCCOUNT, 0));
  return 0;
}
Reply | Threaded
Open this post in threaded view
|

Re: Possible bug in Lua's Garbage Collector

Chris-41
On Thu, May 29, 2008 at 6:18 AM, Patrick Donnelly
<[hidden email]> wrote:
> I've been desperately trying to track down a problem that arose while
> working on Nmap's scripting engine. I had a userdata that I pushed
> onto the stack, and later (while it was still on the stack), it was
> finalized unexpectedly. I've tried hard to reproduce the problem into


I don't have time to look at all your logic but a couple things stuck out.

1. Check the return value of lua_call().

2. Instead of printing "hereX", print out the value of lua_gettop() so
you can see what is happening to the stack.  This seems like the most
likely problem.  Your user data is probably getting removed from the
stack and subsequently collected.

-- 
// Chris

Reply | Threaded
Open this post in threaded view
|

Re: Possible bug in Lua's Garbage Collector (nope)

Mike Pall-85
In reply to this post by Patrick Donnelly-3
Patrick Donnelly wrote:
> static int preparethreads (lua_State *L)

I'm not going to comment on this part, because it's hopelessly
obfuscated. I'm pretty sure you can no longer understand the code
you wrote yourself. So consider rewriting it in a more modular
fashion. I.e. divide it up into many smaller and reusable
functions with easily understood stack-effects.

And while you're at it, add a simple function which pretty-prints
all objects on the current stack (with both absolute and relative
stack slot numbers). Insert this at strategic points when
debugging instead of the printf's.

> This lead me to try, as I said earlier, to produce a minimal example
> showing this problem. I haven't been able to yet, however I did get
> some sort of memory leak while try to reproduce the above (and I hope
> it's connected). It's attached. It just needs statically linked to
> Lua.

This is not a memory leak. You are basically creating unique keys
and values (userdata and a table) and storing them into the
registry.

> You can see, even after the final full collect, it's still using 123
> KiloBytes of memory.

Of course the registry holds a strong reference to each userdata,
so they will not be collected.

> The _really_ odd thing is, if you uncomment the
> second lua_settop(L) in "go", all the userdata are finalized:

No, that's not surprising. This is changing the non-relative stack
indexes, so the lua_pushvalue(L, 2) later on doesn't reference the
userdata anymore.

A few more remarks:
- You can directly operate on threads, i.e. lua_newtable(thread).
  No need to use lua_xmove() here.
- The luaL_checkstack() call does not cover all stack slots. You
  are pushing more objects after the 1000 tables.
- You should really turn on Lua/C API checks while you are
  experimenting with tricky stack manipulations.

About your choice of the subject line: ever heard of Occam's Razor?

--Mike

Reply | Threaded
Open this post in threaded view
|

Re: Possible bug in Lua's Garbage Collector

Alex Davies
In reply to this post by Patrick Donnelly-3
I've been desperately trying to track down a problem that arose while
working on Nmap's scripting engine. I had a userdata that I pushed
onto the stack, and later (while it was still on the stack), it was
finalized unexpectedly. I've tried hard to reproduce the problem into
a small piece of code, but was unsuccessful.

Before trying the code myself, I thought I'd check that you are running with lua_assert defined? Or at least LUA_USE_APICHECK. Either would immediately rule out the possibility that a bad call is corrupting the stack.

- Alex
Reply | Threaded
Open this post in threaded view
|

Re: Possible bug in Lua's Garbage Collector

Luiz Henrique de Figueiredo
In reply to this post by Patrick Donnelly-3
> showing this problem. I haven't been able to yet, however I did get
> some sort of memory leak while try to reproduce the above

I haven't tried to follow the logic of your example, but I don't think it
qualifies as a memory leak because if you call lua_close(L) at the end of
main then all udata are finalized. So, no memory is leaked.

Also, if you build Lua with LUA_USE_APICHECK on (just #define it in luaconf.h),
then you'll get an assertion raised very soon:
	% ./a.out
	Entering go
	making userdata:0x805df2c
	a.out: lapi.c:232: lua_pushvalue: Assertion `L->top < L->ci->top' failed.
This is raised by lua_pushvalue(L, 2) just after the loop that creates
tables So, it seems to me there's a stack overflow there. However, this
overflow is not the culprit because changing 1000 in luaL_checkstack to
1200 does not give the results you expected.

After a bit of digging, the problem seems to be that go leaves the
threads in the stack, and their memory is never collected. That's why
settop to 0 solves the problem.

--lhf

Reply | Threaded
Open this post in threaded view
|

Re: Possible bug in Lua's Garbage Collector

Edgar Toernig
In reply to this post by Patrick Donnelly-3
Patrick Donnelly wrote:
>
> I had a userdata that I pushed
> onto the stack, and later (while it was still on the stack), it was
> finalized unexpectedly.

> static int preparethreads (lua_State *L)
> {
>   Target *target = (Target *) lua_touserdata(L, 1);
>   Target **utarget;
> 
>   lua_settop(L, 0); // clear stack
> 
>   utarget = (Target **) lua_newuserdata(L, sizeof(Target *)); // index 1

I haven't read further as here's already a bug.

The settop(0) removes the reference to the passed userdata.
If that was the only reference to it, 'target' is now no
longer valid.  The following lua calls (like lua_newuserdata)
may invoke the GC, free the userdata, reuse the memory for
something else and 'target' points to garbage.

Ciao, ET. 

Reply | Threaded
Open this post in threaded view
|

Re: Possible bug in Lua's Garbage Collector

Patrick Donnelly-3
In reply to this post by Luiz Henrique de Figueiredo
On Thu, May 29, 2008 at 5:42 AM, Luiz Henrique de Figueiredo
<[hidden email]> wrote:
>> showing this problem. I haven't been able to yet, however I did get
>> some sort of memory leak while try to reproduce the above
>
> I haven't tried to follow the logic of your example, but I don't think it
> qualifies as a memory leak because if you call lua_close(L) at the end of
> main then all udata are finalized. So, no memory is leaked.

I recompiled with LUA_USE_APICHECK and added a checkstack() I forgot
after adding 1000 tables to the stack (where the assertion failed).
the problem still exists. As I understand it, everything on go()s
stack should be collected after returning and lua_gc(L LUA_GCCOLLECT,
0); is called? It is not?

> Also, if you build Lua with LUA_USE_APICHECK on (just #define it in luaconf.h),
> then you'll get an assertion raised very soon:
>        % ./a.out
>        Entering go
>        making userdata:0x805df2c
>        a.out: lapi.c:232: lua_pushvalue: Assertion `L->top < L->ci->top' failed.
> This is raised by lua_pushvalue(L, 2) just after the loop that creates
> tables So, it seems to me there's a stack overflow there. However, this
> overflow is not the culprit because changing 1000 in luaL_checkstack to
> 1200 does not give the results you expected.
>
> After a bit of digging, the problem seems to be that go leaves the
> threads in the stack, and their memory is never collected. That's why
> settop to 0 solves the problem.

That sounds like a problem?

I don't see why Lua is collecting the userdata in my Nmap code as it
is clearly still on the stack (at index 1). I feel these problems may
be related.

-- 
-Patrick Donnelly

"One of the lessons of history is that nothing is often a good thing
to do and always a clever thing to say."

-Will Durant

Reply | Threaded
Open this post in threaded view
|

Re: Possible bug in Lua's Garbage Collector

Patrick Donnelly-3
Well, nevermind. Turns out I was putting those userdata in the
registry. I'm sorry about the noise and poorly named title. I still
can't explain the problem with my Nmap code though.

-- 
-Patrick Donnelly

"One of the lessons of history is that nothing is often a good thing
to do and always a clever thing to say."

-Will Durant

Reply | Threaded
Open this post in threaded view
|

Re: Possible bug in Lua's Garbage Collector

Patrick Donnelly-3
On Thu, May 29, 2008 at 1:31 PM, Patrick Donnelly
<[hidden email]> wrote:
> Well, nevermind. Turns out I was putting those userdata in the
> registry. I'm sorry about the noise and poorly named title. I still
> can't explain the problem with my Nmap code though.

The problem turned out to be an unexpected error and lua_close() was
called. I wasn't able to see this initially because an assertion
elsewhere failed which caused other debug information not to be
printed. I'd like to thank everyone for all the help and advice.

-- 
-Patrick Donnelly

"One of the lessons of history is that nothing is often a good thing
to do and always a clever thing to say."

-Will Durant