Lua assertion during garbage collection when using a weak table

Sascha Zelzer
Hi,

I am using a debug build of Lua 5.3.5 which defines "lua_assert" to
"assert" from the cassert header. When I run the minimal working example
from the bottom of this mail, I am hitting the following assertion
(sometimes the program needs to run multiple times until the assert
condition is false):

Assertion failed: g->ephemeron == NULL && g->weak == NULL, file
D:\lua\src\lua-5.3.5\src\lgc.c, line 987

where g->ephemeron is NULL but g->weak is not.

Could someone tell me if the assertion is pointing to a real problem, or
if it's maybe not a valid condition to assert for in the first place?

Many thanks,

Sascha


------- MWE start ------------

local function f()
  local weaktable = setmetatable({}, { __mode = "v" })
 
  return function()
    local row = weaktable[1]
    if not row then
      row = { a = 1 }
      weaktable[1] = row
    end
    return row
  end
end

for i=1,10000 do
  local t = f()()
end

------- MWE end ------------




Re: Lua assertion during garbage collection when using a weak table

Andrew Gierth
>>>>> "Sascha" == Sascha Zelzer <[hidden email]> writes:

 Sascha> Could someone tell me if the assertion is pointing to a real
 Sascha> problem, or if it's maybe not a valid condition to assert for in
 Sascha> the first place?

I _think_ the assertion is invalid and that there's no real problem.
(But I'm not completely sure.)

This is my reading of the code:

The assertion is in the "atomic" function called by singlestep when the
current collection phase is GCSatomic. We get into the atomic phase when
the previous phase, propagation, ran out of objects in g->gray.
Propagation will have left any weak tables on the "grayagain" list;
unless I'm misreading something, it looks like it's intended that
g->weak and g->ephemeron should be empty outside of the atomic phase.

But the atomic step is this:

    case GCSatomic: {
      lu_mem work;
      propagateall(g);  /* make sure gray list is empty */
      work = atomic(L);  /* work is what was traversed by 'atomic' */
      entersweep(L);
      g->GCestimate = gettotalbytes(g);  /* first estimate */;
      return work;
    }

If the gray list had a weak table (of mode "k" or "v", but not "kv")
added to it between the end of the propagation phase and the start of
the atomic step, then propagateall will move that table to either
g->weak or g->ephemeron depending on the weak mode. The following call
to atomic(L) will then hit the assertion.

So the question is, can a weak table be added to the gray list in this
time window? That can happen if the table is marked via one of the
barrier functions, and it looks like luaC_upvalbarrier is the likely
candidate: when the returned anonymous function is closed over
"weaktable", the value (the weak table) is marked and becomes gray.

So I think your program triggers the assertion if it so happens that the
closing of the anonymous function value on exit from f() happens to fall
between the end of a propagate phase of GC and the following atomic
step. This is probably a fairly narrow window to hit, which explains why
the failure is rare.
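To make the timing argument concrete, here is a toy C model of the sequence described above. It is a sketch, not Lua's code: the Table and GlobalState types and all helper names are hypothetical, only two GC states are modelled, and traverse_weak assumes the table has entries to clear. It follows the 5.3.5 rule that a weak table traversed outside GCSpropagate is linked straight to g->weak.

```c
#include <stdbool.h>
#include <stddef.h>

/* Only the two states that matter here; names mirror lgc.c 5.3.5. */
typedef enum { GCSpropagate, GCSatomic } GCState;

typedef struct Table { struct Table *gclist; } Table;  /* intrusive list link */

typedef struct {            /* hypothetical subset of global_State */
  GCState gcstate;
  Table *gray, *grayagain, *weak;
} GlobalState;

static void link_gclist(Table *t, Table **list) { t->gclist = *list; *list = t; }

/* 5.3.5 'traverseweak' routing, assuming the table has clearable entries. */
static void traverse_weak(GlobalState *g, Table *t) {
  if (g->gcstate == GCSpropagate)
    link_gclist(t, &g->grayagain);   /* retraverse in the atomic phase */
  else
    link_gclist(t, &g->weak);        /* "has to be cleared later" */
}

static void propagateall(GlobalState *g) {
  while (g->gray != NULL) {
    Table *t = g->gray;
    g->gray = t->gclist;
    traverse_weak(g, t);
  }
}

/* A barrier grays the weak table while the collector is in 'when'.
   Returns whether atomic()'s 'g->weak == NULL' check would hold. */
static bool assertion_holds_if_barrier_fires_in(GCState when) {
  Table wt = { NULL };
  GlobalState g = { when, NULL, NULL, NULL };
  link_gclist(&wt, &g.gray);     /* roughly what luaC_upvalbarrier's mark does */
  if (when == GCSpropagate) {
    propagateall(&g);            /* propagation drains 'gray' itself... */
    g.gcstate = GCSatomic;       /* ...then switches to GCSatomic */
  }
  /* case GCSatomic in singlestep: */
  propagateall(&g);              /* "make sure gray list is empty" */
  return g.weak == NULL;         /* the condition checked at the top of atomic() */
}
```

If the barrier fires during propagation, the table ends up on grayagain and the check passes; if it fires in the window after the state has already switched to GCSatomic, the table lands on weak, which is the reported failure.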

--
Andrew.


Re: Lua assertion during garbage collection when using a weak table

Steve Litt
On Thu, 21 Feb 2019 02:47:49 +0000
Andrew Gierth <[hidden email]> wrote:

> Propagation will have left any weak tables on the

I've been gone too long. What's a "weak table?"

> "grayagain" list; unless I'm misreading something, it looks like it's
> intended that g->weak and g->ephemeron should be empty outside of the
> atomic phase.

Is ephemeron some kind of reserved word or part of the language?

SteveT


Re: Lua assertion during garbage collection when using a weak table

Andrew Gierth
>>>>> "Steve" == Steve Litt <[hidden email]> writes:

 Steve> I've been gone too long. What's a "weak table?"

A weak table is one whose keys and/or values are weak references to
objects - that is to say, they do not prevent the objects from being
garbage-collected (in the event of which the key-value pair is removed
from the table).

weakt = setmetatable({}, {__mode="v"})  -- table with weak values
t = {"foo"}
weakt[1] = t   -- does not prevent t from being GC'd
t = nil        -- no strong references to the {"foo"} table remain
collectgarbage()
collectgarbage()
for k,v in pairs(weakt) do print(k,v) end  -- prints nothing
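The clearing step that makes the loop print nothing can be sketched in C as below. This is a hypothetical simplification, not Lua's implementation: after marking, any value that no strong reference kept alive is removed from the weak-valued table. The Object and WeakValueTable types are invented for illustration, and only an array part is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical collectable object: only its mark bit matters here. */
typedef struct { bool marked; } Object;

/* Hypothetical weak-valued table with a fixed-size array part. */
typedef struct {
  Object *values[8];
  size_t n;                 /* number of slots in use */
} WeakValueTable;

/* After marking, drop every value that nothing else kept alive.
   Returns how many slots were cleared. */
static size_t clear_weak_values(WeakValueTable *t) {
  size_t removed = 0;
  for (size_t i = 0; i < t->n; i++) {
    if (t->values[i] != NULL && !t->values[i]->marked) {
      t->values[i] = NULL;  /* at the Lua level the entry becomes nil */
      removed++;
    }
  }
  return removed;
}
```

In the Lua example, the {"foo"} table plays the role of an unmarked Object once t = nil, so its slot is cleared.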

 >> "grayagain" list; unless I'm misreading something, it looks like it's
 >> intended that g->weak and g->ephemeron should be empty outside of the
 >> atomic phase.

 Steve> Is ephemeron some kind of reserved word or part of the language?

An "ephemeron table" is a table with weak keys but strong values, that
is to say its metatable has __mode="k"; it prevents the value in each
key/value pair from being GC'd _as long as_ the key is reachable from
somewhere other than the value (or is not an object). Once the key is
unreachable, the entry is removed.

See http://www.lua.org/manual/5.3/manual.html#2.5.2
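A rough C sketch of the ephemeron rule, assuming hypothetical Obj and Entry types in which each object holds at most one strong reference. Values are marked only once their key is known to be marked, repeating until no new marks appear (loosely similar in spirit to convergeephemerons and clearkeys in lgc.c, but not their code); entries whose key stays unmarked are then dropped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object with at most one outgoing strong reference. */
typedef struct Obj {
  bool marked;
  struct Obj *ref;
} Obj;

typedef struct { Obj *key, *value; } Entry;   /* one ephemeron pair */

/* Mark an object and everything reachable through its 'ref' chain. */
static void mark_chain(Obj *o) {
  while (o != NULL && !o->marked) { o->marked = true; o = o->ref; }
}

/* Mark values whose keys are marked, repeating until nothing changes:
   marking one value may make another key reachable. */
static void converge_ephemerons(Entry *e, size_t n) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (size_t i = 0; i < n; i++) {
      if (e[i].key->marked && !e[i].value->marked) {
        mark_chain(e[i].value);
        changed = true;
      }
    }
  }
}

/* Drop entries whose key stayed unmarked; returns the new entry count. */
static size_t clear_dead_keys(Entry *e, size_t n) {
  size_t j = 0;
  for (size_t i = 0; i < n; i++)
    if (e[i].key->marked) e[j++] = e[i];
  return j;
}
```

For example, an entry whose value points back to its own key is dropped when nothing else reaches that key, while a value reachable from a live key can in turn mark another key and keep that entry too; that chaining is why the loop repeats.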

--
Andrew.


Re: Lua assertion during garbage collection when using a weak table

Roberto Ierusalimschy
In reply to this post by Andrew Gierth
Sorry for coming so late to this problem. Many thanks for the detailed
analysis.


> I _think_ the assertion is invalid and that there's no real problem.
> (But I'm not completely sure.)

I guess you are right. After that assertion, 'atomic' does a
'propagateall' itself, which will add objects to those lists.
Nothing between the assertion and this first 'propagateall'
is affected by those lists. Probably, that assertion could
come before that previous 'propagateall', or that 'propagateall'
could be removed.


> This is my reading of the code:
>
> [...]
>
> So the question is, can a weak table be added to the gray list in this
> time window? That can happen if the table is marked via one of the
> barrier functions, and it looks like luaC_upvalbarrier is the likely
> candidate: when the returned anonymous function is closed over
> "weaktable", the value (the weak table) is marked and becomes gray.

A key point here is this code in 'traverseweak' (and similarly
in 'traverseephemeron'):

  if (g->gcstate == GCSpropagate)
    linkgclist(h, g->grayagain);  /* must retraverse it in atomic phase */
  else if (hasclears)
    linkgclist(h, g->weak);  /* has to be cleared later */

In the timeframe you mentioned, the GC is already in state GCSatomic,
so the table is inserted into the 'g->weak' list.

In 5.4, this test is like this:

  if (g->gcstate == GCSatomic && hasclears)
    linkgclist(h, g->weak);  /* has to be cleared later */
  else
    linkgclist(h, g->grayagain);  /* must retraverse it in atomic phase */

(Note that 5.4 renamed some GC states, so its GCSatomic corresponds to
GCSinsideatomic in 5.3.) In that time frame, 5.4 will be in the
state GCSenteratomic, and so the object will not go to the weak
list. However, 'traverseephemeron' still tests against GCSpropagate,
so probably that list has a similar problem.
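The difference between the two traverseweak rules quoted above can be tabulated with a small C sketch. The Phase labels are hypothetical names for the three moments that matter (the comments give the corresponding real state names, following the renaming described above), and only the traverseweak branch is modelled; per the note above, 5.4's traverseephemeron still follows the 5.3-style test.

```c
#include <stdbool.h>

/* Hypothetical phase labels; real state names differ between versions. */
typedef enum {
  PHASE_PROPAGATE,      /* 5.3: GCSpropagate     5.4: GCSpropagate   */
  PHASE_WINDOW,         /* 5.3: GCSatomic        5.4: GCSenteratomic */
  PHASE_INSIDE_ATOMIC   /* 5.3: GCSinsideatomic  5.4: GCSatomic      */
} Phase;

typedef enum { LIST_NONE, LIST_GRAYAGAIN, LIST_WEAK } GCList;

/* 'traverseweak' routing in 5.3.5, as quoted above. */
static GCList route_53(Phase p, bool hasclears) {
  if (p == PHASE_PROPAGATE) return LIST_GRAYAGAIN;
  if (hasclears) return LIST_WEAK;
  return LIST_NONE;     /* not linked to any list */
}

/* 'traverseweak' routing in the 5.4 code quoted above. */
static GCList route_54(Phase p, bool hasclears) {
  if (p == PHASE_INSIDE_ATOMIC && hasclears) return LIST_WEAK;
  return LIST_GRAYAGAIN;
}
```

In the window between propagation and the atomic step, route_53 returns LIST_WEAK (the case that trips the 5.3.5 assertion), while route_54 returns LIST_GRAYAGAIN.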

-- Roberto


Re: Lua assertion during garbage collection when using a weak table

Roberto Ierusalimschy
> Sorry for coming so late to this problem. Many thanks for the detailed
> analysis.
>
>
> > I _think_ the assertion is invalid and that there's no real problem.
> > (But I'm not completely sure.)
>
> I guess you are right. After that assertion, 'atomic' does a
> 'propagateall' itself, which will add objects to those lists.
> Nothing between the assertion and this first 'propagateall'
> is affected by those lists. Probably, that assertion could
> come before that previous 'propagateall', or that 'propagateall'
> could be removed.

I am afraid we were wrong. If the table is modified while in list
'weak' (but still before the atomic phase), it won't be traversed
again. So, a new element assigned to it may not be marked; it will then
be collected while still in the table, eventually crashing the
system. It seems this is a real bug.

-- Roberto


Re: Lua assertion during garbage collection when using a weak table

Andrew Gierth
>>>>> "Roberto" == Roberto Ierusalimschy <[hidden email]> writes:

 Roberto> I am afraid we were wrong. If the table is modified while in
 Roberto> list 'weak' (but still before the atomic phase),

How could that happen?

In between the final propagate step (which switched the state to
GCSatomic) and the start of the atomic step, can anything ever call any
of the propagation functions? As far as I can tell, during that interval
the only changes are marking and barriers, which can result in objects
being added to 'gray' or 'grayagain' but not to 'weak'.

Only when the atomic step is called and the first propagateall(g) is
done do we actually start putting objects onto the 'weak' list, and from
that point we go straight into atomic() without executing any user code,
so no addition to the table can happen.

Or to put it differently, why is it not sufficient to just move the
assertion from its current spot to here:

    case GCSatomic: {
      lu_mem work;
+     lua_assert(g->ephemeron == NULL && g->weak == NULL);
      propagateall(g);  /* make sure gray list is empty */
      work = atomic(L);  /* work is what was traversed by 'atomic' */
      entersweep(L);
      g->GCestimate = gettotalbytes(g);  /* first estimate */;
      return work;
    }
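A minimal C sketch of why the relocated check would hold, under simplifying assumptions: the Lists type and helpers are hypothetical, only g->weak is modelled (g->ephemeron would behave the same way), and a weak table grayed in the window is assumed to reach 'weak' when propagateall traverses it, as with 5.3.5's traverseweak. The condition is evaluated at both positions for a table grayed by a barrier during the window.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Table { struct Table *gclist; } Table;   /* hypothetical */
typedef struct { Table *gray, *weak; } Lists;           /* subset of global_State */

static void push(Table **list, Table *t) { t->gclist = *list; *list = t; }

/* In the window the state is already GCSatomic, so each weak table that
   propagateall traverses is linked to 'weak' (clearable entries assumed). */
static void propagateall_in_window(Lists *g) {
  while (g->gray != NULL) {
    Table *t = g->gray;
    g->gray = t->gclist;
    push(&g->weak, t);
  }
}

typedef struct { bool before_propagateall, after_propagateall; } Checks;

/* Evaluate 'g->weak == NULL' at the proposed spot and at the current one. */
static Checks check_positions(Lists *g) {
  Checks c;
  c.before_propagateall = (g->weak == NULL);  /* proposed: top of case GCSatomic */
  propagateall_in_window(g);
  c.after_propagateall = (g->weak == NULL);   /* current: top of atomic() */
  return c;
}
```

For such a table, before_propagateall is true and after_propagateall is false, consistent with the argument that barriers in the window only feed 'gray' and 'grayagain'.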

--
Andrew.


Re: Lua assertion during garbage collection when using a weak table

Roberto Ierusalimschy
>  Roberto> I am afraid we were wrong. If the table is modified while in
>  Roberto> list 'weak' (but still before the atomic phase),
>
> How could that happen?
>
> In between the final propagate step (which switched the state to
> GCSatomic) and the start of the atomic step, can anything ever call any
> of the propagation functions? As far as I can tell, during that interval
> the only changes are marking and barriers, which can result in objects
> being added to 'gray' or 'grayagain' but not to 'weak'.

You are right.

-- Roberto


Re: Lua assertion during garbage collection when using a weak table

Viacheslav Usov
On Fri, Mar 29, 2019 at 9:41 PM Roberto Ierusalimschy <[hidden email]> wrote:

> You are right.

Do I understand correctly that the consensus is that the assertion is spurious? Yet looking at https://www.lua.org/bugs.html, this is not considered a bug.

Is there a patch for this? Will this be fixed?

Thanks,
V.


Re: Lua assertion during garbage collection when using a weak table

Roberto Ierusalimschy
> On Fri, Mar 29, 2019 at 9:41 PM Roberto Ierusalimschy <
> [hidden email]> wrote:
>
> > You are right.
>
> Do I understand correctly that the consensus is that the assertion is
> spurious? Yet looking at https://www.lua.org/bugs.html, this is not
> considered a bug.
>
> Is there a patch for this?

Just remove the assertion or, better yet, ignore it.


> Will this be fixed?

It is turned off, so it is like a comment. We usually do not consider
wrong comments as bugs, and we do not do releases to fix comments.
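Why the assertion is "like a comment" in a normal build can be seen with a small C sketch. It assumes the fallback pattern of llimits.h in Lua 5.3: if nothing defines lua_assert, it expands to ((void)0) and its condition is never evaluated. counted_condition and code_with_internal_assertion are hypothetical helpers added only to make that observable.

```c
#include <assert.h>

/* Same shape as the assumed llimits.h fallback: without an explicit
   definition, lua_assert discards its argument unevaluated. */
#if !defined(lua_assert)
#define lua_assert(c)   ((void)0)
#endif

static int evaluations = 0;   /* counts how often the condition ran */

/* Hypothetical condition with a visible side effect; it reports failure,
   so a build mapping lua_assert to assert would abort here. */
int counted_condition(void) {
  evaluations++;
  return 0;
}

/* Hypothetical stand-in for an lgc.c function containing an assertion. */
void code_with_internal_assertion(void) {
  lua_assert(counted_condition());
}
```

Building the same file with -Dlua_assert=assert, which is essentially what a debug build like the original poster's does, makes the condition run and abort.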

If this issue is causing your program any real harm, please let us know.

-- Roberto


Re: Lua assertion during garbage collection when using a weak table

Viacheslav Usov
On Tue, May 28, 2019 at 10:01 PM Roberto Ierusalimschy <[hidden email]> wrote:

> If this issue is causing your program any real harm, please let us know.

Well, I would not call this "real harm", but we do run debug builds that are linked against Lua with assertions enabled, so those false positives are a nuisance. I would like to continue having assertions in debug builds, so "turn them all off" is not a palatable solution to me. I could, of course, just turn this particular assertion off, but that would be another Lua patch I have to maintain... hence my original questions.

Since there are already bugs documented for 5.3.5, it would be nice to have this erroneous assertion removed in a release that fixes them.

Cheers,
V.

Re: Lua assertion during garbage collection when using a weak table

Roberto Ierusalimschy
> On Tue, May 28, 2019 at 10:01 PM Roberto Ierusalimschy <
> [hidden email]> wrote:
>
> > If this issue is causing your program any real harm, please let us know.
>
> Well, I would not call this "real harm", but we do run debug builds that
> are linked against Lua with assertions enabled, so those false positives
> are a nuisance. I would like to continue having assertions in debug builds,
> so "turn them all off" is not a palatable solution to me.

Assertions in Lua are not intended for general use; they are there for
internal tests. You might have noticed that there is nothing related to
them in luaconf.h or in the makefile (unlike LUA_USE_APICHECK, which is
for general use). If you want to use them, that is fine, but you are on
your own.

-- Roberto