current status about last bugs

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

current status about last bugs

Roberto Ierusalimschy
This is what we figured out until now:

- Stack overflow in luaO_pushvfstring

Seems to be the issue pointed out by Andrew, of wrong computation of
nCcalls when entering a coroutine. (I did not check that yet.)


- Heap use after free in luaD_call

Macro 'checkstackp' can run a GC step and destroy the Call Info
previously allocated to call a function. (Andrew came to the same
conclusion.)

quick fix: call 'next_ci' only after checking the stack.


- Heap overflow in luaT_adjustvarargs

Bug caused by wrong order in macro checkstackGC; it should check the
GC before resizing the stack. (Also pointed out by Andrew.)

quick fix: change the order of operations in checkstackGC.


- heap-buffer-overflow in getobjname

Macro checkGC (in the luaV_execute) must save the 'pc', since a GC
step can raise errors. Even though the error is not propagated, it
still needs a valid pc to generate the error message. (Related, the
call to luaT_adjustvarargs must be protected, for a similar reason.)

quick fix: save pc in macro checkGC and protect the call to
luaT_adjustvarargs.


- heap-buffer-overflow in luaD_pretailcall

Didn't check yet, but it seems to be the caused by the
same problem with macro checkstackGC. (At least it goes away when
we fix that macro.)


- Heap overflow in luaH_get

Until now, I have no clues about this one. It seems to be a real problem
in the GC, but it is hard to reproduce. Applying the previous fixes
makes it desapear, but I cannot see how they could solve the bug.

-- Roberto
Reply | Threaded
Open this post in threaded view
|

Re: current status about last bugs

Roberto Ierusalimschy
> - Heap overflow in luaH_get
>
> Until now, I have no clues about this one. It seems to be a real problem
> in the GC, but it is hard to reproduce. Applying the previous fixes
> makes it desapear, but I cannot see how they could solve the bug.

Indeed, the other changes only disguise the bug. It still happens
changing some GC parameters. (This one definitively looks like
a GC bug.)

-- Roberto
Reply | Threaded
Open this post in threaded view
|

Re: current status about last bugs

Dibyendu Majumdar
On Wed, 8 Jul 2020 at 18:20, Roberto Ierusalimschy
<[hidden email]> wrote:

>
> > - Heap overflow in luaH_get
> >
> > Until now, I have no clues about this one. It seems to be a real problem
> > in the GC, but it is hard to reproduce. Applying the previous fixes
> > makes it desapear, but I cannot see how they could solve the bug.
>
> Indeed, the other changes only disguise the bug. It still happens
> changing some GC parameters. (This one definitively looks like
> a GC bug.)
>

Does this bug only appear in generational GC?

Regards
Reply | Threaded
Open this post in threaded view
|

Re: current status about last bugs

Roberto Ierusalimschy
> > > - Heap overflow in luaH_get
> > >
> > > Until now, I have no clues about this one. It seems to be a real problem
> > > in the GC, but it is hard to reproduce. Applying the previous fixes
> > > makes it desapear, but I cannot see how they could solve the bug.
> >
> > Indeed, the other changes only disguise the bug. It still happens
> > changing some GC parameters. (This one definitively looks like
> > a GC bug.)
> >
>
> Does this bug only appear in generational GC?

Apparently yes. As I said, any small changes can hide it. Maybe some
other parameters in incremental mode can expose it.

-- Roberto
Reply | Threaded
Open this post in threaded view
|

Re: current status about last bugs

Daniel Silverstone
In reply to this post by Roberto Ierusalimschy
On Wed, Jul 08, 2020 at 11:50:54 -0300, Roberto Ierusalimschy wrote:
> - Heap overflow in luaH_get
>
> Until now, I have no clues about this one. It seems to be a real problem
> in the GC, but it is hard to reproduce. Applying the previous fixes
> makes it desapear, but I cannot see how they could solve the bug.

I assume you're already doing so, but just in case, the `rr` debug tool is
exceedingly helpful when trying to diagnose this kind of thing.  You can
walk forward and backward and replay the execution of a failing test case
over and over until you diagnose what happened.  If you're comfortable
with `gdb` but really wish you could put a watchpoint on the address
of a variable which was definitely corrupted by the time the crash happened
and then `reverse-continue` until that corruption goes away, the `rr` is
what you need.

If I'm already preaching to the choir here, then I apologise :D

D.

--
Daniel Silverstone                         http://www.digital-scurf.org/
PGP mail accepted and encouraged.            Key Id: 3CCE BABE 206C 3B69
Reply | Threaded
Open this post in threaded view
|

Re: current status about last bugs

Roberto Ierusalimschy
> I assume you're already doing so, but just in case, the `rr` debug tool is
> exceedingly helpful when trying to diagnose this kind of thing.  You can
> walk forward and backward and replay the execution of a failing test case
> over and over until you diagnose what happened.  If you're comfortable
> with `gdb` but really wish you could put a watchpoint on the address
> of a variable which was definitely corrupted by the time the crash happened
> and then `reverse-continue` until that corruption goes away, the `rr` is
> what you need.
>
> If I'm already preaching to the choir here, then I apologise :D

Not at all! I was still using gdb :-). I will try rr. Many thanks,

-- Roberto
Reply | Threaded
Open this post in threaded view
|

Re: current status about last bugs

Dibyendu Majumdar
In reply to this post by Roberto Ierusalimschy
On Wed, 8 Jul 2020 at 20:17, Roberto Ierusalimschy
<[hidden email]> wrote:
>
> > Does this bug only appear in generational GC?
>
> Apparently yes. As I said, any small changes can hide it. Maybe some
> other parameters in incremental mode can expose it.
>

Of course Ravi's code base is different - but I do have the GC merged.
For what it is worth, I am not able to reproduce the error. But that
may not tell you much because I haven't got some of the VM changes.

Regards
Reply | Threaded
Open this post in threaded view
|

Re: current status about last bugs

Dibyendu Majumdar
On Wed, 8 Jul 2020 at 21:29, Dibyendu Majumdar <[hidden email]> wrote:

>
> On Wed, 8 Jul 2020 at 20:17, Roberto Ierusalimschy
> <[hidden email]> wrote:
> >
> > > Does this bug only appear in generational GC?
> >
> > Apparently yes. As I said, any small changes can hide it. Maybe some
> > other parameters in incremental mode can expose it.
> >
>

I was able to reproduce the issue in Ravi in an older version. I
thought I would let you know what my issue was, maybe same thing is
happening  here.

vmcase(OP_RETURN) {
int n = GETARG_B(i) - 1; /* number of results */
int nparams1 = GETARG_C(i);
if (n < 0) /* not fixed? */
n = cast_int(L->top - ra); /* get what is available */
savepc(ci);
if (TESTARG_k(i)) { /* may there be open upvalues? */
if (L->top < ci->top)
L->top = ci->top;
luaF_close(L, base, LUA_OK);
updatetrap(ci);
updatestack(ci);
}
if (nparams1) /* vararg function? */
ci->func -= ci->u.l.nextraargs + nparams1;
L->top = ra + n; /* set call for 'luaD_poscall' */
luaD_poscall(L, ci, n);
return;
}

The use of ra - is ra still correct? Stacke may have been reallocated?
There was similar  bug in Ravi.

Regards
Reply | Threaded
Open this post in threaded view
|

Re: current status about last bugs

Roberto Ierusalimschy
> vmcase(OP_RETURN) {
> int n = GETARG_B(i) - 1; /* number of results */
> int nparams1 = GETARG_C(i);
> if (n < 0) /* not fixed? */
> n = cast_int(L->top - ra); /* get what is available */
> savepc(ci);
> if (TESTARG_k(i)) { /* may there be open upvalues? */
> if (L->top < ci->top)
> L->top = ci->top;
> luaF_close(L, base, LUA_OK);
> updatetrap(ci);
> updatestack(ci);
> }
> if (nparams1) /* vararg function? */
> ci->func -= ci->u.l.nextraargs + nparams1;
> L->top = ra + n; /* set call for 'luaD_poscall' */
> luaD_poscall(L, ci, n);
> return;
> }
>
> The use of ra - is ra still correct? Stacke may have been reallocated?
> There was similar  bug in Ravi.

Note the macro updatestack: It should correct 'ra' in case of a stack
reallocation.

-- Roberto
Reply | Threaded
Open this post in threaded view
|

Re: current status about last bugs

Dibyendu Majumdar
On Thu, 9 Jul 2020 at 15:18, Roberto Ierusalimschy
<[hidden email]> wrote:

>
> > vmcase(OP_RETURN) {
> > int n = GETARG_B(i) - 1; /* number of results */
> > int nparams1 = GETARG_C(i);
> > if (n < 0) /* not fixed? */
> > n = cast_int(L->top - ra); /* get what is available */
> > savepc(ci);
> > if (TESTARG_k(i)) { /* may there be open upvalues? */
> > if (L->top < ci->top)
> > L->top = ci->top;
> > luaF_close(L, base, LUA_OK);
> > updatetrap(ci);
> > updatestack(ci);
> > }
> > if (nparams1) /* vararg function? */
> > ci->func -= ci->u.l.nextraargs + nparams1;
> > L->top = ra + n; /* set call for 'luaD_poscall' */
> > luaD_poscall(L, ci, n);
> > return;
> > }
> >
> > The use of ra - is ra still correct? Stacke may have been reallocated?
> > There was similar  bug in Ravi.
>
> Note the macro updatestack: It should correct 'ra' in case of a stack
> reallocation.
>

Ah okay so that's fine. Sorry for the noise.

Regards