Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
Hello list,

I've been tracking down crashes in low-memory conditions lately,
without much success, but it seems I now have a more or less
reproducible case. Lua 5.4.0-rc5 with the attached patch (rather
small, intended to dump l_alloc invocations and fail exactly one of
them), when run on the attached test script on x86-64 Linux, causes a
segmentation fault during l_alloc invocation #1809 (long after
#345, which is patched to fail). The default GC settings are used.
The issue reproduces with both -O2 (default) and -O0; enabling
lua_assert=assert also changes nothing (no assertion failure, just a
segfault). FWIW the backtrace with -O0 looks like:

Program received signal SIGSEGV, Segmentation fault.
... deep in libc...
#3  0x00007ffff7ce3f8f in realloc () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x000055555557f73e in l_alloc (ud=0x0, ptr=0x5555555a28c0,
osize=544, nsize=1536) at lauxlib.c:1009
#5  0x000055555556974d in luaM_realloc_ (L=0x5555555a2268,
block=0x5555555a28c0, osize=544, nsize=1536) at lmem.c:166
#6  0x0000555555562b47 in luaD_reallocstack (L=0x5555555a2268,
newsize=96, raiseerror=1) at ldo.c:187
#7  0x0000555555562cf6 in luaD_growstack (L=0x5555555a2268, n=85,
raiseerror=1) at ldo.c:231
#8  0x00005555555639d3 in luaD_call (L=0x5555555a2268,
func=0x5555555a2910, nresults=-1) at ldo.c:494
#9  0x0000555555563b75 in luaD_callnoyield (L=0x5555555a2268,
func=0x5555555a2910, nResults=-1) at ldo.c:526
#10 0x000055555555f544 in f_call (L=0x5555555a2268, ud=0x7fffffffdd80)
at lapi.c:997
#11 0x00005555555629be in luaD_rawrunprotected (L=0x5555555a2268,
f=0x55555555f50f <f_call>, ud=0x7fffffffdd80) at ldo.c:148
#12 0x0000555555564316 in luaD_pcall (L=0x5555555a2268,
func=0x55555555f50f <f_call>, u=0x7fffffffdd80, old_top=80, ef=64) at
ldo.c:749
#13 0x000055555555f614 in lua_pcallk (L=0x5555555a2268, nargs=0,
nresults=-1, errfunc=3, ctx=0, k=0x0) at lapi.c:1023
#14 0x000055555555ba60 in docall (L=0x5555555a2268, narg=0, nres=-1)
at lua.c:139
#15 0x000055555555be5d in handle_script (L=0x5555555a2268,
argv=0x7fffffffe260) at lua.c:228
#16 0x000055555555ca08 in pmain (L=0x5555555a2268) at lua.c:603

So as I see it, allocation #1809 starts and there's some sort of
memory corruption/use after free happening before that, corrupting
internal malloc structures.

I've uploaded the files to
https://gist.github.com/szakharchenko/0752b973e5c563546b91d1b0865bce99
as well if that's more convenient.

I have a slightly larger patch with the allocation to fail
configurable from environment and a script for the brute force; let me
know if you need them but they're fairly trivial...
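
The attached patch is not reproduced in this archive. Purely as an
illustration, an allocator of the kind described (log every call, fail
exactly one) might look like the sketch below. The names logging_alloc
and fail_alloc_init are hypothetical; alloc_idx, fail_alloc_at, failed
and LUA_FAIL_ALLOC_AT all appear in later messages, but how they are
used here is a guess, and the log format is inferred from the excerpts
quoted further down the thread.

```c
#include <stdio.h>
#include <stdlib.h>

static unsigned long alloc_idx = 0;      /* number of the current allocator call */
static unsigned long fail_alloc_at = 0;  /* call to fail; 0 means "never" */
static int failed = 0;                   /* set once the chosen call has failed */

/* Hypothetical helper: pick the failing call from the environment. */
static void fail_alloc_init(void) {
  const char *s = getenv("LUA_FAIL_ALLOC_AT");
  fail_alloc_at = (s != NULL) ? strtoul(s, NULL, 10) : 0;
}

/* Same signature as a lua_Alloc function: nsize == 0 frees the block,
   anything else allocates or resizes it. */
static void *logging_alloc(void *ud, void *ptr, size_t osize, size_t nsize) {
  char old[32];
  void *res;
  (void)ud;
  snprintf(old, sizeof old, "%p", ptr);  /* format now, before ptr may be freed */
  ++alloc_idx;
  if (nsize == 0) {
    free(ptr);
    res = NULL;
  } else if (alloc_idx == fail_alloc_at && !failed) {
    /* Simulated out-of-memory; the old block (if any) stays valid.
       Note: if the chosen index lands on a free call, nothing fails. */
    fprintf(stderr, "fail_alloc_at: %08lu: failing\n", alloc_idx);
    failed = 1;
    res = NULL;
  } else {
    res = realloc(ptr, nsize);
  }
  /* One record per call: index, old pointer, osize, result, nsize. */
  fprintf(stderr, "%08lu %s %lu %p %lu\n", alloc_idx, old,
          (unsigned long)osize, res, (unsigned long)nsize);
  return res;
}
```

Judging by the backtraces, the real patch puts this logic directly
into l_alloc in lauxlib.c; a standalone function like the one above
could instead be passed to lua_newstate.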

The original issue was discovered on big endian 32-bit MIPS so it's
likely not endian- or word-size-related.

I note that the single failed allocation causes a full GC, which
happens to deallocate nothing in this case. On MIPS I use a custom
malloc wrapper for debugging where every RAM block is only allocated
once and overwritten on free; I received an assertion error (when
compiled with assertions) in propagatemark where it stumbles upon an
object with an unknown tt (on the "default: lua_assert(0);" line)
corresponding to the freed memory pattern. Hopefully someone more
savvy than me can track this down (x86 mallocs have debug modes as
well).
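
A debugging allocator in that spirit can be sketched in a few lines.
This is not the MIPS wrapper itself, only a minimal illustration of
the idea under stated assumptions: freed blocks are overwritten with a
fill byte and deliberately leaked, so no address is ever handed out
twice. The name poison_alloc and the fill value 0xA5 are arbitrary
choices made for this sketch.

```c
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE 0xA5  /* arbitrary fill pattern chosen for this sketch */

/* lua_Alloc-shaped debug allocator: every resize moves the block, and
   old blocks are poisoned and leaked instead of being freed. */
static void *poison_alloc(void *ud, void *ptr, size_t osize, size_t nsize) {
  void *res = NULL;
  (void)ud;
  if (ptr == NULL)
    osize = 0;  /* for new blocks Lua passes an object-type tag, not a size */
  if (nsize != 0) {
    res = malloc(nsize);
    if (res == NULL)
      return NULL;  /* failure: the old block must stay untouched */
    if (ptr != NULL)
      memcpy(res, ptr, osize < nsize ? osize : nsize);
  }
  if (ptr != NULL)
    memset(ptr, POISON_BYTE, osize);  /* poison, but never call free() */
  return res;
}
```

Because nothing is ever returned to the system, any later read through
a stale pointer sees the fill pattern rather than a recycled object,
which is how an unknown tt value can show up in an assertion. The cost
is unbounded memory growth, so this only suits short test runs.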

Let me know if I can be of any more help, thanks for all the effort
you put into Lua, and good luck!

Best regards,

--
DoubleF

test.lua (4K) Download Attachment
patch-against-lua-5.4.0-rc5.patch (1K) Download Attachment

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Andrew Gierth
>>>>> "Sergey" == Sergey Zakharchenko <[hidden email]> writes:

 Sergey> Hello list,
 Sergey> I've been tracking down an issue with crashes on low memory
 Sergey> lately, without much success, but seems like I have a more or
 Sergey> less reproducible case now. Lua 5.4.0-rc5 with the attached
 Sergey> patch (rather small, intended to dump l_alloc invocations and
 Sergey> fail exactly one of them) when run on the attached test script
 Sergey> on x86-64 Linux causes a segmentation fault during l_alloc
 Sergey> invocation #1809 (long after the #345 which is patched to
 Sergey> fail).

I couldn't reproduce this on freebsd, even with malloc debugging options
enabled.

Can you get a backtrace of where your allocation #345 is happening? This
should be easy by running it under gdb, setting a breakpoint on your
failed = 1; line, and then getting a backtrace when it stops on that?
(Then, of course, let it continue and verify it crashes in the expected
place)

--
Andrew.

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
Hello Andrew,

Andrew Gierth <[hidden email]>:
> I couldn't reproduce this on freebsd, even with malloc debugging options
> enabled.

I'm afraid the memory patterns may be different across libcs and the
allocation that needs to fail may vary as well. My base system is
Debian Buster and libc is glibc 2.28. I could run a similar test on a
relatively ancient FreeBSD installation I have around if you think
it's worth it. I'm attaching the brute force patch and helper script I
mentioned before for convenience.

> Can you get a backtrace of where your allocation #345 is happening?

Sure, it's during proto construction:

00000344 (nil) 5 0x5555555a9d40 56 <--- this one is OK
fail_alloc_at: 00000345: failing

Breakpoint 1, l_alloc (ud=0x0, ptr=0x0, osize=10, nsize=128) at lauxlib.c:998
998           failed = 1;
(gdb) bt
#0  l_alloc (ud=0x0, ptr=0x0, osize=10, nsize=128) at lauxlib.c:998
#1  0x0000555555569895 in luaM_malloc_ (L=0x5555555a2268, size=128,
tag=10) at lmem.c:193
#2  0x0000555555565e7f in luaC_newobj (L=0x5555555a2268, tt=10,
sz=128) at lgc.c:242
#3  0x00005555555658c4 in luaF_newproto (L=0x5555555a2268) at lfunc.c:246
#4  0x000055555556da08 in luaY_parser (L=0x5555555a2268,
z=0x7fffffffbd20, buff=0x7fffffffbc78, dyd=0x7fffffffbc90,
name=0x5555555a8ab8 "@test.lua", firstchar=114) at lparser.c:1979
#5  0x000055555556451c in f_parser (L=0x5555555a2268,
ud=0x7fffffffbc70) at ldo.c:796
#6  0x00005555555629be in luaD_rawrunprotected (L=0x5555555a2268,
f=0x55555556441f <f_parser>, ud=0x7fffffffbc70) at ldo.c:148
#7  0x0000555555564316 in luaD_pcall (L=0x5555555a2268,
func=0x55555556441f <f_parser>, u=0x7fffffffbc70, old_top=80, ef=0) at
ldo.c:749
#8  0x0000555555564607 in luaD_protectedparser (L=0x5555555a2268,
z=0x7fffffffbd20, name=0x5555555a8ab8 "@test.lua", mode=0x0) at
ldo.c:813
#9  0x000055555555f79a in lua_load (L=0x5555555a2268,
reader=0x55555557e905 <getF>, data=0x7fffffffbdc0,
chunkname=0x5555555a8ab8 "@test.lua", mode=0x0) at lapi.c:1053
#10 0x000055555557ed3b in luaL_loadfilex (L=0x5555555a2268,
filename=0x7fffffffe562 "test.lua", mode=0x0) at lauxlib.c:776
#11 0x000055555555be2f in handle_script (L=0x5555555a2268,
argv=0x7fffffffe260) at lua.c:225
#12 0x000055555555ca08 in pmain (L=0x5555555a2268) at lua.c:603
#13 0x00005555555638d5 in luaD_call (L=0x5555555a2268,
func=0x5555555a28d0, nresults=1) at ldo.c:482
#14 0x0000555555563b75 in luaD_callnoyield (L=0x5555555a2268,
func=0x5555555a28d0, nResults=1) at ldo.c:526
#15 0x000055555555f544 in f_call (L=0x5555555a2268, ud=0x7fffffffe110)
at lapi.c:997
#16 0x00005555555629be in luaD_rawrunprotected (L=0x5555555a2268,
f=0x55555555f50f <f_call>, ud=0x7fffffffe110) at ldo.c:148
#17 0x0000555555564316 in luaD_pcall (L=0x5555555a2268,
func=0x55555555f50f <f_call>, u=0x7fffffffe110, old_top=16, ef=0) at
ldo.c:749
#18 0x000055555555f614 in lua_pcallk (L=0x5555555a2268, nargs=2,
nresults=1, errfunc=0, ctx=0, k=0x0) at lapi.c:1023
#19 0x000055555555cb2d in main (argc=2, argv=0x7fffffffe258) at lua.c:629

Not sure if this directly gives you anything. The "interesting" things
in 'failed' vs 'non-failed' cases happen later on, when a bunch of
protos are seemingly disposed of by the GC in the 'failed' case, at
least that's what I recall from testing on MIPS. However, this
deallocation of protos doesn't happen during the full GC triggered by
the allocation failure.

> (Then, of course, let it continue and verify it crashes in the expected place)

Verified. Thanks for your interest!

Best regards,

--
DoubleF

brute-force.patch (1K) Download Attachment
brute-force.sh (328 bytes) Download Attachment

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
Hi again,

FWIW glibc malloc with MALLOC_CHECK_=3 fails earlier in the same (fail
#345) test case:

00001777 0x5555555a96b0 128 (nil) 0
realloc(): invalid pointer

Program received signal SIGABRT, Aborted.
0x00007ffff7c968bb in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7c968bb in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7c81535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff7cd8778 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff7cdee6a in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007ffff7ce3370 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x000055555557f73e in l_alloc (ud=0x0, ptr=0x5555555abff0,
osize=32, nsize=64) at lauxlib.c:1009
#6  0x000055555556974d in luaM_realloc_ (L=0x5555555a2018,
block=0x5555555abff0, osize=32, nsize=64) at lmem.c:166
#7  0x0000555555569803 in luaM_saferealloc_ (L=0x5555555a2018,
block=0x5555555abff0, osize=32, nsize=64) at lmem.c:181
#8  0x00005555555695ad in luaM_growaux_ (L=0x5555555a2018,
block=0x5555555abff0, nelems=8, psize=0x5555555ae308, size_elems=4,
limit=2147483647, what=0x555555592d5e "opcodes") at lmem.c:97
#9  0x000055555558034c in luaK_code (fs=0x7fffffffb5e8, i=589894) at lcode.c:393
#10 0x000055555556c62c in retstat (ls=0x7fffffffb9d0) at lparser.c:1868
#11 statement (ls=ls@entry=0x7fffffffb9d0) at lparser.c:1922
#12 0x000055555556c798 in statlist (ls=ls@entry=0x7fffffffb9d0) at lparser.c:792
#13 0x000055555556c9ef in body (ls=ls@entry=0x7fffffffb9d0,
e=e@entry=0x7fffffffb730, ismethod=ismethod@entry=0, line=119) at
lparser.c:993
#14 0x000055555556cc5a in simpleexp (v=0x7fffffffb730,
ls=0x7fffffffb9d0) at lparser.c:1172
#15 subexpr (ls=ls@entry=0x7fffffffb9d0, v=v@entry=0x7fffffffb730,
limit=limit@entry=0) at lparser.c:1260
#16 0x000055555556ce3e in expr (ls=ls@entry=0x7fffffffb9d0,
v=v@entry=0x7fffffffb730) at lparser.c:1280
#17 0x000055555556cf68 in explist (ls=ls@entry=0x7fffffffb9d0,
v=v@entry=0x7fffffffb730) at lparser.c:1007
#18 0x000055555556c5a4 in retstat (ls=0x7fffffffb9d0) at lparser.c:1850
#19 statement (ls=ls@entry=0x7fffffffb9d0) at lparser.c:1922
#20 0x000055555556c798 in statlist (ls=ls@entry=0x7fffffffb9d0) at lparser.c:792
#21 0x000055555556c9ef in body (ls=ls@entry=0x7fffffffb9d0,
e=e@entry=0x7fffffffb900, ismethod=ismethod@entry=0, line=1) at
lparser.c:993
#22 0x000055555556cc5a in simpleexp (v=0x7fffffffb900,
ls=0x7fffffffb9d0) at lparser.c:1172
#23 subexpr (ls=ls@entry=0x7fffffffb9d0, v=v@entry=0x7fffffffb900,
limit=limit@entry=0) at lparser.c:1260
#24 0x000055555556ce3e in expr (ls=ls@entry=0x7fffffffb9d0,
v=v@entry=0x7fffffffb900) at lparser.c:1280
#25 0x000055555556cf3e in explist (ls=ls@entry=0x7fffffffb9d0,
v=v@entry=0x7fffffffb900) at lparser.c:1004
#26 0x000055555556c5a4 in retstat (ls=0x7fffffffb9d0) at lparser.c:1850
#27 statement (ls=ls@entry=0x7fffffffb9d0) at lparser.c:1922
#28 0x000055555556c798 in statlist (ls=ls@entry=0x7fffffffb9d0) at lparser.c:792
#29 0x000055555556db6c in mainfunc (fs=0x7fffffffb988,
ls=0x7fffffffb9d0) at lparser.c:1963
#30 luaY_parser (L=0x5555555a2018, z=0x7fffffffbd20,
buff=0x7fffffffbc78, dyd=<optimized out>, name=0x5555555a83b8
"@test.lua", firstchar=114) at lparser.c:1986
#31 0x000055555556451c in f_parser (L=0x5555555a2018,
ud=0x7fffffffbc70) at ldo.c:796
...

The pointer in question, 0x5555555abff0, had been freed before at
00001346, and, as verified by setting a conditional breakpoint, the
backtrace there is:

Breakpoint 2, l_alloc (ud=0x0, ptr=0x5555555abff0, osize=32, nsize=0)
at lauxlib.c:992
992       if (alloc_idx == fail_alloc_at)

#0  l_alloc (ud=0x0, ptr=0x5555555abff0, osize=32, nsize=0) at lauxlib.c:992
#1  0x0000555555569685 in luaM_free_ (L=0x5555555a2018,
block=0x5555555abff0, osize=32) at lmem.c:135
#2  0x00005555555659e9 in luaF_freeproto (L=0x5555555a2018,
f=0x5555555ae2f0) at lfunc.c:273
#3  0x0000555555567713 in freeobj (L=0x5555555a2018, o=0x5555555ae2f0)
at lgc.c:714
#4  0x00005555555680f5 in sweepgen (L=0x5555555a2018,
g=0x5555555a20e0, p=0x5555555ae380, limit=0x5555555a9660) at
lgc.c:1016
#5  0x0000555555568520 in youngcollection (L=0x5555555a2018,
g=0x5555555a20e0) at lgc.c:1148
#6  0x0000555555568baa in genstep (L=0x5555555a2018, g=0x5555555a20e0)
at lgc.c:1333
#7  0x00005555555693fd in luaC_step (L=0x5555555a2018) at lgc.c:1571
#8  0x000055555556bc7c in close_func (ls=ls@entry=0x7fffffffb9d0) at
lparser.c:762
#9  0x000055555556ca55 in body (ls=ls@entry=0x7fffffffb9d0,
e=e@entry=0x7fffffffb560, ismethod=ismethod@entry=0, line=154) at
lparser.c:997
#10 0x000055555556cc5a in simpleexp (v=0x7fffffffb560,
ls=0x7fffffffb9d0) at lparser.c:1172
#11 subexpr (ls=ls@entry=0x7fffffffb9d0, v=v@entry=0x7fffffffb560,
limit=limit@entry=0) at lparser.c:1260
#12 0x000055555556ce3e in expr (ls=ls@entry=0x7fffffffb9d0,
v=v@entry=0x7fffffffb560) at lparser.c:1280
#13 0x000055555556cf68 in explist (ls=ls@entry=0x7fffffffb9d0,
v=v@entry=0x7fffffffb560) at lparser.c:1007
#14 0x000055555556c5a4 in retstat (ls=0x7fffffffb9d0) at lparser.c:1850
#15 statement (ls=ls@entry=0x7fffffffb9d0) at lparser.c:1922
#16 0x000055555556c798 in statlist (ls=ls@entry=0x7fffffffb9d0) at lparser.c:792
#17 0x000055555556c9ef in body (ls=ls@entry=0x7fffffffb9d0,
e=e@entry=0x7fffffffb730, ismethod=ismethod@entry=0, line=119) at
lparser.c:993
#18 0x000055555556cc5a in simpleexp (v=0x7fffffffb730,
ls=0x7fffffffb9d0) at lparser.c:1172
#19 subexpr (ls=ls@entry=0x7fffffffb9d0, v=v@entry=0x7fffffffb730,
limit=limit@entry=0) at lparser.c:1260
#20 0x000055555556ce3e in expr (ls=ls@entry=0x7fffffffb9d0,
v=v@entry=0x7fffffffb730) at lparser.c:1280
#21 0x000055555556cf68 in explist (ls=ls@entry=0x7fffffffb9d0,
v=v@entry=0x7fffffffb730) at lparser.c:1007
#22 0x000055555556c5a4 in retstat (ls=0x7fffffffb9d0) at lparser.c:1850
#23 statement (ls=ls@entry=0x7fffffffb9d0) at lparser.c:1922
#24 0x000055555556c798 in statlist (ls=ls@entry=0x7fffffffb9d0) at lparser.c:792
#25 0x000055555556c9ef in body (ls=ls@entry=0x7fffffffb9d0,
e=e@entry=0x7fffffffb900, ismethod=ismethod@entry=0, line=1) at
lparser.c:993
#26 0x000055555556cc5a in simpleexp (v=0x7fffffffb900,
ls=0x7fffffffb9d0) at lparser.c:1172
#27 subexpr (ls=ls@entry=0x7fffffffb9d0, v=v@entry=0x7fffffffb900,
limit=limit@entry=0) at lparser.c:1260
#28 0x000055555556ce3e in expr (ls=ls@entry=0x7fffffffb9d0,
v=v@entry=0x7fffffffb900) at lparser.c:1280
#29 0x000055555556cf3e in explist (ls=ls@entry=0x7fffffffb9d0,
v=v@entry=0x7fffffffb900) at lparser.c:1004
#30 0x000055555556c5a4 in retstat (ls=0x7fffffffb9d0) at lparser.c:1850
#31 statement (ls=ls@entry=0x7fffffffb9d0) at lparser.c:1922
#32 0x000055555556c798 in statlist (ls=ls@entry=0x7fffffffb9d0) at lparser.c:792
#33 0x000055555556db6c in mainfunc (fs=0x7fffffffb988,
ls=0x7fffffffb9d0) at lparser.c:1963
#34 luaY_parser (L=0x5555555a2018, z=0x7fffffffbd20,
buff=0x7fffffffbc78, dyd=<optimized out>, name=0x5555555a83b8
"@test.lua", firstchar=114) at lparser.c:1986
#35 0x000055555556451c in f_parser (L=0x5555555a2018,
ud=0x7fffffffbc70) at ldo.c:796
...

(both backtraces from the same program run, just in narrative, not
actual, order).

Best regards,

--
DoubleF

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Luiz Henrique de Figueiredo
In reply to this post by Sergey Zakharchenko
Do these crashes occur in previous rcs of 5.4.0?

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
Hello Luiz,

> Do these crashes occur in previous rcs of 5.4.0?

Originally found on rc3. Just verified exactly the same behavior
(crash at #1809 with failure at #345) with rc1.

Best regards,

--
DoubleF

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
In reply to this post by Luiz Henrique de Figueiredo
Luiz,

Luiz Henrique de Figueiredo <[hidden email]>:
> Do these crashes occur in previous rcs of 5.4.0?

[yes]
Just to clarify --- I never tested versions 5.2, 5.3 or anything
before 5.4 RC cycle. I haven't been able to trigger similar issues
with 5.1.

Best regards,

--
DoubleF

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Andrew Gierth
In reply to this post by Sergey Zakharchenko
>>>>> "Sergey" == Sergey Zakharchenko <[hidden email]> writes:

 Sergey> Hi again,
 Sergey> FWIW glibc malloc with MALLOC_CHECK_=3 fails earlier in the
 Sergey> same (fail #345) test case:

I can reproduce something that looks almost exactly like this one, so
it's neither OS-dependent nor glibc-malloc-dependent; there must be a
real bug in Lua somewhere.

It looks to me like the issue is not with the actual failure of the
allocation (since the retry succeeds), but the fact that we're doing a
garbage collection at that precise point, which is leaving something
corrupted in a way that breaks later.
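
The pattern being discussed (allocation fails, a full collection runs,
the allocation is retried) can be modelled in isolation. The sketch
below is not Lua's lmem.c; ToyState, alloc_with_retry, fail_once and
noop_collect are hypothetical names used only to show the control
flow.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(void *ud, void *ptr, size_t osize, size_t nsize);

/* Hypothetical stand-in for the interpreter state. */
typedef struct {
  alloc_fn frealloc;                 /* user-supplied allocator */
  void *ud;                          /* its opaque argument */
  void (*full_collect)(void *self);  /* runs a complete GC cycle */
  int emergency_runs;                /* how many emergency collections ran */
} ToyState;

/* Allocate; if the allocator reports failure, collect and retry once. */
static void *alloc_with_retry(ToyState *st, size_t nsize) {
  void *p = st->frealloc(st->ud, NULL, 0, nsize);
  if (p == NULL && nsize > 0) {
    st->emergency_runs++;
    st->full_collect(st);  /* live objects are traversed and recolored here */
    p = st->frealloc(st->ud, NULL, 0, nsize);
  }
  return p;
}

/* Test double: fails exactly one allocation, then behaves like realloc. */
static int calls_until_failure = 1;
static void *fail_once(void *ud, void *ptr, size_t osize, size_t nsize) {
  (void)ud; (void)osize;
  if (nsize == 0) { free(ptr); return NULL; }
  if (calls_until_failure > 0 && --calls_until_failure == 0) return NULL;
  return realloc(ptr, nsize);
}

/* Test double: a collector that does nothing. */
static void noop_collect(void *self) { (void)self; }
```

The point of interest is the call to full_collect: the caller ends up
with a valid block either way, but in the failing case a complete GC
cycle has run in the middle of whatever operation requested the
memory.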

investigation ongoing...

--
Andrew.

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
Andrew,

Thanks for your efforts!

> It looks to me like the issue is not with the actual failure of the
> allocation (since the retry succeeds), but the fact that we're doing a
> garbage collection at that precise point, which is leaving something
> corrupted in a way that breaks later.

Yes, this is my opinion as well. When I compared memory dumps taken
before and after the full GC was triggered, the only differences I saw
were that the marked bits changed, a gclist pointer in a proto that
had been wild (possibly pointing to a Lua routine?) became null, and
pointers in the state changed. The best tools for fixing this are
likely rr (I'm working on getting a host up that can run it) and
superhuman knowledge of Lua GC internals (which I lack).

Best regards,

-- 
DoubleF

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
In reply to this post by Sergey Zakharchenko
Sergey Zakharchenko <[hidden email]>:
> [crash on ]
> My base system is
> Debian Buster and libc is glibc 2.28.

The system I can run 'rr' on is also x86-64 but has glibc 2.30, and
the issue isn't triggered by failing #345 (I've also switched to
testing based on git commits, at 69e84805 now). Back to brute-force
search...

Best regards,

--
DoubleF

Re: Lua 5.4.0-rc5 segfault in low memory conditions

BogdanM


On Sat, Jun 13, 2020 at 4:48 PM Sergey Zakharchenko <[hidden email]> wrote:
> Sergey Zakharchenko <[hidden email]>:
> > [crash on ]
> > My base system is
> > Debian Buster and libc is glibc 2.28.
>
> The system I can run 'rr' on is also x86-64 but has glibc 2.30, and
> the issue isn't triggered by failing #345 (I've also switched to
> testing based on git commits, at 69e84805 now). Back to brute-force
> search...
>
> Best regards,
>
> --
> DoubleF

That stack trace looks very familiar. I'm about 90% sure that I saw the same error with Lua 5.3.5 (vanilla version, no patches) running on a microcontroller (so in very low memory conditions) using dlmalloc (bare metal, no OS). I doubt that I'll be able to reproduce the issue again, but I will try. I tried to debug the issue at the time, but without success.

Thanks,
Bogdan 

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
Hello Bogdan,

[I apologise to everyone not interested in this discussion for my
overly long emails. If I could, I would make them shorter.]

Bogdan Marinescu <[hidden email]>:
> That stack trace looks very familiar. I'm about 90% sure that I saw the same error with Lua 5.3.5

This cannot be ruled out; shall I run a similar test against 5.3.5?

Sergey Zakharchenko:
>> The system I can run 'rr' on is also x86-64 but has glibc 2.30, and
>> the issue isn't triggered by failing #345 (I've also switched to
>> testing based on git commits, at 69e84805 now). Back to brute-force
>> search...

This system crashes when set to fail allocation #336 (which tries to
allocate 56 bytes for object of type 5) and the crash itself happens
after #357 but not during #358; instead an attempt is seemingly made
to add sub-prototypes to an array which is somehow NULL.

Here's the relevant part of the allocation log, showing the fate of
memory area in question:

00000330 (nil) 4 0x564a299685e0 30
00000331 (nil) 4 0x564a29968610 33
00000332 (nil) 4 0x564a29968640 28
00000333 0x564a29966f20 25 (nil) 0
00000334 (nil) 4 0x564a29966f20 34
00000335 (nil) 6 0x564a29969870 40
fail_alloc_at: 00000336: failing
00000336 (nil) 5 (nil) 56
00000337 (nil) 5 0x564a299698a0 56
00000338 (nil) 10 0x564a299698e0 128 <--- the proto is created
00000339 (nil) 0 0x564a29969970 32
00000340 (nil) 0 0x564a29963b70 16
00000341 (nil) 0 0x564a299699a0 4
00000342 (nil) 0 0x564a299699c0 64
00000343 (nil) 0 0x564a29969a10 24
00000344 (nil) 0 (nil) 0
00000345 0x564a29963b70 16 (nil) 0
00000346 (nil) 0 (nil) 0
00000347 (nil) 0 (nil) 0
00000348 0x564a299699a0 4 (nil) 0
00000349 (nil) 0 (nil) 0
00000350 (nil) 0 (nil) 0
00000351 0x564a299699c0 64 (nil) 0
00000352 0x564a299698e0 128 (nil) 0 <--- the proto is deleted
00000353 (nil) 0 0x564a29969a30 48
00000354 (nil) 0 (nil) 0
00000355 0x564a29969a10 24 (nil) 0
00000356 (nil) 0 0x564a29969a70 32
00000357 (nil) 10 0x564a299698e0 128 <--- another? proto is created in the same area
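
Following one address through a log like this by eye gets tedious. A
small parser can do it; the sketch below assumes each record has the
columns "index, old pointer, osize, returned pointer, nsize", which is
inferred from the excerpt above rather than taken from the patch.
AllocEvent, parse_alloc_line, is_free and first_reuse are names made
up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  unsigned long idx;        /* allocator call number */
  unsigned long long oldp;  /* block passed in (0 for "(nil)") */
  unsigned long osize;
  unsigned long long newp;  /* block returned (0 for "(nil)") */
  unsigned long nsize;
} AllocEvent;

static unsigned long long parse_ptr(const char *s) {
  return strcmp(s, "(nil)") == 0 ? 0ULL : strtoull(s, NULL, 16);
}

/* Returns 1 on success, 0 if the line is not an allocation record. */
static int parse_alloc_line(const char *line, AllocEvent *ev) {
  char oldp[32], newp[32];
  if (sscanf(line, "%lu %31s %lu %31s %lu",
             &ev->idx, oldp, &ev->osize, newp, &ev->nsize) != 5)
    return 0;
  ev->oldp = parse_ptr(oldp);
  ev->newp = parse_ptr(newp);
  return 1;
}

/* A record frees a block when it was given a pointer and asked for size 0. */
static int is_free(const AllocEvent *ev) {
  return ev->oldp != 0 && ev->nsize == 0;
}

/* Scan records in order and return the index of the first call that hands
   out an address freed earlier in the same log, or 0 if none does.
   Simplification: blocks released by a moving realloc are not tracked. */
static unsigned long first_reuse(const char **lines, size_t n) {
  unsigned long long freed[64];
  size_t nfreed = 0, i, j;
  AllocEvent ev;
  for (i = 0; i < n; i++) {
    if (!parse_alloc_line(lines[i], &ev)) continue;
    if (ev.newp != 0)
      for (j = 0; j < nfreed; j++)
        if (freed[j] == ev.newp) return ev.idx;
    if (is_free(&ev) && nfreed < 64) freed[nfreed++] = ev.oldp;
  }
  return 0;
}
```

Address reuse is normal allocator behaviour and not a bug in itself;
the value of flagging it is that it tells you which later object now
occupies the memory a stale pointer still refers to.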

0x0000564a27d8b768 in addprototype (ls=ls@entry=0x7ffdadcadfe0) at lparser.c:699
699       f->p[fs->np++] = clp = luaF_newproto(L);
(rr) l
694         int oldsize = f->sizep;
695         luaM_growvector(L, f->p, fs->np, f->sizep, Proto *,
MAXARG_Bx, "functions");
696         while (oldsize < f->sizep)
697           f->p[oldsize++] = NULL;
698       }
699       f->p[fs->np++] = clp = luaF_newproto(L);
700       luaC_objbarrier(L, f, clp);
701       return clp;
702     }
703
(rr) print f
$1 = (Proto *) 0x564a299698e0
(rr) print *f
$2 = {next = 0x564a299698a0, tt = 10 '\n', marked = 16 '\020',
numparams = 0 '\000', is_vararg = 0 '\000', maxstacksize = 0 '\000',
sizeupvalues = 0, sizek = 0, sizecode = 0, sizelineinfo = 0, sizep =
0, sizelocvars = 0, sizeabslineinfo = 0, linedefined = 0,
lastlinedefined = 0, k = 0x0, code = 0x0, p = 0x0,
  upvalues = 0x0, lineinfo = 0x0, abslineinfo = 0x0, locvars = 0x0,
source = 0x0, gclist = 0x0}
(rr) bt
#0  0x0000564a27d8b768 in addprototype (ls=ls@entry=0x7ffdadcadfe0) at
lparser.c:699
#1  body (ls=ls@entry=0x7ffdadcadfe0, e=e@entry=0x7ffdadcadf10,
ismethod=ismethod@entry=0, line=1) at lparser.c:983
#2  0x0000564a27d8bbd6 in simpleexp (v=0x7ffdadcadf10,
ls=0x7ffdadcadfe0) at lparser.c:1172
#3  subexpr (ls=ls@entry=0x7ffdadcadfe0, v=v@entry=0x7ffdadcadf10,
limit=limit@entry=0) at lparser.c:1260
#4  0x0000564a27d8bdbe in expr (ls=ls@entry=0x7ffdadcadfe0,
v=v@entry=0x7ffdadcadf10) at lparser.c:1280
#5  0x0000564a27d8bec6 in explist (ls=ls@entry=0x7ffdadcadfe0,
v=v@entry=0x7ffdadcadf10) at lparser.c:1004
#6  0x0000564a27d8b4df in retstat (ls=0x7ffdadcadfe0) at lparser.c:1850
#7  statement (ls=ls@entry=0x7ffdadcadfe0) at lparser.c:1922
#8  0x0000564a27d8b707 in statlist (ls=ls@entry=0x7ffdadcadfe0) at lparser.c:792
#9  0x0000564a27d8cb34 in mainfunc (fs=0x7ffdadcadf98,
ls=0x7ffdadcadfe0) at lparser.c:1963
#10 luaY_parser (L=0x564a299622a8, z=<optimized out>,
buff=0x7ffdadcae288, dyd=<optimized out>, name=0x564a29966f38
"@test.lua", firstchar=114) at lparser.c:1986
#11 0x0000564a27d83514 in f_parser (L=0x564a299622a8,
ud=0x7ffdadcae280) at ldo.c:796
#12 0x0000564a27d819b6 in luaD_rawrunprotected (L=0x564a299622a8,
f=0x564a27d83417 <f_parser>, ud=0x7ffdadcae280) at ldo.c:148
#13 0x0000564a27d8330e in luaD_pcall (L=0x564a299622a8,
func=0x564a27d83417 <f_parser>, u=0x7ffdadcae280, old_top=80, ef=0) at
ldo.c:749
#14 0x0000564a27d835ff in luaD_protectedparser (L=0x564a299622a8,
z=0x7ffdadcae330, name=0x564a29966f38 "@test.lua", mode=0x0) at
ldo.c:813
#15 0x0000564a27d7e78c in lua_load (L=0x564a299622a8,
reader=0x564a27d9d79d <getF>, data=0x7ffdadcae3d0,
chunkname=0x564a29966f38 "@test.lua", mode=0x0) at lapi.c:1053
#16 0x0000564a27d9dbc7 in luaL_loadfilex (L=0x564a299622a8,
filename=0x7ffdadcb14d6 "test.lua", mode=0x0) at lauxlib.c:776
#17 0x0000564a27d7ae0b in handle_script (L=0x564a299622a8,
argv=0x7ffdadcb0870) at lua.c:225
#18 0x0000564a27d7b9ff in pmain (L=0x564a299622a8) at lua.c:603
#19 0x0000564a27d828d0 in luaD_call (L=0x564a299622a8,
func=0x564a29962910, nresults=1) at ldo.c:482
#20 0x0000564a27d82b6d in luaD_callnoyield (L=0x564a299622a8,
func=0x564a29962910, nResults=1) at ldo.c:526
#21 0x0000564a27d7e53a in f_call (L=0x564a299622a8, ud=0x7ffdadcb0720)
at lapi.c:997
#22 0x0000564a27d819b6 in luaD_rawrunprotected (L=0x564a299622a8,
f=0x564a27d7e505 <f_call>, ud=0x7ffdadcb0720) at ldo.c:148
#23 0x0000564a27d8330e in luaD_pcall (L=0x564a299622a8,
func=0x564a27d7e505 <f_call>, u=0x7ffdadcb0720, old_top=16, ef=0) at
ldo.c:749
#24 0x0000564a27d7e60a in lua_pcallk (L=0x564a299622a8, nargs=2,
nresults=1, errfunc=0, ctx=0, k=0x0) at lapi.c:1023
#25 0x0000564a27d7bb24 in main (argc=2, argv=0x7ffdadcb0868) at lua.c:629
(rr) print *ls
$3 = {current = 41, linenumber = 1, lastline = 1, t = {token = 40,
seminfo = {r = 4.6875231737122726e-310, i = 94876525278960, ts =
0x564a299636f0}}, lookahead = {token = 289, seminfo = {r =
4.6875231734557538e-310, i = 94876525273768, ts = 0x564a299622a8}}, fs
= 0x7ffdadcadf98, L = 0x564a299622a8,
  z = 0x7ffdadcae330, buff = 0x7ffdadcae288, h = 0x564a299698a0, dyd =
0x7ffdadcae2a0, source = 0x564a29966f20, envn = 0x564a29963540}
(rr) print *clp
$4 = {next = 0x564a299698a0, tt = 10 '\n', marked = 16 '\020',
numparams = 0 '\000', is_vararg = 0 '\000', maxstacksize = 0 '\000',
sizeupvalues = 0, sizek = 0, sizecode = 0, sizelineinfo = 0, sizep =
0, sizelocvars = 0, sizeabslineinfo = 0, linedefined = 0,
lastlinedefined = 0, k = 0x0, code = 0x0, p = 0x0,
  upvalues = 0x0, lineinfo = 0x0, abslineinfo = 0x0, locvars = 0x0,
source = 0x0, gclist = 0x0}

As you see I've recorded this using 'rr', so I can go back and forth
in time ('reverse next', 'reverse continue', etc.) to inspect program
state; it's compiled at -O0 so everything should be visible. Wish I
knew what I needed; please advise!

Best regards,

--
DoubleF

Re: Lua 5.4.0-rc5 segfault in low memory conditions

BogdanM
Hi Sergey,

On Sat, Jun 13, 2020 at 5:31 PM Sergey Zakharchenko <[hidden email]> wrote:
 
> This cannot be ruled out; shall I run a similar test against 5.3.5?

It would be helpful if you could do that, yes. Thank you.

 Best regards,
Bogdan

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
In reply to this post by Sergey Zakharchenko
[-rc5 crash on memory allocation failure]

Minor data point: the current '#336/post-#357' crash is invariant to
sizes of strings, including making them too long to be interned
(somehow I suspected reallocating the string table could be related,
but seems like it isn't). Maybe this matters.

Best regards,
--
DoubleF

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
In reply to this post by BogdanM
Bogdan,

Bogdan Marinescu <[hidden email]>:
>> This cannot be ruled out; shall I run a similar test against 5.3.5?
> It would be helpful if you could do that, yes. Thank you.

I have been unable to reproduce the issue with the test I supplied and
a brute-force scan (as of GitHub tag v5.3.5 with a patch similar to
the one I already sent; I also needed to disable debug_realloc there).

Best regards,

--
DoubleF

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
In reply to this post by Sergey Zakharchenko
Sergey Zakharchenko <[hidden email]>:
> [-rc5 crash on memory allocation failure]
>
> Minor data point: the current '#336/post-#357' crash is invariant to
> sizes of strings, including making them too long to be interned

In fact, making all strings the same or dropping them entirely doesn't
seem to change the location and nature of this particular crash.

Best regards,

--
DoubleF

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Sergey Zakharchenko
Sergey Zakharchenko <[hidden email]>:
> > [-rc5 crash on memory allocation failure]
> >
> > Minor data point: the current '#336/post-#357' crash is invariant to
> > sizes of strings, including making them too long to be interned
>
> In fact, making all strings the same or dropping them entirely doesn't
> seem to change the location and nature of this particular crash.

I know you're tired of my messages, but behold the ultimate test case
for this particular crash (I'm no longer sure it's one and the same
bug throughout this thread):

$ ./lua /dev/null
[expected fprintf noise, but completes successfully as expected for an
empty file]

$ LUA_FAIL_ALLOC_AT=336 ./lua /dev/null
[misc output]
./lua: /dev/null:-32: attempt to index a nil value

NB: for comparison, in interactive mode the appearance of the version
banner matches allocation #332, and the appearance of the prompt
matches #333.

00000332 0x55e4939bef20 25 (nil) 0
Lua 5.4.0  Copyright (C) 1994-2020 Lua.org, PUC-Rio
00000333 (nil) 4 0x55e4939bef20 32

Best regards,

--
DoubleF

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Andrew Gierth
In reply to this post by Sergey Zakharchenko
>>>>> "Sergey" == Sergey Zakharchenko <[hidden email]> writes:

 Sergey> [-rc5 crash on memory allocation failure]

So my question is this:

The (original) test case basically does a ton of parsing in which a
whole lot of prototypes get generated and added to other prototypes,
in the context of building up one single return statement.

Where in this are those prototypes supposed to be reached from the GC?
i.e. what's the intended path by which the GC is supposed to find them?

The sequence of events I'm seeing is that there's a GC run happening
while we're still in the parser, and after the atomic stage it's finding
(and sweeping) a whole lot of prototypes that are marked white, which
surely should not be possible because all the prototypes being generated
in the parse are presumably still referenced from somewhere. (And the
crashes I get from references to freed memory confirm that they are
still referenced.)

--
Andrew.

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Andrew Gierth
>>>>> "Andrew" == Andrew Gierth <[hidden email]> writes:

 Andrew> So my question is this:

 Andrew> The (original) test case basically does a ton of parsing in
 Andrew> which a whole lot of prototypes get generated and added to
 Andrew> other prototypes, in the context of building up one single
 Andrew> return statement.

 Andrew> Where in this are those prototypes supposed to be reached from
 Andrew> the GC? i.e. what's the intended path by which the GC is
 Andrew> supposed to find them?

Answering my own question: the top-level closure is supposed to be on
the stack (and indeed it is), and everything else should be reached
recursively from that.

But (this is while stopped in a deeply nested part of the parser, but
examining the state of the luaY_parser frame):

(gdb) print funcstate.f
$27 = (Proto *) 0x801614200

(gdb) print *((LClosure*)L->stack[5].val.value_.p)
$30 = {next = 0x80162e170, tt = 6 '\006', marked = 36 '$', nupvalues = 1 '\001', gclist = 0x0, p = 0x801614200, upvals = {0x0}}

So here's the closure on the stack whose prototype (p) points to the
top-level prototype created by luaY_parser. Notice that marked=36, which
is BLACK | G_OLD.

But:

(gdb) print *((LClosure*)L->stack[5].val.value_.p)->p
$31 = {next = 0x801621d00, tt = 10 '\n', marked = 16 '\020', numparams = 0 '\000', is_vararg = 1 '\001', maxstacksize = 2 '\002',
  sizeupvalues = 4, sizek = 0, sizecode = 4, sizelineinfo = 4, sizep = 4, sizelocvars = 0, sizeabslineinfo = 0, linedefined = 0,
  lastlinedefined = 0, k = 0x0, code = 0x801667020, p = 0x801651b40, upvalues = 0x801621d80,
  lineinfo = 0x801668020 "\001", '\245' <repeats 39 times>, "\001&", abslineinfo = 0x0, locvars = 0x0, source = 0x80162e170,
  gclist = 0xa5a5a5a5a5a5a5a5}

Here's the Proto that the closure points to, and it has marked = 16
(i.e. WHITE1). So here we have a classic black->white pointer scenario,
and none of the prototypes nested under this one will be scanned because
everything stops at the black closure.
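
The two marked values above can be decoded mechanically. The sketch
below assumes the bit layout of lgc.h in the 5.4 sources (age in the
low three bits, WHITE0/WHITE1/BLACK at bits 3 to 5); that layout is an
assumption here and should be checked against the tree being debugged.
The helper names color_name and age_of are made up for illustration.

```c
#include <string.h>

/* Assumed layout of the 'marked' byte (to be checked against lgc.h). */
#define AGE_MASK   0x07u      /* low three bits: generational age */
#define WHITE0_BIT (1u << 3)
#define WHITE1_BIT (1u << 4)
#define BLACK_BIT  (1u << 5)

/* Assumed age values; only AGE_NEW and AGE_OLD matter for this thread. */
enum { AGE_NEW = 0, AGE_SURVIVAL = 1, AGE_OLD0 = 2, AGE_OLD1 = 3, AGE_OLD = 4 };

/* An object with no color bit set is gray. */
static const char *color_name(unsigned marked) {
  if (marked & BLACK_BIT)  return "black";
  if (marked & WHITE0_BIT) return "white0";
  if (marked & WHITE1_BIT) return "white1";
  return "gray";
}

static unsigned age_of(unsigned marked) {
  return marked & AGE_MASK;
}
```

Under these assumptions, 36 (binary 100100) is black with age 4, and
16 (binary 010000) is white1 with age 0, which agrees with the reading
given above.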

So the bug is here in luaY_parser:

  funcstate.f = cl->p = luaF_newproto(L);
  /* needs another barrier here */
  funcstate.f->source = luaS_new(L, name);  /* create and anchor TString */
  luaC_objbarrier(L, funcstate.f, funcstate.f->source);

The GC forced by the memory failure happened inside luaF_newproto, and
resulted in the closure "cl" being marked both BLACK and OLD. Without a
barrier at the marked location, we have cl (black) pointing at the new
proto (white) and everything blows up later as a result.
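
The failure mode can be reproduced in a toy tri-color model. This is
not Lua's collector, only a minimal sketch of the invariant involved:
a black object is considered fully traversed, so a white object stored
into it is never visited unless a barrier recolors it. All names below
(Obj, obj_barrier, mark, finish_cycle, proto_survives) are
hypothetical.

```c
#include <stddef.h>

typedef enum { WHITE, GRAY, BLACK } Color;

typedef struct Obj {
  Color color;
  struct Obj *child;  /* one outgoing reference is enough for this model */
  int freed;          /* set by the sweep instead of really freeing */
} Obj;

/* Forward barrier: a black parent must never point at a white child. */
static void obj_barrier(Obj *parent, Obj *child) {
  if (parent->color == BLACK && child->color == WHITE)
    child->color = GRAY;  /* queue the child for traversal */
}

/* Traverse from o; black objects count as already done and stop the walk. */
static void mark(Obj *o) {
  while (o != NULL && o->color != BLACK) {
    o->color = BLACK;
    o = o->child;
  }
}

/* Finish the cycle: traverse pending gray objects, then sweep the whites. */
static void finish_cycle(Obj **all, size_t n) {
  size_t i;
  for (i = 0; i < n; i++)
    if (all[i]->color == GRAY) mark(all[i]);
  for (i = 0; i < n; i++)
    if (all[i]->color == WHITE) all[i]->freed = 1;
}

/* Returns 1 if the new proto and its nested proto survive, 0 if swept. */
static int proto_survives(int use_barrier) {
  Obj nested  = { WHITE, NULL, 0 };
  Obj proto   = { WHITE, NULL, 0 };  /* freshly created, still white */
  Obj closure = { BLACK, NULL, 0 };  /* already traversed by the forced GC */
  Obj *all[3];
  all[0] = &closure; all[1] = &proto; all[2] = &nested;
  closure.child = &proto;            /* models cl->p = luaF_newproto(L) */
  if (use_barrier) obj_barrier(&closure, &proto);
  proto.child = &nested;             /* the parser keeps adding sub-protos */
  mark(&closure);                    /* root scan stops at the black closure */
  finish_cycle(all, 3);
  return !proto.freed && !nested.freed;
}
```

In this model proto_survives(0) returns 0 and proto_survives(1)
returns 1: without the barrier both prototypes are swept while still
referenced. The real collector also has generational ages and a
different barrier action during the sweep phase, none of which is
modelled here.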

--
Andrew.

Re: Lua 5.4.0-rc5 segfault in low memory conditions

Andrew Gierth
In reply to this post by Sergey Zakharchenko
>>>>> "Sergey" == Sergey Zakharchenko <[hidden email]> writes:

 Sergey> I know you're tired of my messages, but behold the ultimate
 Sergey> test case for this particular (no longer sure it's one and the
 Sergey> same bug throughout this thread):

See if this fixes both of them:

--- src/lparser.c.orig  2020-06-13 19:57:47.428734000 +0100
+++ src/lparser.c       2020-06-13 19:58:08.039573000 +0100
@@ -1977,6 +1977,7 @@
   sethvalue2s(L, L->top, lexstate.h);  /* anchor it */
   luaD_inctop(L);
   funcstate.f = cl->p = luaF_newproto(L);
+  luaC_objbarrier(L, cl, cl->p);
   funcstate.f->source = luaS_new(L, name);  /* create and anchor TString */
   luaC_objbarrier(L, funcstate.f, funcstate.f->source);
   lexstate.buff = buff;


--
Andrew.