[LPeg] intermittent stack exhaustion

classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

[LPeg] intermittent stack exhaustion

Sebastian Cato
Hello,

I started writing a markdown (or a dialect thereof) to HTML translator to play around with LPeg.

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Sebastian Cato
Sorry, hit enter too early. Should make a habit of filling out the address last.

So, I started writing a markdown (or a dialect thereof) to HTML
translator to play around with LPeg. The code is at:

https://github.com/sebcat/lpeg-markdown

While running markdown_test.lua, the interpreter crashes intermittently.

a commit that exhibits this problem, should the repo change:

git checkout c82794f8f2637ac4161f21113bbf4f23aa34906e

platform info:

$ uname -a
Linux genesis 4.6.6-200.fc23.x86_64 #1 SMP Wed Aug 10 23:13:35 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux
$ lua -v
Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> lpeg = require "lpeg"
> lpeg.version()
1.0.0

Since the problem only occurs intermittently, I run markdown_test.lua as such:
#!/usr/bin/bash -e
while true; do
  ./markdown_test.lua
done

Looking at the core dump, I see:

#0  0x00007f3fb7060889 in hascaptures (tree=0x0) at lpcode.c:131
#1  0x00007f3fb706092c in hascaptures (tree=0x56012245c45c) at lpcode.c:144
#2  0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
#3  0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
#4  0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
#5  0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
#6  0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
#7  0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
#8  0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
#9  0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
#10 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
#11 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
#12 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
#13 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
#14 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
#15 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
#16 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
#17 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
#18 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
#19 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
#20 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
#21 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
#22 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
#23 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
#24 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
#25 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
#26 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144

[...]

#261630 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
#261631 0x00007f3fb706092c in hascaptures (tree=0x56012245c18c) at lpcode.c:144
#261632 0x00007f3fb706092c in hascaptures (tree=0x56012245c10c) at lpcode.c:144
#261633 0x00007f3fb706092c in hascaptures (tree=0x56012245c674) at lpcode.c:144
#261634 0x00007f3fb7061c6e in codecapture (compst=0x7ffef2f91360,
tree=0x56012245c66c, tt=-1, fl=0x7ffef2f901a0) at lpcode.c:720
#261635 0x00007f3fb706266a in codegen (compst=0x7ffef2f91360,
tree=0x56012245c66c, opt=0, tt=-1, fl=0x7ffef2f901a0) at lpcode.c:905
#261636 0x00007f3fb7061af1 in codechoice (compst=0x7ffef2f91360,
p1=0x56012245c664, p2=0x56012245c66c, opt=0, fl=0x7ffef2f901a0) at
lpcode.c:682
#261637 0x00007f3fb70625db in codegen (compst=0x7ffef2f91360,
tree=0x56012245c65c, opt=0, tt=-1, fl=0x7ffef2f901a0) at lpcode.c:900
#261638 0x00007f3fb7062491 in codeseq1 (compst=0x7ffef2f91360,
p1=0x56012245c65c, p2=0x56012245c694, tt=-1, fl=0x7f3fb7063d40
<fullset_>) at lpcode.c:876
#261639 0x00007f3fb70626f3 in codegen (compst=0x7ffef2f91360,
tree=0x56012245c654, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
lpcode.c:910
#261640 0x00007f3fb7061d18 in codecapture (compst=0x7ffef2f91360,
tree=0x56012245c64c, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
lpcode.c:726
#261641 0x00007f3fb706266a in codegen (compst=0x7ffef2f91360,
tree=0x56012245c64c, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
lpcode.c:905
#261642 0x00007f3fb7061da0 in coderuntime (compst=0x7ffef2f91360,
tree=0x56012245c644, tt=-1) at lpcode.c:734
#261643 0x00007f3fb7062685 in codegen (compst=0x7ffef2f91360,
tree=0x56012245c644, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
lpcode.c:906
#261644 0x00007f3fb70622f6 in codegrammar (compst=0x7ffef2f91360,
grammar=0x56012245c02c) at lpcode.c:850
#261645 0x00007f3fb706269d in codegen (compst=0x7ffef2f91360,
tree=0x56012245c02c, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
lpcode.c:907
#261646 0x00007f3fb7061d18 in codecapture (compst=0x7ffef2f91360,
tree=0x56012245c024, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
lpcode.c:726
#261647 0x00007f3fb706266a in codegen (compst=0x7ffef2f91360,
tree=0x56012245c024, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
lpcode.c:905
#261648 0x00007f3fb70629b1 in compile (L=0x560122446018,
p=0x56012245c018) at lpcode.c:977
#261649 0x00007f3fb705fd7d in prepcompile (L=0x560122446018,
p=0x56012245c018, idx=1) at lptree.c:1099
#261650 0x00007f3fb705ff7b in lp_match (L=0x560122446018) at lptree.c:1154
#261651 0x00007f3fb81d97aa in luaD_precall (L=L@entry=0x560122446018,
func=func@entry=0x560122456170, nresults=nresults@entry=1) at
ldo.c:365
#261652 0x00007f3fb81ef5fd in luaV_execute (L=L@entry=0x560122446018)
at lvm.c:1134
#261653 0x00007f3fb81d9b9f in luaD_call (L=L@entry=0x560122446018,
func=<optimized out>, nResults=<optimized out>) at ldo.c:496
#261654 0x00007f3fb81d9bf1 in luaD_callnoyield (L=0x560122446018,
func=<optimized out>, nResults=<optimized out>) at ldo.c:506
#261655 0x00007f3fb81d8fc2 in luaD_rawrunprotected
(L=L@entry=0x560122446018, f=f@entry=0x7f3fb81cf250 <f_call>,
ud=ud@entry=0x7ffef2f918a0) at ldo.c:142
#261656 0x00007f3fb81d9e7d in luaD_pcall (L=L@entry=0x560122446018,
func=func@entry=0x7f3fb81cf250 <f_call>, u=u@entry=0x7ffef2f918a0,
old_top=80, ef=<optimized out>)
    at ldo.c:727
#261657 0x00007f3fb81d0851 in lua_pcallk (L=0x560122446018, nargs=0,
nresults=-1, errfunc=<optimized out>, ctx=0, k=0x0) at lapi.c:968
#261658 0x0000560121f058ab in docall (L=0x560122446018, narg=0,
nres=-1) at lua.c:203
#261659 0x0000560121f0646c in handle_script (argv=<optimized out>,
L=0x560122446018) at lua.c:443
#261660 pmain (L=0x560122446018) at lua.c:577
#261661 0x00007f3fb81d97aa in luaD_precall (L=L@entry=0x560122446018,
func=0x560122446640, nresults=1) at ldo.c:365
#261662 0x00007f3fb81d9b93 in luaD_call (L=L@entry=0x560122446018,
func=<optimized out>, nResults=<optimized out>) at ldo.c:495
#261663 0x00007f3fb81d9bf1 in luaD_callnoyield (L=0x560122446018,
func=<optimized out>, nResults=<optimized out>) at ldo.c:506
#261664 0x00007f3fb81d8fc2 in luaD_rawrunprotected
(L=L@entry=0x560122446018, f=f@entry=0x7f3fb81cf250 <f_call>,
ud=ud@entry=0x7ffef2f91b40) at ldo.c:142
#261665 0x00007f3fb81d9e7d in luaD_pcall (L=L@entry=0x560122446018,
func=func@entry=0x7f3fb81cf250 <f_call>, u=u@entry=0x7ffef2f91b40,
old_top=16, ef=<optimized out>)
    at ldo.c:727
#261666 0x00007f3fb81d0851 in lua_pcallk (L=0x560122446018, nargs=2,
nresults=1, errfunc=<optimized out>, ctx=0, k=0x0) at lapi.c:968
#261667 0x0000560121f0565b in main (argc=2, argv=0x7ffef2f91c88) at lua.c:603

I'm not familiar enough with LPeg or PEG in general to know what's
causing the crash. Any ideas?

Again, sorry for sending a partial e-mail earlier.

//Sebastian Cato

On Sun, Sep 11, 2016 at 6:12 PM, Sebastian Cato <[hidden email]> wrote:
>
> Hello,
>
> I started writing a markdown (or a dialect thereof) to HTML translator to play around with LPeg.
>

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Sebastian Cato
Hello again,

I've managed to reduce the test case to something a bit more managable:

#!/usr/bin/env lua
lpeg = require "lpeg"

print(lpeg.P{
  "Line";
  P = lpeg.P"a",
  Q = lpeg.V"P",
  Line = (lpeg.P"x" * lpeg.C(1-lpeg.V"P")),
}:match("xx"))

I've also tested it on FreeBSD with Lua 5.2.4 and LPeg 1.0.0 with the
same intermittent segfault happening.

//Sebastian Cato

On Sun, Sep 11, 2016 at 6:22 PM, Sebastian Cato <[hidden email]> wrote:

> Sorry, hit enter too early. Should make a habit of filling out the address last.
>
> So, I started writing a markdown (or a dialect thereof) to HTML
> translator to play around with LPeg. The code is at:
>
> https://github.com/sebcat/lpeg-markdown
>
> While running markdown_test.lua, the interpreter crashes intermittently.
>
> a commit that exhibits this problem, should the repo change:
>
> git checkout c82794f8f2637ac4161f21113bbf4f23aa34906e
>
> platform info:
>
> $ uname -a
> Linux genesis 4.6.6-200.fc23.x86_64 #1 SMP Wed Aug 10 23:13:35 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
> $ lua -v
> Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
>> lpeg = require "lpeg"
>> lpeg.version()
> 1.0.0
>
> Since the problem only occurs intermittently, I run markdown_test.lua as such:
> #!/usr/bin/bash -e
> while true; do
>   ./markdown_test.lua
> done
>
> Looking at the core dump, I see:
>
> #0  0x00007f3fb7060889 in hascaptures (tree=0x0) at lpcode.c:131
> #1  0x00007f3fb706092c in hascaptures (tree=0x56012245c45c) at lpcode.c:144
> #2  0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
> #3  0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
> #4  0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
> #5  0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
> #6  0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
> #7  0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
> #8  0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
> #9  0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
> #10 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
> #11 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
> #12 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
> #13 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
> #14 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
> #15 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
> #16 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
> #17 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
> #18 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
> #19 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
> #20 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
> #21 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
> #22 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
> #23 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
> #24 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
> #25 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
> #26 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
>
> [...]
>
> #261630 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
> #261631 0x00007f3fb706092c in hascaptures (tree=0x56012245c18c) at lpcode.c:144
> #261632 0x00007f3fb706092c in hascaptures (tree=0x56012245c10c) at lpcode.c:144
> #261633 0x00007f3fb706092c in hascaptures (tree=0x56012245c674) at lpcode.c:144
> #261634 0x00007f3fb7061c6e in codecapture (compst=0x7ffef2f91360,
> tree=0x56012245c66c, tt=-1, fl=0x7ffef2f901a0) at lpcode.c:720
> #261635 0x00007f3fb706266a in codegen (compst=0x7ffef2f91360,
> tree=0x56012245c66c, opt=0, tt=-1, fl=0x7ffef2f901a0) at lpcode.c:905
> #261636 0x00007f3fb7061af1 in codechoice (compst=0x7ffef2f91360,
> p1=0x56012245c664, p2=0x56012245c66c, opt=0, fl=0x7ffef2f901a0) at
> lpcode.c:682
> #261637 0x00007f3fb70625db in codegen (compst=0x7ffef2f91360,
> tree=0x56012245c65c, opt=0, tt=-1, fl=0x7ffef2f901a0) at lpcode.c:900
> #261638 0x00007f3fb7062491 in codeseq1 (compst=0x7ffef2f91360,
> p1=0x56012245c65c, p2=0x56012245c694, tt=-1, fl=0x7f3fb7063d40
> <fullset_>) at lpcode.c:876
> #261639 0x00007f3fb70626f3 in codegen (compst=0x7ffef2f91360,
> tree=0x56012245c654, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
> lpcode.c:910
> #261640 0x00007f3fb7061d18 in codecapture (compst=0x7ffef2f91360,
> tree=0x56012245c64c, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
> lpcode.c:726
> #261641 0x00007f3fb706266a in codegen (compst=0x7ffef2f91360,
> tree=0x56012245c64c, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
> lpcode.c:905
> #261642 0x00007f3fb7061da0 in coderuntime (compst=0x7ffef2f91360,
> tree=0x56012245c644, tt=-1) at lpcode.c:734
> #261643 0x00007f3fb7062685 in codegen (compst=0x7ffef2f91360,
> tree=0x56012245c644, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
> lpcode.c:906
> #261644 0x00007f3fb70622f6 in codegrammar (compst=0x7ffef2f91360,
> grammar=0x56012245c02c) at lpcode.c:850
> #261645 0x00007f3fb706269d in codegen (compst=0x7ffef2f91360,
> tree=0x56012245c02c, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
> lpcode.c:907
> #261646 0x00007f3fb7061d18 in codecapture (compst=0x7ffef2f91360,
> tree=0x56012245c024, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
> lpcode.c:726
> #261647 0x00007f3fb706266a in codegen (compst=0x7ffef2f91360,
> tree=0x56012245c024, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
> lpcode.c:905
> #261648 0x00007f3fb70629b1 in compile (L=0x560122446018,
> p=0x56012245c018) at lpcode.c:977
> #261649 0x00007f3fb705fd7d in prepcompile (L=0x560122446018,
> p=0x56012245c018, idx=1) at lptree.c:1099
> #261650 0x00007f3fb705ff7b in lp_match (L=0x560122446018) at lptree.c:1154
> #261651 0x00007f3fb81d97aa in luaD_precall (L=L@entry=0x560122446018,
> func=func@entry=0x560122456170, nresults=nresults@entry=1) at
> ldo.c:365
> #261652 0x00007f3fb81ef5fd in luaV_execute (L=L@entry=0x560122446018)
> at lvm.c:1134
> #261653 0x00007f3fb81d9b9f in luaD_call (L=L@entry=0x560122446018,
> func=<optimized out>, nResults=<optimized out>) at ldo.c:496
> #261654 0x00007f3fb81d9bf1 in luaD_callnoyield (L=0x560122446018,
> func=<optimized out>, nResults=<optimized out>) at ldo.c:506
> #261655 0x00007f3fb81d8fc2 in luaD_rawrunprotected
> (L=L@entry=0x560122446018, f=f@entry=0x7f3fb81cf250 <f_call>,
> ud=ud@entry=0x7ffef2f918a0) at ldo.c:142
> #261656 0x00007f3fb81d9e7d in luaD_pcall (L=L@entry=0x560122446018,
> func=func@entry=0x7f3fb81cf250 <f_call>, u=u@entry=0x7ffef2f918a0,
> old_top=80, ef=<optimized out>)
>     at ldo.c:727
> #261657 0x00007f3fb81d0851 in lua_pcallk (L=0x560122446018, nargs=0,
> nresults=-1, errfunc=<optimized out>, ctx=0, k=0x0) at lapi.c:968
> #261658 0x0000560121f058ab in docall (L=0x560122446018, narg=0,
> nres=-1) at lua.c:203
> #261659 0x0000560121f0646c in handle_script (argv=<optimized out>,
> L=0x560122446018) at lua.c:443
> #261660 pmain (L=0x560122446018) at lua.c:577
> #261661 0x00007f3fb81d97aa in luaD_precall (L=L@entry=0x560122446018,
> func=0x560122446640, nresults=1) at ldo.c:365
> #261662 0x00007f3fb81d9b93 in luaD_call (L=L@entry=0x560122446018,
> func=<optimized out>, nResults=<optimized out>) at ldo.c:495
> #261663 0x00007f3fb81d9bf1 in luaD_callnoyield (L=0x560122446018,
> func=<optimized out>, nResults=<optimized out>) at ldo.c:506
> #261664 0x00007f3fb81d8fc2 in luaD_rawrunprotected
> (L=L@entry=0x560122446018, f=f@entry=0x7f3fb81cf250 <f_call>,
> ud=ud@entry=0x7ffef2f91b40) at ldo.c:142
> #261665 0x00007f3fb81d9e7d in luaD_pcall (L=L@entry=0x560122446018,
> func=func@entry=0x7f3fb81cf250 <f_call>, u=u@entry=0x7ffef2f91b40,
> old_top=16, ef=<optimized out>)
>     at ldo.c:727
> #261666 0x00007f3fb81d0851 in lua_pcallk (L=0x560122446018, nargs=2,
> nresults=1, errfunc=<optimized out>, ctx=0, k=0x0) at lapi.c:968
> #261667 0x0000560121f0565b in main (argc=2, argv=0x7ffef2f91c88) at lua.c:603
>
> I'm not familiar enough with LPeg or PEG in general to know what's
> causing the crash. Any ideas?
>
> Again, sorry for sending a partial e-mail earlier.
>
> //Sebastian Cato
>
> On Sun, Sep 11, 2016 at 6:12 PM, Sebastian Cato <[hidden email]> wrote:
>>
>> Hello,
>>
>> I started writing a markdown (or a dialect thereof) to HTML translator to play around with LPeg.
>>

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Patrick Donnelly
In reply to this post by Sebastian Cato
Hi Sebastian,

On Sun, Sep 11, 2016 at 12:22 PM, Sebastian Cato <[hidden email]> wrote:

> Since the problem only occurs intermittently, I run markdown_test.lua as such:
> #!/usr/bin/bash -e
> while true; do
>   ./markdown_test.lua
> done
>
> Looking at the core dump, I see:
>
> #0  0x00007f3fb7060889 in hascaptures (tree=0x0) at lpcode.c:131
> #1  0x00007f3fb706092c in hascaptures (tree=0x56012245c45c) at lpcode.c:144
> #2  0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
> #3  0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
> #4  0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
> [...]

We saw the same problem with Nmap in a PR merging LPeg v1.0.0:
https://github.com/nmap/nmap/pull/478

I hadn't had time to find a trivial test case so thank you Sebastian
for doing that. Hopefully Roberto sees an easy way to fix it :)

--
Patrick Donnelly

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Sebastian Cato
In reply to this post by Sebastian Cato
I tested with LPeg versions < 1.0.0. It seems this bug is present in
lpeg >= 0.12.1.

The problem happens when the parse tree is compiled to byte code.
I built LPeg 1.0.0 in debug mode and modified the test case to print
the tree before it's compiled.
I also reduced the test case further.


#!/usr/bin/env lua
lpeg = require "lpeg"

p = lpeg.P{
  "Line";
  P = lpeg.P"a",
  Q = lpeg.V"P",
  Line = lpeg.P"x" * lpeg.C(lpeg.V"P"),
}

p:ptree()
p:match("xx")


A sucessfully compiled tree looks like:

[1 = P  2 = P  3 = Line  ]
grammar 3
  rule n: 0  key: 3 -- Line = lpeg.P"x" * lpeg.C(lpeg.V"P"),
    seq
      char 'x'
      capture cap: 5  key: 0  n: 0
        call key: 1
  rule n: 1  key: 0  -- Q = lpeg.V"P"
    call key: 2
  rule n: 2  key: 2  -- P = lpeg.P"a"
    char 'a'

While a tree that fails compilation looks like:

[1 = P  2 = P  3 = Line  ]
grammar 3
  rule n: 0  key: 3 -- Line = lpeg.P"x" * lpeg.C(lpeg.V"P"),
    seq
      char 'x'
      capture cap: 5  key: 0  n: 0
        call key: 1
  rule n: 1  key: 2 -- P = lpeg.P"a"
    char 'a'
  rule n: 2  key: 0 -- Q = lpeg.V"P"
    call key: 2
Segmentation fault

I think the rule ordering is different because of string (key) hash
randomization.

If I understand things correctly (and I may not), it seems that the
"Q" rule references itself instead of "P" (call key: 2 instead of call
key: 1 in the failing tree above). This  leads to infinite
recursion/stack exhaustion during compilation.

It's also related to captures somehow. Without the capture, the tree
is built correctly. With the capture, the segfault happens.

I've been looking at TOpenCall -> TCall resolution in
lptree.c:finalfix, lptree.c:fixonecall and calls to
lptree.c:correctkeys when joining ktables, but I'm afraid I don't
really understand how it works at this point.

Sorry for sending a lot of emails, but I figure I might as well keep
anyone looking at this updated.

On Sun, Sep 11, 2016 at 11:09 PM, Sebastian Cato <[hidden email]> wrote:

> Hello again,
>
> I've managed to reduce the test case to something a bit more managable:
>
> #!/usr/bin/env lua
> lpeg = require "lpeg"
>
> print(lpeg.P{
>   "Line";
>   P = lpeg.P"a",
>   Q = lpeg.V"P",
>   Line = (lpeg.P"x" * lpeg.C(1-lpeg.V"P")),
> }:match("xx"))
>
> I've also tested it on FreeBSD with Lua 5.2.4 and LPeg 1.0.0 with the
> same intermittent segfault happening.
>
> //Sebastian Cato
>
> On Sun, Sep 11, 2016 at 6:22 PM, Sebastian Cato <[hidden email]> wrote:
>> Sorry, hit enter too early. Should make a habit of filling out the address last.
>>
>> So, I started writing a markdown (or a dialect thereof) to HTML
>> translator to play around with LPeg. The code is at:
>>
>> https://github.com/sebcat/lpeg-markdown
>>
>> While running markdown_test.lua, the interpreter crashes intermittently.
>>
>> a commit that exhibits this problem, should the repo change:
>>
>> git checkout c82794f8f2637ac4161f21113bbf4f23aa34906e
>>
>> platform info:
>>
>> $ uname -a
>> Linux genesis 4.6.6-200.fc23.x86_64 #1 SMP Wed Aug 10 23:13:35 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>> $ lua -v
>> Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
>>> lpeg = require "lpeg"
>>> lpeg.version()
>> 1.0.0
>>
>> Since the problem only occurs intermittently, I run markdown_test.lua as such:
>> #!/usr/bin/bash -e
>> while true; do
>>   ./markdown_test.lua
>> done
>>
>> Looking at the core dump, I see:
>>
>> #0  0x00007f3fb7060889 in hascaptures (tree=0x0) at lpcode.c:131
>> #1  0x00007f3fb706092c in hascaptures (tree=0x56012245c45c) at lpcode.c:144
>> #2  0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
>> #3  0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
>> #4  0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
>> #5  0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
>> #6  0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
>> #7  0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
>> #8  0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
>> #9  0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
>> #10 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
>> #11 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
>> #12 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
>> #13 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
>> #14 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
>> #15 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
>> #16 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
>> #17 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
>> #18 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
>> #19 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
>> #20 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
>> #21 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
>> #22 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
>> #23 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
>> #24 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
>> #25 0x00007f3fb706092c in hascaptures (tree=0x56012245c46c) at lpcode.c:144
>> #26 0x00007f3fb706092c in hascaptures (tree=0x56012245c454) at lpcode.c:144
>>
>> [...]
>>
>> #261630 0x00007f3fb706092c in hascaptures (tree=0x56012245c1dc) at lpcode.c:144
>> #261631 0x00007f3fb706092c in hascaptures (tree=0x56012245c18c) at lpcode.c:144
>> #261632 0x00007f3fb706092c in hascaptures (tree=0x56012245c10c) at lpcode.c:144
>> #261633 0x00007f3fb706092c in hascaptures (tree=0x56012245c674) at lpcode.c:144
>> #261634 0x00007f3fb7061c6e in codecapture (compst=0x7ffef2f91360,
>> tree=0x56012245c66c, tt=-1, fl=0x7ffef2f901a0) at lpcode.c:720
>> #261635 0x00007f3fb706266a in codegen (compst=0x7ffef2f91360,
>> tree=0x56012245c66c, opt=0, tt=-1, fl=0x7ffef2f901a0) at lpcode.c:905
>> #261636 0x00007f3fb7061af1 in codechoice (compst=0x7ffef2f91360,
>> p1=0x56012245c664, p2=0x56012245c66c, opt=0, fl=0x7ffef2f901a0) at
>> lpcode.c:682
>> #261637 0x00007f3fb70625db in codegen (compst=0x7ffef2f91360,
>> tree=0x56012245c65c, opt=0, tt=-1, fl=0x7ffef2f901a0) at lpcode.c:900
>> #261638 0x00007f3fb7062491 in codeseq1 (compst=0x7ffef2f91360,
>> p1=0x56012245c65c, p2=0x56012245c694, tt=-1, fl=0x7f3fb7063d40
>> <fullset_>) at lpcode.c:876
>> #261639 0x00007f3fb70626f3 in codegen (compst=0x7ffef2f91360,
>> tree=0x56012245c654, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
>> lpcode.c:910
>> #261640 0x00007f3fb7061d18 in codecapture (compst=0x7ffef2f91360,
>> tree=0x56012245c64c, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
>> lpcode.c:726
>> #261641 0x00007f3fb706266a in codegen (compst=0x7ffef2f91360,
>> tree=0x56012245c64c, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
>> lpcode.c:905
>> #261642 0x00007f3fb7061da0 in coderuntime (compst=0x7ffef2f91360,
>> tree=0x56012245c644, tt=-1) at lpcode.c:734
>> #261643 0x00007f3fb7062685 in codegen (compst=0x7ffef2f91360,
>> tree=0x56012245c644, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
>> lpcode.c:906
>> #261644 0x00007f3fb70622f6 in codegrammar (compst=0x7ffef2f91360,
>> grammar=0x56012245c02c) at lpcode.c:850
>> #261645 0x00007f3fb706269d in codegen (compst=0x7ffef2f91360,
>> tree=0x56012245c02c, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
>> lpcode.c:907
>> #261646 0x00007f3fb7061d18 in codecapture (compst=0x7ffef2f91360,
>> tree=0x56012245c024, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
>> lpcode.c:726
>> #261647 0x00007f3fb706266a in codegen (compst=0x7ffef2f91360,
>> tree=0x56012245c024, opt=0, tt=-1, fl=0x7f3fb7063d40 <fullset_>) at
>> lpcode.c:905
>> #261648 0x00007f3fb70629b1 in compile (L=0x560122446018,
>> p=0x56012245c018) at lpcode.c:977
>> #261649 0x00007f3fb705fd7d in prepcompile (L=0x560122446018,
>> p=0x56012245c018, idx=1) at lptree.c:1099
>> #261650 0x00007f3fb705ff7b in lp_match (L=0x560122446018) at lptree.c:1154
>> #261651 0x00007f3fb81d97aa in luaD_precall (L=L@entry=0x560122446018,
>> func=func@entry=0x560122456170, nresults=nresults@entry=1) at
>> ldo.c:365
>> #261652 0x00007f3fb81ef5fd in luaV_execute (L=L@entry=0x560122446018)
>> at lvm.c:1134
>> #261653 0x00007f3fb81d9b9f in luaD_call (L=L@entry=0x560122446018,
>> func=<optimized out>, nResults=<optimized out>) at ldo.c:496
>> #261654 0x00007f3fb81d9bf1 in luaD_callnoyield (L=0x560122446018,
>> func=<optimized out>, nResults=<optimized out>) at ldo.c:506
>> #261655 0x00007f3fb81d8fc2 in luaD_rawrunprotected
>> (L=L@entry=0x560122446018, f=f@entry=0x7f3fb81cf250 <f_call>,
>> ud=ud@entry=0x7ffef2f918a0) at ldo.c:142
>> #261656 0x00007f3fb81d9e7d in luaD_pcall (L=L@entry=0x560122446018,
>> func=func@entry=0x7f3fb81cf250 <f_call>, u=u@entry=0x7ffef2f918a0,
>> old_top=80, ef=<optimized out>)
>>     at ldo.c:727
>> #261657 0x00007f3fb81d0851 in lua_pcallk (L=0x560122446018, nargs=0,
>> nresults=-1, errfunc=<optimized out>, ctx=0, k=0x0) at lapi.c:968
>> #261658 0x0000560121f058ab in docall (L=0x560122446018, narg=0,
>> nres=-1) at lua.c:203
>> #261659 0x0000560121f0646c in handle_script (argv=<optimized out>,
>> L=0x560122446018) at lua.c:443
>> #261660 pmain (L=0x560122446018) at lua.c:577
>> #261661 0x00007f3fb81d97aa in luaD_precall (L=L@entry=0x560122446018,
>> func=0x560122446640, nresults=1) at ldo.c:365
>> #261662 0x00007f3fb81d9b93 in luaD_call (L=L@entry=0x560122446018,
>> func=<optimized out>, nResults=<optimized out>) at ldo.c:495
>> #261663 0x00007f3fb81d9bf1 in luaD_callnoyield (L=0x560122446018,
>> func=<optimized out>, nResults=<optimized out>) at ldo.c:506
>> #261664 0x00007f3fb81d8fc2 in luaD_rawrunprotected
>> (L=L@entry=0x560122446018, f=f@entry=0x7f3fb81cf250 <f_call>,
>> ud=ud@entry=0x7ffef2f91b40) at ldo.c:142
>> #261665 0x00007f3fb81d9e7d in luaD_pcall (L=L@entry=0x560122446018,
>> func=func@entry=0x7f3fb81cf250 <f_call>, u=u@entry=0x7ffef2f91b40,
>> old_top=16, ef=<optimized out>)
>>     at ldo.c:727
>> #261666 0x00007f3fb81d0851 in lua_pcallk (L=0x560122446018, nargs=2,
>> nresults=1, errfunc=<optimized out>, ctx=0, k=0x0) at lapi.c:968
>> #261667 0x0000560121f0565b in main (argc=2, argv=0x7ffef2f91c88) at lua.c:603
>>
>> I'm not familiar enough with LPeg or PEG in general to know what's
>> causing the crash. Any ideas?
>>
>> Again, sorry for sending a partial e-mail earlier.
>>
>> //Sebastian Cato
>>
>> On Sun, Sep 11, 2016 at 6:12 PM, Sebastian Cato <[hidden email]> wrote:
>>>
>>> Hello,
>>>
>>> I started writing a markdown (or a dialect thereof) to HTML translator to play around with LPeg.
>>>

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Roberto Ierusalimschy
> Sorry for sending a lot of emails, but I figure I might as well keep
> anyone looking at this updated.

No problem. The more info we get the better.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Roberto Ierusalimschy
In reply to this post by Sebastian Cato
> While a tree that fails compilation looks like:
>
> [1 = P  2 = P  3 = Line  ]
> grammar 3
>   rule n: 0  key: 3 -- Line = lpeg.P"x" * lpeg.C(lpeg.V"P"),
>     seq
>       char 'x'
>       capture cap: 5  key: 0  n: 0
>         call key: 1
>   rule n: 1  key: 2 -- P = lpeg.P"a"
>     char 'a'
>   rule n: 2  key: 0 -- Q = lpeg.V"P"
>     call key: 2
> Segmentation fault
>
> I think the rule ordering is different because of string (key) hash
> randomization.
>
> If I understand things correctly (and I may not), it seems that the
> "Q" rule references itself instead of "P" (call key: 2 instead of call
> key: 1 in the failing tree above). This  leads to infinite
> recursion/stack exhaustion during compilation.

This is correct. The 'key' there is the index in the ktable (shown
in the beginning of the tree) of the key of the rule being called.
In that table, "2 = P", which means that the rule is calling "P"
(which is correct).

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Roberto Ierusalimschy
In reply to this post by Sebastian Cato
> I also reduced the test case further.
>
>
> #!/usr/bin/env lua
> lpeg = require "lpeg"
>
> p = lpeg.P{
>   "Line";
>   P = lpeg.P"a",
>   Q = lpeg.V"P",
>   Line = lpeg.P"x" * lpeg.C(lpeg.V"P"),
> }
>
> p:ptree()
> p:match("xx")
>
> [...]
>
> I think the rule ordering is different because of string (key) hash
> randomization.

We can simplify a little more that grammar, using numeric indices:

p = lpeg.P{
  (lpeg.P"x" * lpeg.C(1-lpeg.V(2))),
  lpeg.P"a",
  lpeg.V(2),
}

The use of numeric indices avoids the string hash randomization; as
a result, at least in my machine, we get a consistent seg. fault in
every run.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Sebastian Cato
>> If I understand things correctly (and I may not), it seems that the
>> "Q" rule references itself instead of "P" (call key: 2 instead of call
>> key: 1 in the failing tree above). This  leads to infinite
>> recursion/stack exhaustion during compilation.
>
> This is correct. The 'key' there is the index in the ktable (shown
> in the beginning of the tree) of the key of the rule being called.
> In that table, "2 = P", which means that the rule is calling "P"
> (which is correct).

Ah, so the tree itself is correct?

> [1 = P  2 = P  3 = Line  ]
> grammar 3
>   rule n: 0  key: 3
>     seq
>       char 'x'
>       capture cap: 5  key: 0  n: 0
>         call key: 1
>   rule n: 1  key: 2
>     char 'a'
>   rule n: 2  key: 0
>     call key: 2

rule 0 says "char 'x' followed by the capture of rule ktable[1]
rule 1 says "char 'a'"
rule 2 says "rule ktable[2]"

and rules are zero indexed and the ktable is indexed from 1. Am I
understanding it correctly?

I thought before that rule n: 2 [...] call key: 2 was a circular call,
but if it's in fact calling ktable[2] then that doesn't hold up
anymore. The stack trace also suggests that the problem happens when
generating code for the capture. For the test case above:

[...]
#261648 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
#261649 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
#261650 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
#261651 0x00007fd414f8dc6e in codecapture (compst=0x7ffc92cf02f0,
tree=0x556503112b14, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
lpcode.c:720
#261652 0x00007fd414f8e67b in codegen (compst=0x7ffc92cf02f0,
tree=0x556503112b14, opt=0, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
lpcode.c:907
#261653 0x00007fd414f8e303 in codegrammar (compst=0x7ffc92cf02f0,
grammar=0x556503112af4) at lpcode.c:851
#261654 0x00007fd414f8e6ae in codegen (compst=0x7ffc92cf02f0,
tree=0x556503112af4, opt=0, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
lpcode.c:909
#261655 0x00007fd414f8e9c2 in compile (L=0x556503109018,
p=0x556503112ae8) at lpcode.c:979
[...]

Another (semi-/unrelated) thing. In lpcode.c:codegrammar, code
generation is performed on all rules. I saw that lptree:verifygrammar
skips unreferenced rules. Is it safe to skip code generation on
unreferenced rules too? e.g.,

lpeg = require"lpeg"

lpeg.P{
  lpeg.V(2),
  lpeg.P"a",
  lpeg.P"z" + (1-lpeg.P"q")*lpeg.V(2),
}:pcode()

generates:

[1 = 2  2 = 2  3 = 1  ]
00: call -> 7
02: end
03: any
04: jmp -> 7
06: ret
07: char 'a'
08: ret
09: testchar 'z'-> 14
11: any
12: ret
13: any
14: set [(00-70)(72-ff)]
23: jmp -> 7
25: ret
26: end

with a patch to lpcode.c:codegrammar [1], it generates:

[1 = 2  2 = 2  3 = 1  ]
00: call -> 7
02: end
03: any
04: jmp -> 7
06: ret
07: char 'a'
08: ret
09: ret
10: end

It's probably possible to skip the extra 'ret' too, but there was an
assertion on that rules should end with IRet in lpcode.c:correctcalls
which I didn't want to touch. test.lua passed when I ran the patched
version against it.

[1]:
--- lpeg-1.0.0/lpcode.c    2015-09-28 19:40:46.000000000 +0200
+++ lpeg-1.0.0_branch/lpcode.c    2016-09-13 09:20:07.209067309 +0200
@@ -847,7 +847,9 @@
   jumptohere(compst, firstcall);
   for (rule = sib1(grammar); rule->tag == TRule; rule = sib2(rule)) {
     positions[rulenumber++] = gethere(compst);  /* save rule position */
-    codegen(compst, sib1(rule), 0, NOINST, fullset);  /* code rule */
+    if (rule->key > 0) { /* only generate code for referenced rules */
+      codegen(compst, sib1(rule), 0, NOINST, fullset);  /* code rule */
+    }
     addinstruction(compst, IRet, 0);
   }
   assert(rule->tag == TTrue);

On Mon, Sep 12, 2016 at 8:25 PM, Roberto Ierusalimschy
<[hidden email]> wrote:

>> I also reduced the test case further.
>>
>>
>> #!/usr/bin/env lua
>> lpeg = require "lpeg"
>>
>> p = lpeg.P{
>>   "Line";
>>   P = lpeg.P"a",
>>   Q = lpeg.V"P",
>>   Line = lpeg.P"x" * lpeg.C(lpeg.V"P"),
>> }
>>
>> p:ptree()
>> p:match("xx")
>>
>> [...]
>>
>> I think the rule ordering is different because of string (key) hash
>> randomization.
>
> We can simplify a little more that grammar, using numeric indices:
>
> p = lpeg.P{
>   (lpeg.P"x" * lpeg.C(1-lpeg.V(2))),
>   lpeg.P"a",
>   lpeg.V(2),
> }
>
> The use of numeric indices avoids the string hash randomization; as
> a result, at least in my machine, we get a consistent seg. fault in
> every run.
>
> -- Roberto
>

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Sebastian Cato
I've learnt a great deal today, I didn't think a debugging session
could be so rewarding.

For the test code:

p = lpeg.P{
  lpeg.P"x" * lpeg.C(lpeg.V(2)),
  lpeg.P"a",
  lpeg.V(2),
}

running in gdb, traversing the tree starting at lpcode.c:compile:

ktable: [1 = 2  2 = 2  3 = 1  ]

                 TGrammar
                /       \
       TRule key: 3    [...]
             /
          TSeq
         /    \
 TChar(n=120) TCapture
             /
           TCall key: 1 ( == rule: 2)
           /    \
        [...]  TRule key: 2
                /       \
         TChar(n=97)  TRule key: 0
                       /
                TCall key: 2 ( == rule: 2)
                 /       \
               [...]    TRule key: 2
                         /       \
                   TChar(n=97)   TRule key: 0 ( == rule: 3)
                                 /
                           TCall key: 2 ( == rule: 2)
                            /        \
                         [...]      TRule key: 2
                                     /       \
                               TChar(n=97)  TRule key: 0
                                             /
                                           [...]

etc.

When following sib2(tree) of TRule key 2 (the second rule) in
lpcode.c:hascaptures, we "bleed over" to the third rule, which doesn't
have a key because it's unreferenced. The third rule then references
the second rule which bleeds over to the third rule again, and so on.

We follow sib2(tree) of TRule nodes because we reach the default case
and numsiblings[tree->tag] for TRule is 2, but I don't *think* (I
don't know) that we want to do that for hascaptures, since sib2 will
be another rule and I don't see a case when we want to follow that. I
think we only want to follow sib1.

Here's a patch for it, including the part from the previous e-mail
about not compiling unreferenced rules. It passes lpeg-1.0.0/test.lua,
and no segfault.

--- lpeg-1.0.0/lpcode.c    2016-09-13 13:44:09.909301209 +0200
+++ lpeg-1.0.0_branch/lpcode.c    2016-09-13 13:44:57.250960443 +0200
@@ -135,6 +135,8 @@
       return 1;
     case TCall:
       tree = sib2(tree); goto tailcall;  /* return hascaptures(sib2(tree)); */
+    case TRule:
+      tree = sib1(tree); goto tailcall; /* return hascaptures(sib1(tree)); */
     case TOpenCall: assert(0);
     default: {
       switch (numsiblings[tree->tag]) {
@@ -847,7 +849,9 @@
   jumptohere(compst, firstcall);
   for (rule = sib1(grammar); rule->tag == TRule; rule = sib2(rule)) {
     positions[rulenumber++] = gethere(compst);  /* save rule position */
-    codegen(compst, sib1(rule), 0, NOINST, fullset);  /* code rule */
+    if (rule->key > 0) { /* only generate code for referenced rules */
+      codegen(compst, sib1(rule), 0, NOINST, fullset);  /* code rule */
+    }
     addinstruction(compst, IRet, 0);
   }
   assert(rule->tag == TTrue);

//Sebastian Cato

On Tue, Sep 13, 2016 at 10:15 AM, Sebastian Cato <[hidden email]> wrote:

>>> If I understand things correctly (and I may not), it seems that the
>>> "Q" rule references itself instead of "P" (call key: 2 instead of call
>>> key: 1 in the failing tree above). This  leads to infinite
>>> recursion/stack exhaustion during compilation.
>>
>> This is correct. The 'key' there is the index in the ktable (shown
>> in the beginning of the tree) of the key of the rule being called.
>> In that table, "2 = P", which means that the rule is calling "P"
>> (which is correct).
>
> Ah, so the tree itself is correct?
>
>> [1 = P  2 = P  3 = Line  ]
>> grammar 3
>>   rule n: 0  key: 3
>>     seq
>>       char 'x'
>>       capture cap: 5  key: 0  n: 0
>>         call key: 1
>>   rule n: 1  key: 2
>>     char 'a'
>>   rule n: 2  key: 0
>>     call key: 2
>
> rule 0 says "char 'x' followed by the capture of rule ktable[1]
> rule 1 says "char 'a'"
> rule 2 says "rule ktable[2]"
>
> and rules are zero indexed and the ktable is indexed from 1. Am I
> understanding it correctly?
>
> I thought before that rule n: 2 [...] call key: 2 was a circular call,
> but if it's in fact calling ktable[2] then that doesn't hold up
> anymore. The stack trace also suggests that the problem happens when
> generating code for the capture. For the test case above:
>
> [...]
> #261648 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
> #261649 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
> #261650 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
> #261651 0x00007fd414f8dc6e in codecapture (compst=0x7ffc92cf02f0,
> tree=0x556503112b14, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
> lpcode.c:720
> #261652 0x00007fd414f8e67b in codegen (compst=0x7ffc92cf02f0,
> tree=0x556503112b14, opt=0, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
> lpcode.c:907
> #261653 0x00007fd414f8e303 in codegrammar (compst=0x7ffc92cf02f0,
> grammar=0x556503112af4) at lpcode.c:851
> #261654 0x00007fd414f8e6ae in codegen (compst=0x7ffc92cf02f0,
> tree=0x556503112af4, opt=0, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
> lpcode.c:909
> #261655 0x00007fd414f8e9c2 in compile (L=0x556503109018,
> p=0x556503112ae8) at lpcode.c:979
> [...]
>
> Another (semi-/unrelated) thing. In lpcode.c:codegrammar, code
> generation is performed on all rules. I saw that lptree:verifygrammar
> skips unreferenced rules. Is it safe to skip code generation on
> unreferenced rules too? e.g.,
>
> lpeg = require"lpeg"
>
> lpeg.P{
>   lpeg.V(2),
>   lpeg.P"a",
>   lpeg.P"z" + (1-lpeg.P"q")*lpeg.V(2),
> }:pcode()
>
> generates:
>
> [1 = 2  2 = 2  3 = 1  ]
> 00: call -> 7
> 02: end
> 03: any
> 04: jmp -> 7
> 06: ret
> 07: char 'a'
> 08: ret
> 09: testchar 'z'-> 14
> 11: any
> 12: ret
> 13: any
> 14: set [(00-70)(72-ff)]
> 23: jmp -> 7
> 25: ret
> 26: end
>
> with a patch to lpcode.c:codegrammar [1], it generates:
>
> [1 = 2  2 = 2  3 = 1  ]
> 00: call -> 7
> 02: end
> 03: any
> 04: jmp -> 7
> 06: ret
> 07: char 'a'
> 08: ret
> 09: ret
> 10: end
>
> It's probably possible to skip the extra 'ret' too, but there was an
> assertion on that rules should end with IRet in lpcode.c:correctcalls
> which I didn't want to touch. test.lua passed when I ran the patched
> version against it.
>
> [1]:
> --- lpeg-1.0.0/lpcode.c    2015-09-28 19:40:46.000000000 +0200
> +++ lpeg-1.0.0_branch/lpcode.c    2016-09-13 09:20:07.209067309 +0200
> @@ -847,7 +847,9 @@
>    jumptohere(compst, firstcall);
>    for (rule = sib1(grammar); rule->tag == TRule; rule = sib2(rule)) {
>      positions[rulenumber++] = gethere(compst);  /* save rule position */
> -    codegen(compst, sib1(rule), 0, NOINST, fullset);  /* code rule */
> +    if (rule->key > 0) { /* only generate code for referenced rules */
> +      codegen(compst, sib1(rule), 0, NOINST, fullset);  /* code rule */
> +    }
>      addinstruction(compst, IRet, 0);
>    }
>    assert(rule->tag == TTrue);
>
> On Mon, Sep 12, 2016 at 8:25 PM, Roberto Ierusalimschy
> <[hidden email]> wrote:
>>> I also reduced the test case further.
>>>
>>>
>>> #!/usr/bin/env lua
>>> lpeg = require "lpeg"
>>>
>>> p = lpeg.P{
>>>   "Line";
>>>   P = lpeg.P"a",
>>>   Q = lpeg.V"P",
>>>   Line = lpeg.P"x" * lpeg.C(lpeg.V"P"),
>>> }
>>>
>>> p:ptree()
>>> p:match("xx")
>>>
>>> [...]
>>>
>>> I think the rule ordering is different because of string (key) hash
>>> randomization.
>>
>> We can simplify a little more that grammar, using numeric indices:
>>
>> p = lpeg.P{
>>   (lpeg.P"x" * lpeg.C(1-lpeg.V(2))),
>>   lpeg.P"a",
>>   lpeg.V(2),
>> }
>>
>> The use of numeric indices avoids the string hash randomization; as
>> a result, at least in my machine, we get a consistent seg. fault in
>> every run.
>>
>> -- Roberto
>>

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Roberto Ierusalimschy
In reply to this post by Sebastian Cato
> >> If I understand things correctly (and I may not), it seems that the
> >> "Q" rule references itself instead of "P" (call key: 2 instead of call
> >> key: 1 in the failing tree above). This  leads to infinite
> >> recursion/stack exhaustion during compilation.
> >
> > This is correct. The 'key' there is the index in the ktable (shown
> > in the beginning of the tree) of the key of the rule being called.
> > In that table, "2 = P", which means that the rule is calling "P"
> > (which is correct).
>
> Ah, so the tree itself is correct?
>
> > [1 = P  2 = P  3 = Line  ]
> > grammar 3
> >   rule n: 0  key: 3
> >     seq
> >       char 'x'
> >       capture cap: 5  key: 0  n: 0
> >         call key: 1
> >   rule n: 1  key: 2
> >     char 'a'
> >   rule n: 2  key: 0
> >     call key: 2
>
> rule 0 says "char 'x' followed by the capture of rule ktable[1]
> rule 1 says "char 'a'"
> rule 2 says "rule ktable[2]"

Yes. The tree is fine.


> and rules are zero indexed and the ktable is indexed from 1. Am I
> understanding it correctly?

Yes.

The bug is a stupid one in 'hascaptures'. It follows calls without
checking for cycles. (It is rare because 'hascaptures' is only used for
patterns with fixed length.) It also follows "next rule", to be able
to traverse an entire grammar looking for captures. So, rule 2 calls
rule 1, which has rule 2 as its next rule, etc.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Roberto Ierusalimschy
In reply to this post by Sebastian Cato
> Here's a patch for it, including the part from the previous e-mail
> about not compiling unreferenced rules. It passes lpeg-1.0.0/test.lua,
> and no segfault.

I think you are right about not being necessary to follow the next
rule (sib2 of TRule), but we still need to control recursion in
TCall nodes (as I said in the previous message). Try the following
grammar with your patch:

p = lpeg.C(-lpeg.P{lpeg.P'x' * lpeg.V(1) + lpeg.P'y'})

p:match("xx")

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Sebastian Cato
On Tue, Sep 13, 2016 at 4:32 PM, Roberto Ierusalimschy
<[hidden email]> wrote:
> I think you are right about not being necessary to follow the next
> rule (sib2 of TRule), but we still need to control recursion in
> TCall nodes (as I said in the previous message). Try the following
> grammar with your patch:
>
> p = lpeg.C(-lpeg.P{lpeg.P'x' * lpeg.V(1) + lpeg.P'y'})
>
> p:match("xx")

Yes, I didn't think about that case. Interesting problem from a CS
perspective. I see some similarities with verifygrammar, verifyrule
and the testing for left recursive patterns here as well, maybe it's
possible to combine them?

A couple of options I can think about:

1) the way verifygrammar and verifyrule works, with an int
passed[MAXRULES], and npassed as a counter to keep track of the
visited rules. (I think 'passed' can be short ints, since keys are
short ints, if we want to save stack space).

2) reserve a bit in each TTree node (e.g., the upper most bit of tag)
to use as a 'visited' flag and perform the depth first search. If
encountering a node with the bit set, we know that the graph is
cyclic. Would require clearing the bits later if it's to be done more
than once. Could maybe be used for verifygrammar, verifyrule too.

3) use a strongly connected component (SCC) algorithm, e.g., Tarjan's
SCC. Maybe overkill.

4) use a lua table as a set, with keys being the addresses to the
visited nodes. Perform a DFS, for each TRule node: if the address is
in the set, the graph is cyclic. I'm thinking about the case where we
have TCall -> TRule, I don't know if there's any other nodes that can
lead to cycles.

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Roberto Ierusalimschy
> A couple of options I can think about:
>
> 1) the way verifygrammar and verifyrule works, with an int
> passed[MAXRULES], and npassed as a counter to keep track of the
> visited rules. (I think 'passed' can be short ints, since keys are
> short ints, if we want to save stack space).
>
> 2) reserve a bit in each TTree node (e.g., the upper most bit of tag)
> to use as a 'visited' flag and perform the depth first search. If
> encountering a node with the bit set, we know that the graph is
> cyclic. Would require clearing the bits later if it's to be done more
> than once. Could maybe be used for verifygrammar, verifyrule too.
>
> 3) use a strongly connected component (SCC) algorithm, e.g., Tarjan's
> SCC. Maybe overkill.
>
> 4) use a lua table as a set, with keys being the addresses to the
> visited nodes. Perform a DFS, for each TRule node: if the address is
> in the set, the graph is cyclic. I'm thinking about the case where we
> have TCall -> TRule, I don't know if there's any other nodes that can
> lead to cycles.

I opted for (2), using 'key' itself as a flag. (I change it to 0 when
visiting the call, and restore it before returning.)

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Sebastian Cato
Ah, that's better than what I thought of. Out of curiosity, did you
pass along the lua_State* to hascaptures to raise an error in case a
cyclical reference is encountered, or is it sufficient to return zero
at that point? Also, will there be a new release soon?

I'll take this opportunity to thank you for a very good piece of
software, and after today's debugging session and e-mail conversation
I appreciate it even more than I did before. Thank you.

//Sebastian Cato

On Tue, Sep 13, 2016 at 6:33 PM, Roberto Ierusalimschy
<[hidden email]> wrote:

>> A couple of options I can think about:
>>
>> 1) the way verifygrammar and verifyrule works, with an int
>> passed[MAXRULES], and npassed as a counter to keep track of the
>> visited rules. (I think 'passed' can be short ints, since keys are
>> short ints, if we want to save stack space).
>>
>> 2) reserve a bit in each TTree node (e.g., the upper most bit of tag)
>> to use as a 'visited' flag and perform the depth first search. If
>> encountering a node with the bit set, we know that the graph is
>> cyclic. Would require clearing the bits later if it's to be done more
>> than once. Could maybe be used for verifygrammar, verifyrule too.
>>
>> 3) use a strongly connected component (SCC) algorithm, e.g., Tarjan's
>> SCC. Maybe overkill.
>>
>> 4) use a lua table as a set, with keys being the addresses to the
>> visited nodes. Perform a DFS, for each TRule node: if the address is
>> in the set, the graph is cyclic. I'm thinking about the case where we
>> have TCall -> TRule, I don't know if there's any other nodes that can
>> lead to cycles.
>
> I opted for (2), using 'key' itself as a flag. (I change it to 0 when
> visiting the call, and restore it before returning.)
>
> -- Roberto
>

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Roberto Ierusalimschy
> Ah, that's better than what I thought of. Out of curiosity, did you
> pass along the lua_State* to hascaptures to raise an error in case a
> cyclical reference is encountered, or is it sufficient to return zero
> at that point?

It cannot raise errors. A cyclical reference is OK; it will happen with
any recursive grammar (and the raison d'ĂȘtre for grammars in Lpeg is
to write recursive patterns).

As I said, the reason this bug is rare is because 'hascaptures' is
called only after some other tests.


> Also, will there be a new release soon?

Probably yes, at least to fix this bug.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] intermittent stack exhaustion

Sean Conner
It was thus said that the Great Roberto Ierusalimschy once stated:

> > Ah, that's better than what I thought of. Out of curiosity, did you
> > pass along the lua_State* to hascaptures to raise an error in case a
> > cyclical reference is encountered, or is it sufficient to return zero
> > at that point?
>
> It cannot raise errors. A cyclical reference is OK; it will happen with
> any recursive grammar (and the raison d'ĂȘtre for grammars in Lpeg is
> to write recursive patterns).
>
> As I said, the reason this bug is rare is because 'hascaptures' is
> called only after some other tests.
>
>
> > Also, will there be a new release soon?
>
> Probably yes, at least to fix this bug.

  Any information about this fix?  Someone brought it up with respect to my
email parser:

https://github.com/spc476/LPeg-Parsers/issues/2#issuecomment-272256013

  And yes, it's still an issue.

  -spc