lpeg: bug or intentional? `… * (Cmt(0,f1) + Cmt(0,f2)) * …` doesn't call f2

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

lpeg: bug or intentional? `… * (Cmt(0,f1) + Cmt(0,f2)) * …` doesn't call f2

nobody
Hey everyone,

this behavior has surprised me a bit.  I was trying to use this to keep
track of the nesting of structures (to produce a backtrace in terms of
grammar rules when stuff fails to match), but it doesn't work as
expected…  Simplified example:

(checked in versions 1.0.1 and 1.0.2)

     lpeg = require "lpeg" ; C, Cmt, P = lpeg.C, lpeg.Cmt, lpeg.P
     depth = 0
     enter = Cmt( 0, function(_,p)  depth = depth + 1 ; return p  end )
     leave = Cmt( 0, function(_,p)  depth = depth - 1 ; return p  end )

     -- wrap pattern with enter/leave (pattern simplified to "foo" here)
     -- leave should always ping once (on success or by backtracking)
     p = (enter + leave) * C"foo" * leave

     -- tests
     p:match "foo" ; print( depth ) --> 0 - ok!
     p:match "bar" ; print( depth ) --> 1 - huh?!?


In other words, `leave` doesn't get called when backtracking.  As a
further check, manually rewriting this as

     q = (Cmt( 1, function( _,p,c )
                   depth = depth + 1
                   if c == "f" then  return p  end
                 end) + leave) * (P"oo"/"foo") * leave

(shifting the test for "f" into the Cmt, thus turning the 0+0 choice
into a 1+0 choice) results in the expected output

     q:match "foo" ; print( depth ) --> 0
     q:match "bar" ; print( depth ) --> 0

which makes me believe that the version at the top correctly describes
the intended behavior.


Looking at the debug output, the last step of p:match "bar" results in
varying huge numbers for the PC:

s: |bar| stck:0, dyncaps:0, caps:0  -17672968042060: giveup
s: |bar| stck:0, dyncaps:0, caps:0  -17849526133724: giveup
s: |bar| stck:0, dyncaps:0, caps:0  -17844495977256: giveup
etc.

(They stay constant over repeated runs against the same pattern, but
change every time you re-create `p` or re-start Lua.  So maybe this is
some uninitialized / misinterpreted memory problem?)


Irrespective of whether this is a bug or not, does anyone know of a way
to get an equivalent of the `(enter+leave)*pat*leave` pattern to work?

-- nobody


Full debug info: (nothing else below)

The :ptree() of p seems to look sane (as far as I can tell…):

     [1 = function  2 = function  3 = function  ]
     seq
       choice
         run-time
           true
         run-time
           true
       seq
         capture kind: 'simple'  key: 0
           seq
             char 'f'
             seq
               char 'o'
               char 'o'
         run-time
           true

The :pcode() for reference to accompany the next block:

     [1 = function  2 = function  3 = function  ]
     00: testany -> 9
     02: choice -> 9
     04: opencapture group (idx = 1)
     05: any
     06: closeruntime
     07: commit -> 11
     09: opencapture group (idx = 2)
     10: closeruntime
     11: char 'f'
     12: char 'o'
     13: char 'o'
     14: fullcapture simple (size = 3)  (idx = 0)
     15: opencapture group (idx = 3)
     16: closeruntime
     17: end

The -DDEBUG trace: (with ((comments)) added)

((the first part looks ok i guess – do the `enter`))
-------------------------------------
>======
=======
s: |bar| stck:1, dyncaps:0, caps:0  00: choice -> 6
-------------------------------------
>======
=======
s: |bar| stck:2, dyncaps:0, caps:0  02: opencapture group (idx = 1)
-------------------------------------
>======
group (idx: 1 - size: 0) -> 0x5563448d07a8
=======
s: |bar| stck:2, dyncaps:0, caps:1  03: closeruntime
-------------------------------------
>======
=======
s: |bar| stck:2, dyncaps:0, caps:0  04: commit -> 8

((now f doesn't match b, so backtrack… - ok))
-------------------------------------
>======
=======
s: |bar| stck:1, dyncaps:0, caps:0  08: char 'f'
**FAIL**

((huge (random!) negative number for PC - this looks fishy))
-------------------------------------
>======
=======
s: |bar| stck:0, dyncaps:0, caps:0  -17672968042060: giveup

Reply | Threaded
Open this post in threaded view
|

Re: lpeg: bug or intentional? `… * (Cmt(0,f1) + Cmt(0,f2)) * …` doesn't call f2

Sean Conner
It was thus said that the Great nobody once stated:

> Hey everyone,
>
> this behavior has surprised me a bit.  I was trying to use this to keep
> track of the nesting of structures (to produce a backtrace in terms of
> grammar rules when stuff fails to match), but it doesn't work as
> expected…  Simplified example:
>
> (checked in versions 1.0.1 and 1.0.2)
>
>     lpeg = require "lpeg" ; C, Cmt, P = lpeg.C, lpeg.Cmt, lpeg.P
>     depth = 0
>     enter = Cmt( 0, function(_,p)  depth = depth + 1 ; return p  end )
>     leave = Cmt( 0, function(_,p)  depth = depth - 1 ; return p  end )
>
>     -- wrap pattern with enter/leave (pattern simplified to "foo" here)
>     -- leave should always ping once (on success or by backtracking)
>     p = (enter + leave) * C"foo" * leave
>
>     -- tests
>     p:match "foo" ; print( depth ) --> 0 - ok!
>     p:match "bar" ; print( depth ) --> 1 - huh?!?

  For p:match("bar"), the enter rule will *always* match, because it matches
the empty string; leave will never be called at all.  So depth becomes 1.
Then the rule C"foo" is attempted, and since it fails, the entire match
fails at that point, so leave (which is an "and this pattern") is never
called.  To properly keep track of level, you might have to do:

        p = C"foo" * enter * rest_of_foo * leave

If you do:

        p = enter * C"foo" * rest_of_foo * leave
          + leave

depth will become out of sync with the actual depth level because you are
leaving a level you never entered.  You could simplify this with:

        function enter(pattern)
          return pattern
               * Cmt(0,function(_,p) depth = depth + 1 return p end )
        end

        p = enter(C"foo") * rest_of_grammar * leave

or maybe even

        function adjust_depth(delta)
          return Cmt(0,function(_,p) depth = depth + delta return p end )
        end

        function deeper(start_pattern,rest_pattern)
          return start_pattern
               * adjust_depth(1)
               * rest_pattern)
               * adjust_depth(-1)
        end

        p = deeper(C"foo",rest_of_foo)
          + deeper(C"bar",rest_of_bar)

  -spc (Or maybe use LPEG to to parse a grammar that tracks the depth of a
        gammar ... )

Reply | Threaded
Open this post in threaded view
|

Re: lpeg: bug or intentional? `… * (Cmt(0,f1) + Cmt(0,f2)) * …` doesn't call f2

nobody
On 08/04/2019 03.07, Sean Conner wrote:

> It was thus said that the Great nobody once stated:
>>      -- wrap pattern with enter/leave (pattern simplified to "foo" here)
>>      -- leave should always ping once (on success or by backtracking)
>>      p = (enter + leave) * C"foo" * leave
>
>    For p:match("bar"), the enter rule will *always* match, because it matches
> the empty string; leave will never be called at all.  So depth becomes 1.
> Then the rule C"foo" is attempted, and since it fails, the entire match
> fails at that point, so leave (which is an "and this pattern") is never
> called.  To properly keep track of level, you might have to do:
>
> p = C"foo" * enter * rest_of_foo * leave
>
> If you do:
>
> p = enter * C"foo" * rest_of_foo * leave
>  + leave

Oh, right.  As a sub-pattern, (enter+leave) succeeded via the first
branch, and there's no backtracking *into* that sub-pattern – that was
my misconception.  So the correct thing is

     p = enter * (C"foo" * leave + leave)

or (de-simplified a bit)

     p = enter * (C"foo" * leave + fail_hard)

and things work as expected now.


That still left me wondering about those huge negative numbers in the
debug output.  Digging further, they seem to be produced by any failing
match, even by something as simple as

     require"lpeg".P"x":match""

and looking at the source reveals that the PC is computed by pointer
difference to the pcode base address, but `giveup` is actually a

     static const Instruction giveup = {{IGiveup, 0, 0}};

i.e. located in a completely different place, hence the big difference.

So all is well – thanks for the help!

-- nobody