LPEG: P(fct) seems not to consume input


Dirk Laurie-2
I define a function to be used with LPEG that simply returns the first
character. It _cannot_ match the empty string:

function fct(str)
  if #str==0 then return false
  else return 2,str:sub(1,1)
  end
end

I turn it into an LPEG pattern:

lpeg.version() --> 1.0.0
patt=lpeg.P(fct)

"patt" is supposed to do exactly the same as lpeg.C(1). It works OK on its own:
  patt:match""  --> nil
  patt:match"abc" --> a

But I can't make it match twice:
  (patt*patt):match"abc" --> a a [expected: a b]

This seems to contradict the manual's statement that: "If the call
returns a number, the match succeeds and the returned number becomes
the new current position."


Re: LPEG: P(fct) seems not to consume input

Sean Conner
It was thus said that the Great Dirk Laurie once stated:
> I define a function to be used with LPEG that simply returns the first
> character. It _cannot_ match the empty string:
>
> function fct(str)
>   if #str==0 then return false
>   else return 2,str:sub(1,1)
>   end
> end

The manual states:

        lpeg.P (value)

                ...

                If the argument is a function, returns a pattern equivalent
                to a match-time capture over the empty string.

And for the match-time capture:

        lpeg.Cmt(patt, function)

                Creates a match-time capture. Unlike all other captures,
                this one is evaluated immediately when a match occurs. It
                forces the immediate evaluation of all its nested captures
                and then calls function.

                The given function gets as arguments the entire subject, the
                current position (after the match of patt), plus any capture
                values produced by patt.

So the function should be:

        function fct(subject,position,capture)
          if #subject == 0 then
            return false
          else
            return position,subject:sub(position-1,position-1)
          end
        end
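
  A small probe makes those arguments visible. This is only a sketch: the
names probe and seen are made up for illustration, and it assumes LPeg is
installed and loadable with require "lpeg". It also suggests why the
original fct produced "a a": the second call arrives with position 2, but
the original function ignored the position and always returned 2 and the
first character.

```lua
local lpeg = require "lpeg"

-- Hypothetical probe: records the position it is called with and
-- succeeds without consuming input (returning true keeps the position).
local seen = {}
local function probe(subject, position)
  seen[#seen + 1] = position
  return true
end

-- probe, then one character, then probe again
local patt = lpeg.P(probe) * lpeg.P(1) * lpeg.P(probe)
local stop = patt:match("abc")

-- seen now holds the positions 1 and 2; stop is 2, the position after
-- the single character consumed by lpeg.P(1)
```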

> I turn it into an LPEG pattern:
>
> lpeg.version() --> 1.0.0
> patt=lpeg.P(fct)

  You aren't giving it any pattern to match, so I suspect it's the same as
if you did:

        patt = lpeg.Cmt(lpeg.P"", fct)

> "patt" is supposed to do exactly the same as lpeg.C(1). It works OK on its own:
>   patt:match""  --> nil
>   patt:match"abc" --> a
>
> But I can't make it match twice:
>   (patt*patt):match"abc" --> a a [expected: a b]

  Because the pattern, the empty string, is being matched twice by the two
calls.  If you change it to:

        patt = lpeg.P(lpeg.P(1) * fct)

it will work.
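
  Spelled out with an explicit match-time capture, the same idea might look
like the sketch below. It assumes the three-argument fct from above (the
unused capture argument is dropped here) and writes the pattern as
lpeg.Cmt(lpeg.P(1), fct) rather than as a product with a function; treat it
as an illustration, not the only way to do it.

```lua
local lpeg = require "lpeg"

local function fct(subject, position)
  -- position is the index just after the character matched by lpeg.P(1)
  if #subject == 0 then
    return false
  else
    return position, subject:sub(position - 1, position - 1)
  end
end

-- Match one character, then let fct produce it as a capture value.
local patt = lpeg.Cmt(lpeg.P(1), fct)

local first, second = (patt * patt):match("abc")  -- "a" and "b"
local none = patt:match("")                       -- nil: nothing to consume
```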

> This seems to contradict the manual's statement that: "If the call
> returns a number, the match succeeds and the returned number becomes
> the new current position."

  Also, your original code is always returning the first character,
regardless of where the match happens.

  -spc



Re: LPEG: P(fct) seems not to consume input

Dirk Laurie-2
On Wed, 31 Oct 2018 at 21:49, Sean Conner <[hidden email]> wrote:

>
> It was thus said that the Great Dirk Laurie once stated:
> > I define a function to be used with LPEG that simply returns the first
> > character. It _cannot_ match the empty string:
> >
> > function fct(str)
> >   if #str==0 then return false
> >   else return 2,str:sub(1,1)
> >   end
> > end

> So the function should be:
>
>         function fct(subject,position,capture)
>           if #subject == 0 then
>             return false
>           else
>             return position,subject:sub(position-1,position-1)
>           end
>         end

Aha! I don't get *str+pos, I get *str,pos.

Let me try that in my notation:

function fct(str,pos)
  if #str<pos then return false
  else return pos+1, str:sub(pos,pos)
  end
end

patt=lpeg.P(fct)
(patt^-4):match"abc"  --> a b c [as expected, no fourth value]

But:
(patt^0):match"abc" --> stdin:1: loop body may accept empty string

I don't understand why I get that.


Re: LPEG: P(fct) seems not to consume input

Sean Conner
It was thus said that the Great Dirk Laurie once stated:

> On Wed, 31 Oct 2018 at 21:49, Sean Conner <[hidden email]> wrote:
> >
> > It was thus said that the Great Dirk Laurie once stated:
> > > I define a function to be used with LPEG that simply returns the first
> > > character. It _cannot_ match the empty string:
> > >
> > > function fct(str)
> > >   if #str==0 then return false
> > >   else return 2,str:sub(1,1)
> > >   end
> > > end
>
> > So the function should be:
> >
> >         function fct(subject,position,capture)
> >           if #subject == 0 then
> >             return false
> >           else
> >             return position,subject:sub(position-1,position-1)
> >           end
> >         end
>
> Aha! I don't get *str+pos, I get *str,pos.
>
> Let me try that in my notation:
>
> function fct(str,pos)
>   if #str<pos then return false
>   else return pos+1, str:sub(pos,pos)
>   end
> end

  function fct(str,pos)
    if #str < pos then
      return false
    else
      return pos + 1 , str:sub(pos,pos)
    end
  end

  There, fixed that for you 8-P

> patt=lpeg.P(fct)
> (patt^-4):match"abc"  --> a b c [as expected, no fourth value]
>
> But:
> (patt^0):match"abc" --> stdin:1: loop body may accept empty string
>
> I don't understand why I get that.

  I don't either.  When I get that error, I start making changes to the code
until the error goes away and the code does what I want.  You could try:

        (patt^1 + lpeg.Cc(false)):match"abc"

  -spc



Re: LPEG: P(fct) seems not to consume input

Albert Chan
In reply to this post by Sean Conner

On Oct 31, 2018, at 3:48 PM, Sean Conner <[hidden email]> wrote:
>  
>  Because the pattern, the empty string, is being matched twice by the two
> calls.  If you change it to:
>
>    patt = lpeg.P(lpeg.P(1) * fct)
>
> it will work.

With this patt, the check for the empty string in fct is not needed.
Since the pattern in front of fct does not capture anything, the capture
argument is not needed either.

patt is just a one-liner:

patt = 1 * P(function(s, i) return i, s:sub(i-1, i-1) end)



Re: LPEG: P(fct) seems not to consume input

Dirk Laurie-2
On Wed, 31 Oct 2018 at 23:16, Albert Chan <[hidden email]> wrote:

>
>
> On Oct 31, 2018, at 3:48 PM, Sean Conner <[hidden email]> wrote:
> >
> >  Because the pattern, the empty string, is being matched twice by the two
> > calls.  If you change it to:
> >
> >    patt = lpeg.P(lpeg.P(1) * fct)
> >
> > it will work.
>
> With this patt, the check for empty string in fct is not needed
> Since P(fct) does not capture anything, argument capture is not needed.

I'll be using P(fct)/action in the application, which translates a
script to Lua.

> patt is just a 1 liner:
>
> patt = 1 * P(function(s, i) return i, s:sub(i-1, i-1) end)

This is a toy pattern, illustrating the problem I had at first, which
Sean has cleared up for me. The actual pattern I wish to capture is
based on a Lua pattern involving "%b", which in LPEG requires
techniques I have not mastered.
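
For the simplest case, %b(), the balanced-parentheses grammar from the LPeg
manual is probably the nearest equivalent. The sketch below only wraps that
grammar in lpeg.C; the names balanced, whole and broken are mine, and the
actual application will need more than this.

```lua
local lpeg = require "lpeg"

-- Balanced parentheses, as in the LPeg manual: an opening parenthesis,
-- then any mix of non-parenthesis characters and nested balanced groups,
-- then the closing parenthesis.
local balanced = lpeg.P{ "(" * ((1 - lpeg.S"()") + lpeg.V(1))^0 * ")" }

local whole = lpeg.C(balanced):match("(a(b)c) rest")  -- "(a(b)c)"
local broken = lpeg.C(balanced):match("(a(b")         -- nil: never closed
```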


Re: LPEG: P(fct) seems not to consume input

Sean Conner
It was thus said that the Great Dirk Laurie once stated:
>
> This is a toy pattern, illustrating the problem I had at first, which
> Sean has cleared up for me. The actual pattern I wish to capture is
> based on a Lua pattern involving "%b", which in LPEG requires
> techniques I have not mastered.

  There was a thread kind of about this last year on this list.  Start here
for a direct reference to %b and LPeg:

        http://lua-users.org/lists/lua-l/2017-10/msg00126.html

  -spc



Re: LPEG: P(fct) seems not to consume input

Albert Chan
In reply to this post by Sean Conner


On Oct 31, 2018, at 4:43 PM, Sean Conner <[hidden email]> wrote:

>
>> patt=lpeg.P(fct)
>> (patt^-4):match"abc"  --> a b c [as expected, no fourth value]
>>
>> But:
>> (patt^0):match"abc" --> stdin:1: loop body may accept empty string
>>
>> I don't understand why I get that.
>
>  I don't either.  When I get that error, I start making changes to the code
> until the error goes away and the code does what I want.  You could try:
>
>    (patt^1 + lpeg.Cc(false)):match"abc"
>
>  -spc

That is an LPeg safety feature: it checks that fixedlen(patt) > 0.
Since patt = P(fct) matches the pattern "", fixedlen = 0.

The whole point of the check is to keep patt^n from getting into an
infinite loop.

P(fct) does not know that fct will skip forward.
To be safe, LPeg assumes no skipping.
To be doubly safe, P(fct) is not allowed to go "backward".

patt^1 + Cc(false) will not compile either.
The possible infinite-loop situation remains.
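
A sketch of both sides of this, assuming LPeg 1.0.x as used in this thread:
the error is raised when the loop is built, so pcall can catch it, and
putting a pattern that must consume a character (here lpeg.P(1)) inside a
match-time capture gives a loop body that is accepted. The names last_char
and loop are invented for the example.

```lua
local lpeg = require "lpeg"

-- The position-aware function from earlier in the thread.
local function fct(str, pos)
  if #str < pos then
    return false
  else
    return pos + 1, str:sub(pos, pos)
  end
end

-- Building the loop already fails; no subject is involved yet.
local ok, err = pcall(function() return lpeg.P(fct)^0 end)

-- Alternative: consume one character first, then capture it at match time.
local function last_char(s, i)
  return i, s:sub(i - 1, i - 1)
end
local loop = lpeg.Cmt(lpeg.P(1), last_char)^0

local a, b, c, d = loop:match("abc")  -- "a", "b", "c", and no fourth value
```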







Re: LPEG: P(fct) seems not to consume input

Dirk Laurie-2
On Thu, 1 Nov 2018 at 00:31, Albert Chan <[hidden email]> wrote:

> That is an LPeg safety feature: it checks that fixedlen(patt) > 0.
> Since patt = P(fct) matches the pattern "", fixedlen = 0.
>
> The whole point of the check is to keep patt^n from getting into an
> infinite loop.
>
> P(fct) does not know that fct will skip forward.

No, it doesn't, but it is obvious and easy for P(fct) to call
fct("",1) and check that "false" is returned; taking the message
"may accept empty string" literally, that's all that the check should
be worried about.

I have not yet had the temerity to look into the LPEG source, but this
time, I might.


Re: LPEG: P(fct) seems not to consume input

Dirk Laurie-2
In reply to this post by Sean Conner
On Wed, 31 Oct 2018 at 23:58, Sean Conner <[hidden email]> wrote:

>
> It was thus said that the Great Dirk Laurie once stated:
> >
> > This is a toy pattern, illustrating the problem I had at first, which
> > Sean has cleared up for me. The actual pattern I wish to capture is
> > based on a Lua pattern involving "%b", which in LPEG requires
> > techniques I have not mastered.
>
>   There was a thread kind of about this last year on this list.  Start here
> for a direct reference to %b and LPeg:
>
>         http://lua-users.org/lists/lua-l/2017-10/msg00126.html

Memory, n.
  The faculty by which a lua-l member remains aware of their own past
  contributions.

I can still clearly remember mine :-), an opinion that has not changed.

    http://lua-users.org/lists/lua-l/2017-10/msg00125.html

But thanks for reminding me of your demonstration of %b, and
especially that it could also do multibyte delimiters. It will also
solve the trickier problem in my actual application.


Re: LPEG: P(fct) seems not to consume input

Albert Chan
In reply to this post by Dirk Laurie-2

> On Nov 1, 2018, at 1:02 AM, Dirk Laurie <[hidden email]> wrote:
>> All this check is to avoid patt^n get into infinite loops.
>>
>> P(fct) does not know fct will skip forward.
>
> No, it doesn't, but it is obvious and easy for P(fct) to call
> fct("",1) and check that "false" is returned; taking the message
> "may accept empty string" literally, that's all that the check should
> be worried about.

That will be hard to do: the check would have to cover not just position 1,
but all the others as well.
Also, the first argument of fct(s,i) is the string to be matched, not "".
It is possible that fct() stops advancing for some (s,i).

I had a similar issue when trying to patch LPeg to go backward.
https://github.com/achan001/LPeg-anywhere

The solution is to assume that moving back n positions has a fixedlen of -n.
Infinite loops can happen, but you gain flexibility in matching.

lua> lpeg = require 'lpeg'   -- my patched lpeg
lua>  -- 3 steps forward, 2 steps back, fixedlen = 3-2 = 1
lua> patt = lpeg.C(3) * lpeg.B(-2)

lua> lpeg.match(patt^0, '123456')
123  234  345  456