LPEG output

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

LPEG output

Kees Jan Hermans
Hi,

I've just finished reading Mr Ierusalemschi's 2008 paper on Lua
Parsing Expression Grammars, and I'm very enthusiastic about the
possibilities of PEG in general. Especially the intermediate language
that it produces (because I want to play with creating an engine that
executes that code). And there lies a bit of a problem for me: I
cannot seem to make the traditional PEG libraries output the
intermediate language, for example in the form that is also used in
the paper (the 'ASM' of the language, with the instructions 'Char',
'Jump', etc). Is there a way to use the PEG library much more
modularly, in a way that, for example, a C compiler can generally also
output ASM? Thanks!

Reply | Threaded
Open this post in threaded view
|

Re: LPEG output

Roberto Ierusalimschy
> I've just finished reading Mr Ierusalemschi's 2008 paper on Lua
> Parsing Expression Grammars, and I'm very enthusiastic about the
> possibilities of PEG in general. Especially the intermediate language
> that it produces (because I want to play with creating an engine that
> executes that code). And there lies a bit of a problem for me: I
> cannot seem to make the traditional PEG libraries output the
> intermediate language, for example in the form that is also used in
> the paper (the 'ASM' of the language, with the instructions 'Char',
> 'Jump', etc). Is there a way to use the PEG library much more
> modularly, in a way that, for example, a C compiler can generally also
> output ASM? Thanks!

The current version (now at github [1]) has a method to print
these bytecodes. ('pattern:pcode()', when compiled with LPEG_DEBUG
defined.) However, there is no explicit way to export or have
other access to the code. But it shouldn't be difficult for another
library to access that code inside the corresponding userdata.
(See the print function 'lp_printcode' for inspiration.)

Please keep in mind that the current instruction set is different from
the one used back there in that paper, but the general idea is the same.

[1] https://github.com/roberto-ieru/LPeg

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: LPEG output

Kees Jan Hermans
Hi Roberto,

Thank you for the quick and to the point response. Also very nice to
speak with the Main Man himself ;-) Follow-up question: is the LPEG
intermediate language independently developed and documented (so that
someone like me, who wants to write a different execution engine for
it, can rely on change announcements and documentation)? If so, where?

On Wed, 19 Jun 2019 at 14:31, Roberto Ierusalimschy
<[hidden email]> wrote:

>
> > I've just finished reading Mr Ierusalemschi's 2008 paper on Lua
> > Parsing Expression Grammars, and I'm very enthusiastic about the
> > possibilities of PEG in general. Especially the intermediate language
> > that it produces (because I want to play with creating an engine that
> > executes that code). And there lies a bit of a problem for me: I
> > cannot seem to make the traditional PEG libraries output the
> > intermediate language, for example in the form that is also used in
> > the paper (the 'ASM' of the language, with the instructions 'Char',
> > 'Jump', etc). Is there a way to use the PEG library much more
> > modularly, in a way that, for example, a C compiler can generally also
> > output ASM? Thanks!
>
> The current version (now at github [1]) has a method to print
> these bytecodes. ('pattern:pcode()', when compiled with LPEG_DEBUG
> defined.) However, there is no explicit way to export or have
> other access to the code. But it shouldn't be difficult for another
> library to access that code inside the corresponding userdata.
> (See the print function 'lp_printcode' for inspiration.)
>
> Please keep in mind that the current instruction set is different from
> the one used back there in that paper, but the general idea is the same.
>
> [1] https://github.com/roberto-ieru/LPeg
>
> -- Roberto
>

Reply | Threaded
Open this post in threaded view
|

Re: LPEG output

Kees Jan Hermans
In reply to this post by Roberto Ierusalimschy
Also (I'm unfamiliar with Lua native module building): do I build your
github code using something other than:

$ export C_INCLUDE_PATH=/usr/include/lua5.1
$ make && sudo make install

In other words: is there a specific Lua toolchain related command that
I can fire off inside that directory that will automatically build and
install it in the Lua world?

On Wed, 19 Jun 2019 at 14:31, Roberto Ierusalimschy
<[hidden email]> wrote:

>
> > I've just finished reading Mr Ierusalemschi's 2008 paper on Lua
> > Parsing Expression Grammars, and I'm very enthusiastic about the
> > possibilities of PEG in general. Especially the intermediate language
> > that it produces (because I want to play with creating an engine that
> > executes that code). And there lies a bit of a problem for me: I
> > cannot seem to make the traditional PEG libraries output the
> > intermediate language, for example in the form that is also used in
> > the paper (the 'ASM' of the language, with the instructions 'Char',
> > 'Jump', etc). Is there a way to use the PEG library much more
> > modularly, in a way that, for example, a C compiler can generally also
> > output ASM? Thanks!
>
> The current version (now at github [1]) has a method to print
> these bytecodes. ('pattern:pcode()', when compiled with LPEG_DEBUG
> defined.) However, there is no explicit way to export or have
> other access to the code. But it shouldn't be difficult for another
> library to access that code inside the corresponding userdata.
> (See the print function 'lp_printcode' for inspiration.)
>
> Please keep in mind that the current instruction set is different from
> the one used back there in that paper, but the general idea is the same.
>
> [1] https://github.com/roberto-ieru/LPeg
>
> -- Roberto
>

Reply | Threaded
Open this post in threaded view
|

Re: LPEG output

Kees Jan Hermans
Ok. Never mind on this question. I've got it to work.

On Thu, 20 Jun 2019 at 12:09, Kees Jan Hermans <[hidden email]> wrote:

>
> Also (I'm unfamiliar with Lua native module building): do I build your
> github code using something other than:
>
> $ export C_INCLUDE_PATH=/usr/include/lua5.1
> $ make && sudo make install
>
> In other words: is there a specific Lua toolchain related command that
> I can fire off inside that directory that will automatically build and
> install it in the Lua world?
>
> On Wed, 19 Jun 2019 at 14:31, Roberto Ierusalimschy
> <[hidden email]> wrote:
> >
> > > I've just finished reading Mr Ierusalemschi's 2008 paper on Lua
> > > Parsing Expression Grammars, and I'm very enthusiastic about the
> > > possibilities of PEG in general. Especially the intermediate language
> > > that it produces (because I want to play with creating an engine that
> > > executes that code). And there lies a bit of a problem for me: I
> > > cannot seem to make the traditional PEG libraries output the
> > > intermediate language, for example in the form that is also used in
> > > the paper (the 'ASM' of the language, with the instructions 'Char',
> > > 'Jump', etc). Is there a way to use the PEG library much more
> > > modularly, in a way that, for example, a C compiler can generally also
> > > output ASM? Thanks!
> >
> > The current version (now at github [1]) has a method to print
> > these bytecodes. ('pattern:pcode()', when compiled with LPEG_DEBUG
> > defined.) However, there is no explicit way to export or have
> > other access to the code. But it shouldn't be difficult for another
> > library to access that code inside the corresponding userdata.
> > (See the print function 'lp_printcode' for inspiration.)
> >
> > Please keep in mind that the current instruction set is different from
> > the one used back there in that paper, but the general idea is the same.
> >
> > [1] https://github.com/roberto-ieru/LPeg
> >
> > -- Roberto
> >

Reply | Threaded
Open this post in threaded view
|

Re: LPEG output

Kees Jan Hermans
In reply to this post by Roberto Ierusalimschy
Hi Roberto (and everyone else in Lua PEG),

A couple of questions:
- the white paper says that a 'commit' pops the first element off the
stack, which it states must be a 'choice' stack element. However, if
there was a 'call' instruction in between, the 'choice' and the
'commit' or 'fail' instructions, there will be another type of element
on the stack.
- what does the 'closeruntime' instruction do?

On Wed, 19 Jun 2019 at 14:31, Roberto Ierusalimschy
<[hidden email]> wrote:

>
> > I've just finished reading Mr Ierusalemschi's 2008 paper on Lua
> > Parsing Expression Grammars, and I'm very enthusiastic about the
> > possibilities of PEG in general. Especially the intermediate language
> > that it produces (because I want to play with creating an engine that
> > executes that code). And there lies a bit of a problem for me: I
> > cannot seem to make the traditional PEG libraries output the
> > intermediate language, for example in the form that is also used in
> > the paper (the 'ASM' of the language, with the instructions 'Char',
> > 'Jump', etc). Is there a way to use the PEG library much more
> > modularly, in a way that, for example, a C compiler can generally also
> > output ASM? Thanks!
>
> The current version (now at github [1]) has a method to print
> these bytecodes. ('pattern:pcode()', when compiled with LPEG_DEBUG
> defined.) However, there is no explicit way to export or have
> other access to the code. But it shouldn't be difficult for another
> library to access that code inside the corresponding userdata.
> (See the print function 'lp_printcode' for inspiration.)
>
> Please keep in mind that the current instruction set is different from
> the one used back there in that paper, but the general idea is the same.
>
> [1] https://github.com/roberto-ieru/LPeg
>
> -- Roberto
>

Reply | Threaded
Open this post in threaded view
|

Re: LPEG output

Roberto Ierusalimschy
> A couple of questions:
> - the white paper says that a 'commit' pops the first element off the
> stack, which it states must be a 'choice' stack element. However, if
> there was a 'call' instruction in between, the 'choice' and the
> 'commit' or 'fail' instructions, there will be another type of element
> on the stack.

There cannot be a 'call' instruction in between. A commit is always in
the same rule that generated the corresponding choice; so, any call
between a choice and its corresponding commit must have returned.


> - what does the 'closeruntime' instruction do?

It closes a runtime capture, which involves calling the corresponding
Lua function. (This is by far the most complex instruction.)

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: LPEG output

Kees Jan Hermans
"There cannot be a 'call' instruction in between. A commit is always in
the same rule that generated the corresponding choice; so, any call
between a choice and its corresponding commit must have returned."

Ok. But what if the failure happens during the call? Then the cleaning
up of the stack encounters the 'call' stack element first.
So, instructions encountered are: 1) choice, 2) call, 3) commit, and
the failure (no match) happens between 2 and 3.

On Tue, 25 Jun 2019 at 17:53, Roberto Ierusalimschy
<[hidden email]> wrote:

>
> > A couple of questions:
> > - the white paper says that a 'commit' pops the first element off the
> > stack, which it states must be a 'choice' stack element. However, if
> > there was a 'call' instruction in between, the 'choice' and the
> > 'commit' or 'fail' instructions, there will be another type of element
> > on the stack.
>
> There cannot be a 'call' instruction in between. A commit is always in
> the same rule that generated the corresponding choice; so, any call
> between a choice and its corresponding commit must have returned.
>
>
> > - what does the 'closeruntime' instruction do?
>
> It closes a runtime capture, which involves calling the corresponding
> Lua function. (This is by far the most complex instruction.)
>
> -- Roberto
>

Reply | Threaded
Open this post in threaded view
|

Re: LPEG output

Roberto Ierusalimschy
> "There cannot be a 'call' instruction in between. A commit is always in
> the same rule that generated the corresponding choice; so, any call
> between a choice and its corresponding commit must have returned."
>
> Ok. But what if the failure happens during the call? Then the cleaning
> up of the stack encounters the 'call' stack element first.
> So, instructions encountered are: 1) choice, 2) call, 3) commit, and
> the failure (no match) happens between 2 and 3.

Then the commit is not executed. You asked about commit, not about fail.
A commit always find the choice at the top of the stack.

A fail (unlike a commit) must remove any call in the stack before
getting to a choice, which may not exist. This is all explained in the
paper.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: LPEG output

Kees Jan Hermans
You are completely right; the happy flow always has the commit see the
backtrack as the top element on the stack; it's the failures that
might see other elements in between. Other question: I see in the code
(llvm.c):

      case IChar: {
        if ((byte)*s == p->i.aux && s < e) { p++; s++; }
        else goto fail;
        continue;
      }
      case ITestChar: {
        if ((byte)*s == p->i.aux && s < e) p += 2;
        else p += getoffset(p);
        continue;
      }

And in the white paper:

TestChar char label
Checks whether the character in the current subject position is equal
to char. If it is equal, the machine consumes the current character
and goes to the next instruction. Otherwise it jumps to label.

I've just looked at it, but is the 'testchar' behaviour still the same
as in the paper? The code seems to suggest that 'testchar' jumps two
instructions on a match?

On Tue, 25 Jun 2019 at 22:27, Roberto Ierusalimschy
<[hidden email]> wrote:

>
> > "There cannot be a 'call' instruction in between. A commit is always in
> > the same rule that generated the corresponding choice; so, any call
> > between a choice and its corresponding commit must have returned."
> >
> > Ok. But what if the failure happens during the call? Then the cleaning
> > up of the stack encounters the 'call' stack element first.
> > So, instructions encountered are: 1) choice, 2) call, 3) commit, and
> > the failure (no match) happens between 2 and 3.
>
> Then the commit is not executed. You asked about commit, not about fail.
> A commit always find the choice at the top of the stack.
>
> A fail (unlike a commit) must remove any call in the stack before
> getting to a choice, which may not exist. This is all explained in the
> paper.
>
> -- Roberto
>

Reply | Threaded
Open this post in threaded view
|

Re: LPEG output

Kees Jan Hermans
Ok. Never mind that. I see now that any test* instruction (as well as
choice and call, etc) keep a single instruction space ahead of them.
So p += 2 actually means 'jump to the next instruction'. But am I
correct in assuming that, contrary to the white paper, the test*
functions never consume input?

On Thu, 27 Jun 2019 at 15:03, Kees Jan Hermans <[hidden email]> wrote:

>
> You are completely right; the happy flow always has the commit see the
> backtrack as the top element on the stack; it's the failures that
> might see other elements in between. Other question: I see in the code
> (llvm.c):
>
>       case IChar: {
>         if ((byte)*s == p->i.aux && s < e) { p++; s++; }
>         else goto fail;
>         continue;
>       }
>       case ITestChar: {
>         if ((byte)*s == p->i.aux && s < e) p += 2;
>         else p += getoffset(p);
>         continue;
>       }
>
> And in the white paper:
>
> TestChar char label
> Checks whether the character in the current subject position is equal
> to char. If it is equal, the machine consumes the current character
> and goes to the next instruction. Otherwise it jumps to label.
>
> I've just looked at it, but is the 'testchar' behaviour still the same
> as in the paper? The code seems to suggest that 'testchar' jumps two
> instructions on a match?
>
> On Tue, 25 Jun 2019 at 22:27, Roberto Ierusalimschy
> <[hidden email]> wrote:
> >
> > > "There cannot be a 'call' instruction in between. A commit is always in
> > > the same rule that generated the corresponding choice; so, any call
> > > between a choice and its corresponding commit must have returned."
> > >
> > > Ok. But what if the failure happens during the call? Then the cleaning
> > > up of the stack encounters the 'call' stack element first.
> > > So, instructions encountered are: 1) choice, 2) call, 3) commit, and
> > > the failure (no match) happens between 2 and 3.
> >
> > Then the commit is not executed. You asked about commit, not about fail.
> > A commit always find the choice at the top of the stack.
> >
> > A fail (unlike a commit) must remove any call in the stack before
> > getting to a choice, which may not exist. This is all explained in the
> > paper.
> >
> > -- Roberto
> >