[ANN] LPeg 0.11

classic Classic list List threaded Threaded
32 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[ANN] LPeg 0.11

Roberto Ierusalimschy
What is LPeg? LPeg is a new (well, not that new now) pattern-matching
library for Lua, based on Parsing Expression Grammars (PEGs).

What is new in version 0.11?

  + complete reimplementation of the code generator
  + new syntax for table captures
  + new functions in module 're'
  + other small improvements

Where to get it?

  http://www.inf.puc-rio.br/~roberto/lpeg/lpeg-0.11.tar.gz

(The implementation of LPeg became too complex. I spent a long time
trying to document it so that I could understand it again, but in the
end I had to give up and accept that I could not maintain it as it
was.  So, I reimplemented the code generator [the real complicated part]
almost from scratch.  The new implementation is somewhat larger than the
old one, because now it first creates an AST for each expression and
later compiles the AST into opcodes for the virtual machine. But now I
can understand the optimizations.)

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Daurnimator
On 27 March 2013 15:23, Roberto Ierusalimschy <[hidden email]> wrote:
> What is new in version 0.11?
>   + new syntax for table captures

I quickly scanned over the documentation, and didn't notice how they
have changed.
Are you able to provide an example?

> (The implementation of LPeg became too complex. I spent a long time
> trying to document it so that I could understand it again, but in the
> end I had to give up and accept that I could not maintain it as it
> was.  So, I reimplemented the code generator [the real complicated part]
> almost from scratch.  The new implementation is somewhat larger than the
> old one, because now it first creates an AST for each expression and
> later compiles the AST into opcodes for the virtual machine. But now I
> can understand the optimizations.)
>

Damn, maybe for the best though :)

Does this leave any opening for my previous feature requests?
i.e.
 - Yielding
 - Feeding more data (e.g. provide a callback for when lpeg reaches
the end of input)

I assume this means that the lpeg-jit has little hope of completion
now? (I'm so sad no one picked that up)

Thanks Roberto!
J

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Mark Gabby
In reply to this post by Roberto Ierusalimschy

Thanks for the update! Great to see you're working on this again. I'll have to plug the new version into my language and see how it works.

No kidding about the code being complex. I made some aftermarket modifications and it was quite a challenge :)

On Mar 27, 2013 12:23 PM, "Roberto Ierusalimschy" <[hidden email]> wrote:
What is LPeg? LPeg is a new (well, not that new now) pattern-matching
library for Lua, based on Parsing Expression Grammars (PEGs).

What is new in version 0.11?

  + complete reimplementation of the code generator
  + new syntax for table captures
  + new functions in module 're'
  + other small improvements

Where to get it?

  http://www.inf.puc-rio.br/~roberto/lpeg/lpeg-0.11.tar.gz

(The implementation of LPeg became too complex. I spent a long time
trying to document it so that I could understand it again, but in the
end I had to give up and accept that I could not maintain it as it
was.  So, I reimplemented the code generator [the real complicated part]
almost from scratch.  The new implementation is somewhat larger than the
old one, because now it first creates an AST for each expression and
later compiles the AST into opcodes for the virtual machine. But now I
can understand the optimizations.)

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Peter Cawley
In reply to this post by Daurnimator
On Wed, Mar 27, 2013 at 12:38 PM, Daurnimator <[hidden email]> wrote:
I assume this means that the lpeg-jit has little hope of completion
now? (I'm so sad no one picked that up)

I've occasionally toyed with the idea of using LuaJIT's DynASM for JIT compiling LPEG. Perhaps with the deprecation of the previous effort in this area, now is a good time to actually play with it.

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Roberto Ierusalimschy
In reply to this post by Daurnimator
> On 27 March 2013 15:23, Roberto Ierusalimschy <[hidden email]> wrote:
> > What is new in version 0.11?
> >   + new syntax for table captures
>
> I quickly scanned over the documentation, and didn't notice how they
> have changed.
> Are you able to provide an example?

That was only for the re module, sorry for the lousy description :)

Inside a regex, instead of "p -> {}" for a table capture, now
we write "{| p |}" (the old syntax still works, for compatibility).


> Does this leave any opening for my previous feature requests?
> i.e.
>  - Yielding
>  - Feeding more data (e.g. provide a callback for when lpeg reaches
> the end of input)

Only in the sense that now I can think about them :)  But both features
seem hard to implement without impairing the performance of the regular
cases.


> I assume this means that the lpeg-jit has little hope of completion
> now? (I'm so sad no one picked that up)

This should be easier now, as the whole code generator is clearer and
better documented.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Meer, H. van der
In reply to this post by Roberto Ierusalimschy
Just installed lua 5.2.2 with: make macosx;make install (working fine it looks).
The to installing lpeg 0.11
Alas, lots of errors. Why?
env  gcc -shared -fPIC lpvm.o lpcap.o lptree.o lpcode.o lpprint.o -o lpeg.so
Undefined symbols for architecture x86_64:
  "_luaL_addlstring", referenced from:
      _stringcap in lpcap.o
      _substcap in lpcap.o
  "_luaL_addvalue", referenced from:
      _addonestring in lpcap.o
…many more…
      ...
  "_lua_typename", referenced from:
      _addonestring in lpcap.o
      _val2str in lptree.o
ld: symbol(s) not found for architecture x86_64
collect2: ld returned 1 exit status
make: *** [lpeg.so] Error 1

The changed in the makefile to: LUADIR = /usr/include/lua5.2/ (instead of 5.2) idem /usr/local/include: not working

my path values:
LUA_VERSION=5.2
LUA_DIR=/usr/local
LUA_INCL=$LUA_DIR/include
LUA_SHAREDIR=$LUA_DIR/share/lua/$LUA_VERSION
LUA_LIBDIR=$LUA_DIR/lib/lua/$LUA_VERSION
LUA_PATH="./?.lua;./?/?.lua;$LUA_SHAREDIR/?.lua;$LUA_SHAREDIR/?/?.lua"
LUA_CPATH="./?.so;./?/?.so;$LUA_LIBDIR/?.so;$LUA_LIBDIR/?/?.so"
(all exported of course)

Where does it go wrong?

Hans van der Meer


On 27 mrt. 2013, at 20:23, Roberto Ierusalimschy <[hidden email]> wrote:

What is LPeg? LPeg is a new (well, not that new now) pattern-matching
library for Lua, based on Parsing Expression Grammars (PEGs).

What is new in version 0.11?

 + complete reimplementation of the code generator
 + new syntax for table captures
 + new functions in module 're'
 + other small improvements

Where to get it?

 http://www.inf.puc-rio.br/~roberto/lpeg/lpeg-0.11.tar.gz

(The implementation of LPeg became too complex. I spent a long time
trying to document it so that I could understand it again, but in the
end I had to give up and accept that I could not maintain it as it
was.  So, I reimplemented the code generator [the real complicated part]
almost from scratch.  The new implementation is somewhat larger than the
old one, because now it first creates an AST for each expression and
later compiles the AST into opcodes for the virtual machine. But now I
can understand the optimizations.)

-- Roberto




Hans van der Meer



Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Luiz Henrique de Figueiredo
> Alas, lots of errors. Why?
> env  gcc -shared -fPIC lpvm.o lpcap.o lptree.o lpcode.o lpprint.o -o lpeg.so

For Mac OS X you need to edit makefile and uncomment the line below:

# DLLFLAGS = -bundle -undefined dynamic_lookup

No need to uncomment the ENV line just above it.

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Meer, H. van der
OK and thanks, compiling now.
I had to change also the top line in test.lua to
#! /usr/local/bin/lua
because the env is there used too.

On 27 mrt. 2013, at 21:45, Luiz Henrique de Figueiredo <[hidden email]> wrote:

>> Alas, lots of errors. Why?
>> env  gcc -shared -fPIC lpvm.o lpcap.o lptree.o lpcode.o lpprint.o -o lpeg.so
>
> For Mac OS X you need to edit makefile and uncomment the line below:
>
> # DLLFLAGS = -bundle -undefined dynamic_lookup
>
> No need to uncomment the ENV line just above it.
>

Hans van der Meer




Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Leo Razoumov
In reply to this post by Roberto Ierusalimschy
On Wed, Mar 27, 2013 at 3:23 PM, Roberto Ierusalimschy
<[hidden email]> wrote:

> What is LPeg? LPeg is a new (well, not that new now) pattern-matching
> library for Lua, based on Parsing Expression Grammars (PEGs).
>
> What is new in version 0.11?
>
>   + complete reimplementation of the code generator
>   + new syntax for table captures
>   + new functions in module 're'
>   + other small improvements
>
> Where to get it?
>
>   http://www.inf.puc-rio.br/~roberto/lpeg/lpeg-0.11.tar.gz

I presume it is written for Lua-5.2.x. Will it work with Lua-5.1.x and LuaJIT?

--Leo--

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Peter Drahoš
On 28 Mar, 2013, at 1:49 , Leo Razoumov <[hidden email]> wrote:

> On Wed, Mar 27, 2013 at 3:23 PM, Roberto Ierusalimschy
> <[hidden email]> wrote:
>> What is LPeg? LPeg is a new (well, not that new now) pattern-matching
>> library for Lua, based on Parsing Expression Grammars (PEGs).
>>
>> What is new in version 0.11?
>>
>>  + complete reimplementation of the code generator
>>  + new syntax for table captures
>>  + new functions in module 're'
>>  + other small improvements
>>
>> Where to get it?
>>
>>  http://www.inf.puc-rio.br/~roberto/lpeg/lpeg-0.11.tar.gz
>
> I presume it is written for Lua-5.2.x. Will it work with Lua-5.1.x and LuaJIT?

Compiles and runs tests successfully for lua-5.1.5, lua-5.2.2 and LuaJIT 2.0.1.

pd




Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Miles Bader-2
In reply to this post by Roberto Ierusalimschy
Roberto Ierusalimschy <[hidden email]> writes:
> What is LPeg? LPeg is a new (well, not that new now) pattern-matching
> library for Lua, based on Parsing Expression Grammars (PEGs).
>
> What is new in version 0.11?

Excellent!

Benchmarkin' time... :]

-miles

--
永日の 澄んだ紺から 永遠へ

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Sérgio Medeiros
In reply to this post by Peter Cawley
De: Peter Cawley <[hidden email]>
Para: Lua mailing list <[hidden email]>
Enviadas: Quarta-feira, 27 de Março de 2013 16:41
Assunto: Re: [ANN] LPeg 0.11
 

> I've occasionally toyed with the idea of using LuaJIT's DynASM for JIT compiling LPEG. Perhaps with the deprecation of the

> previous effort in this area, now is a good time to actually play with it.

I think it would not be to complicated to do a translation
from the pseudo-assembly that I have used in my LPeg
JIT compiler to an assembly accepted by DynASM.

This approach may give a working implementation faster,
but not the faster one.

Another option, in the spirit of LPeg 0.11, is to start to implement from scratch :-)

Sérgio 

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Mark Gabby
In reply to this post by Roberto Ierusalimschy
FYI, this version is giving me "pattern too large" errors. Since the maximum pattern size hasn't changed (I checked) I'm guessing I had some pattern that was on the brink of being too big and the upgrade's optimization changes caused it to end up longer.

There was mention about working to eliminate this limit[1]; is that still on the todo list? Is there a todo list? :)

[1] http://lua-users.org/lists/lua-l/2008-11/msg00470.html
Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Roberto Ierusalimschy
> There was mention about working to eliminate this limit[1]; is that still
> on the todo list? Is there a todo list? :)
>
> [1] http://lua-users.org/lists/lua-l/2008-11/msg00470.html

I will create a todo list and add this item to it :)

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Miles Bader-2
Roberto Ierusalimschy <[hidden email]> writes:

>> There was mention about working to eliminate this limit[1]; is that still
>> on the todo list? Is there a todo list? :)
>>
>> [1] http://lua-users.org/lists/lua-l/2008-11/msg00470.html
>
> I will create a todo list and add this item to it :)
>
> -- Roberto

Hmmm, could you add my request about LPeg memory management too?

   http://lua-users.org/lists/lua-l/2012-04/msg00898.html

I'm not really sure how clear that message was, but ... basically it
would be nice if one could add grammar hints to control memory
management in LPeg, so that parsing extremely large-but-flat inputs
with a single grammar is possible.

Thanks,

-miles

--
Rational, adj. Devoid of all delusions save those of observation, experience
and reflection.

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Roberto Ierusalimschy
> Hmmm, could you add my request about LPeg memory management too?
>
>    http://lua-users.org/lists/lua-l/2012-04/msg00898.html
>
> I'm not really sure how clear that message was, but ... basically it
> would be nice if one could add grammar hints to control memory
> management in LPeg, so that parsing extremely large-but-flat inputs
> with a single grammar is possible.

I am not sure I understood what you want. If you do not want to keep
any captures from ONE_STATEMENT, why does it have captures in the
first place?

Anyway, I guess you can force the release of all captures made by
ONE_STATEMENT with the following code:

  lpeg.Cmt(ONE_STATEMENT / 0, function () return true end)

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Miles Bader-2
Roberto Ierusalimschy <[hidden email]> writes:
> I am not sure I understood what you want. If you do not want to keep
> any captures from ONE_STATEMENT, why does it have captures in the
> first place?

Any results from ONE_STATEMENT are already recorded via side-effects,
so any captures inside it have already been used, and I want to
discard them for the "next iteration" (of ONE_STATEMENT).

Trying this in real life leads to memory explosion, whereas just using
an explicit loop outside of LPeg doesn't, so clearly something's
being kept around inside LPeg between iterations.

-miles

--
Virtues, n. pl. Certain abstentions.

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Patrick Donnelly
On Mon, Apr 1, 2013 at 6:21 PM, Miles Bader <[hidden email]> wrote:
> Roberto Ierusalimschy <[hidden email]> writes:
>> I am not sure I understood what you want. If you do not want to keep
>> any captures from ONE_STATEMENT, why does it have captures in the
>> first place?
>
> Any results from ONE_STATEMENT are already recorded via side-effects,
> so any captures inside it have already been used, and I want to
> discard them for the "next iteration" (of ONE_STATEMENT).

LPeg can't know those captures can be discarded and so can't "flush"
the captures. The reason for this is quite simple and you brought it
up: side-effects. Consider (automagically) flushing a pattern's
captures which calls capture functions with side-effects. However, the
enclosing pattern *does not match* and so those side-effects should
never have happened.

> Trying this in real life leads to memory explosion, whereas just using
> an explicit loop outside of LPeg doesn't, so clearly something's
> being kept around inside LPeg between iterations.

You talk about memory explosion, do you mean LPeg stack overflow? If
so, the solution I've used for an LPeg Lua source formatter is to
flatten large numbers of captures at strategic points into a single
capture using:

-- The formatter uses captures to indent code. We necessarily use thousands and
-- thousands of them. At various strategic points, we concatenate these captures
-- so we don't overflow the Lua stack.
local function FLATTEN (pattern)
  return Ct(pattern) / concat;
end


The other solution is to use match-time captures which flush captures
-- which you wanted in your referenced posting. Unfortunately
match-time captures slow down matching considerably.

--
- Patrick Donnelly

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Miles Bader-2
Patrick Donnelly <[hidden email]> writes:

> On Mon, Apr 1, 2013 at 6:21 PM, Miles Bader <[hidden email]> wrote:
>> Roberto Ierusalimschy <[hidden email]> writes:
>>> I am not sure I understood what you want. If you do not want to
>>> keep any captures from ONE_STATEMENT, why does it have captures in
>>> the first place?
>>
>> Any results from ONE_STATEMENT are already recorded via
>> side-effects, so any captures inside it have already been used, and
>> I want to discard them for the "next iteration" (of ONE_STATEMENT).
>
> LPeg can't know those captures can be discarded and so can't "flush"
> the captures. The reason for this is quite simple and you brought it
> up: side-effects. Consider (automagically) flushing a pattern's
> captures which calls capture functions with side-effects. However,
> the enclosing pattern *does not match* and so those side-effects
> should never have happened.

Right, which is why I'm suggesting an explicit LPeg directive that
says "this is OK, I know what I'm doing."  The onus is then on the
user to make sure that they deal with the effects of this.

Typically this would mean that any error handling resulting from a
failed top-level match would discard previously recorded results, or
that it doesn't matter (e.g. when the program just errors out and
terminates as a result).

[My memory of Prolog is pretty fuzzy, but I think this sort of
corresponds to the Prolog "cut" operator.]

>> Trying this in real life leads to memory explosion, whereas just
>> using an explicit loop outside of LPeg doesn't, so clearly
>> something's being kept around inside LPeg between iterations.
>
> You talk about memory explosion, do you mean LPeg stack overflow?

No, I mean it uses so much heap space that my (4GB) system starts to
thrash itself to death... :]

The exact same grammar with the loop moved "outside of LPeg" is fine,
so the memory issues are not due to actual recorded results.

For very simple top-level grammars, e.g. those that simply cover the
input with the repetition of single pattern, doing the loop outside of
LPeg is reasonably easy, but for anything even a just a little more
complex, it becomes a pain, and sort of ugly to boot.

Hmm, I suppose I should try to come up with a succinct example that
illustrates this.

Thanks,

-miles

--
Suburbia: where they tear out the trees and then name streets after them.

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] LPeg 0.11

Miles Bader-2
Miles Bader <[hidden email]> writes:
> Hmm, I suppose I should try to come up with a succinct example that
> illustrates this.

Ok, take the attached Lua program; it's intended to be run as an
executable shell-script (so if you try it, do "chmod +x" first).

It takes two arguments, MODE, and FILENAME.  MODE should be "inside"
to do the outermost loop in LPeg, and "outside" to do the loop in Lua,
calling LPeg for each iteration.  FILENAME is the input file to use;
it should be a big file to get measurable results.

This is of course an extreme example, as the inner LPeg pattern only
matches a single character... :)  But it demonstrates the issue which
I've run into with real world use with much less simplistic patterns.

Here's a test run:

   $ ls -l $snogr/images/dof18.exr
   -rw-r--r-- 1 miles miles 6761826 Dec 31  2011 /home/miles/src/snogray/images/dof18.exr

   $ /usr/bin/time ./lpeg-loop.lua outside $snogr/images/dof18.exr
   mode    outside
   char_sum        857014934
   5.50user 0.02system 0:05.53elapsed 99%CPU (0avgtext+0avgdata 18504maxresident)k
   0inputs+0outputs (0major+5673minor)pagefaults 0swaps

   $ /usr/bin/time ./lpeg-loop.lua inside $snogr/images/dof18.exr
   mode    inside
   char_sum        857014934
   2.67user 0.29system 0:02.97elapsed 99%CPU (0avgtext+0avgdata 213688maxresident)k
   0inputs+0outputs (0major+64827minor)pagefaults 0swaps


Note that the "inside LPeg" loop is:

  1. faster (I guess it avoids various LPeg invocation setup costs)

  2. Uses ten times as much memory (the "maxresident" value)... ><

It's (2) that's the real problem, of course, as my real-world tests
ended up exceeding system memory for inputs that worked OK with an
outside-of-LPeg loop.  As I also mentioned, however, doing everything
in LPeg would also make non-trivial grammars much easier to handle.

Thanks,

-Miles




--
Saa, shall we dance?  (from a dance-class advertisement)

lpeg-loop.lua (1K) Download Attachment
12