[release] parser-gen

Benas Vaitkevičius
Dear Lua community,

I am glad to announce parser-gen, a parser generator that I created together with LabLua this summer. The code can be found on GitHub: https://github.com/vsbenas/parser-gen

The tool extends the LPeg(Label) module to provide these features:
1) generate parsers based on a PEG grammar description (similar to ANTLR and re);
2) allow easier error description and reporting;
3) automatically handle space characters, comments or any other patterns set by the user;
4) build ASTs automatically based on the rule name;
5) generate recovery grammars automatically, build partial ASTs for inputs with errors;
6) generate error labels for LL(1) grammars.

All these features should make the description of parsers easier, faster and more concise. Any of these features can be disabled if necessary (for example, to build a custom AST).

The grammars used for this tool are described using a PEG-like syntax that is identical to the one provided by the "re" module, with some extensions taken from the "relabel" module and ANTLR grammars. See a working example parser here:
https://github.com/vsbenas/parser-gen#example-tiny-parser

For a more advanced example, check out the Lua parser:
https://github.com/vsbenas/parser-gen/blob/master/parsers/lua-parser.lua

The package can be installed using luarocks:
$ luarocks install parser-gen

This is a beta release; please report any bugs you find.
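
In short, using it looks roughly like this (a minimal sketch that assumes the pg.compile/pg.parse entry points and the uppercase-rules-are-lexical convention from the tiny-parser example linked above; see that example for a complete, working parser):

    local pg = require "parser-gen"

    -- uppercase rule names are lexical rules, so no automatic space
    -- skipping happens inside them; lowercase rules get it for free
    local grammar = pg.compile([[
      program <- command+ !.
      command <- 'print' NUMBER
      NUMBER  <- [0-9]+
    ]], {})   -- a table of error label descriptions would go here

    -- the AST is built automatically from the rule names
    local ast, errs = pg.parse("print 1 print 22", grammar)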



Re: [release] parser-gen

Martin

On 08/30/2017 04:07 PM, Benas Vaitkevičius wrote:

> [snip snip snip]

Wow, good job!

If I understand correctly, your code contains a parser for strings holding an LPeg grammar, and an executor for the parsed grammar which returns an AST.

I've implemented a similar executor (it is used in "lcf", a Lua code formatter, and in "autoldoc"), but my grammars are passed as Lua tables.
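
For comparison, this is roughly what I mean by a table grammar, using plain LPeg's grammar-as-a-Lua-table form (just a sketch for illustration, not lcf's actual format):

    local lpeg = require "lpeg"
    local P, R, S, V, C, Ct = lpeg.P, lpeg.R, lpeg.S, lpeg.V, lpeg.C, lpeg.Ct

    local space  = S(" \t\n")^0
    local number = C(R("09")^1) * space

    -- sums like "1 + 2 + 3", with the operands collected into a table
    local grammar = P{
      "Sum",
      Sum    = Ct(V"Number" * (P"+" * space * V"Number")^0),
      Number = number,
    }

    local operands = lpeg.match(grammar * -1, "1 + 2 + 3")
    -- operands --> { "1", "2", "3" }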

Does your parser-gen have hard limits? And under what license is it distributed?

-- Martin


Re: [release] parser-gen

Peter Melnichenko-2
In reply to this post by Benas Vaitkevičius
On Wed, Aug 30, 2017 at 6:07 PM, Benas Vaitkevičius <[hidden email]> wrote:

> [snip snip snip]

This looks interesting, thanks!

I've tried the Lua parser on some real code, testing the files in
src/luacheck/ in https://github.com/mpeterv/luacheck at commit ca21257:

analyze.lua, linearize.lua, parser.lua: errors with `too many captures` in the lpeglabel.match call.
format.lua: it seems to be confused by the `{[[...]]}` construction.

    Syntax error #1: expected an expression after '[' for the table key at line 332(col 18)
    Syntax error #2: expected '=' after the table key at line 332(col 58)
    Syntax error #3: expected an expression after '=' at line 333(col 4)
    Syntax error #4: expected '}' to close the table constructor at line 333(col 4)
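
The construction in question is roughly this (a minimal stand-in, not the actual code from format.lua); judging by the errors, `{[[` seems to be read as `{` followed by the `[` that opens a bracketed table key rather than as the start of a long string:

    local t = {[[
    a long string literal used as the only
    item of a table constructor
    ]]}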

Testing on files from Luarocks shows some other types of errors, but those can be fixed. The `too many captures` error is more troubling, as it appears for most larger files. Additionally, the memory consumption of the parser seemed very high on these larger files (hint: turn off swap before testing this if you normally have it enabled). Not sure if this is due to limitations of parser-gen or lpeglabel, or perhaps some bad case in the grammar.

-- Best regards,
-- Peter Melnichenko


Re: [release] parser-gen

Peter Melnichenko-2
On Wed, Aug 30, 2017 at 9:00 PM, Peter Melnichenko <[hidden email]> wrote:

> I've tried the Lua parser on some real code, testing the files in
> src/luacheck/ in https://github.com/mpeterv/luacheck at commit ca21257:
> [snip snip snip]

For reference, https://github.com/andremm/lua-parser does not have any issues
with any of these files.

-- Best regards,
-- Peter Melnichenko


Re: [release] parser-gen

KHMan
In reply to this post by Peter Melnichenko-2
On 8/31/2017 2:00 AM, Peter Melnichenko wrote:

> On Wed, Aug 30, 2017 at 6:07 PM, Benas Vaitkevičius wrote:
>> [snip snip snip]
>
> [snip snip snip]
> Additionally, the memory consumption of the parser seemed very high on
> these larger files

Last time I looked, a few years ago, academic papers on PEGs often mentioned
memory consumption as an issue, and the PEG article on Wikipedia [1] puts
memory usage as the first item under Disadvantages. So I guess nothing much
has changed.

Are any PEGs being used for heavy-duty, complex processing these days?
Without searching for it, I have not come across any. High memory
consumption would hit the CPU caches and negatively impact performance;
folks are not moving to PEGs in droves.

[1] https://en.wikipedia.org/wiki/Parsing_expression_grammar


--
Cheers,
Kein-Hong Man (esq.)
Selangor, Malaysia



Re: [release] parser-gen

William Ahern
On Thu, Aug 31, 2017 at 10:16:46AM +0800, KHMan wrote:

> On 8/31/2017 2:00 AM, Peter Melnichenko wrote:
> > [snip snip snip]
> > Additionally, the memory consumption of the parser seemed very high on
> > these larger files
>
> Last time I looked, a few years ago, academic papers on PEGs often mentioned
> memory consumption as an issue, and the PEG article on Wikipedia [1] puts
> memory usage as the first item under Disadvantages. So I guess nothing much
> has changed.

Memoizing PEG engines (aka packrat parsers) inherently use a lot of memory,
and that's what the Wikipedia article is discussing. But LPeg isn't a
packrat parser.
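
To illustrate the memory cost (a toy memoized matcher, purely hypothetical and nothing like LPeg's actual machinery): every (rule, position) result is cached, so the cache alone can grow on the order of the number of rules times the input length.

    -- S <- 'ab' S / 'c', matched with naive packrat-style memoization
    local input = ("ab"):rep(200) .. "c"
    local memo = {}   -- memo[rule][pos] = end position, or false on failure

    local function memoize(name, fn)
      memo[name] = {}
      return function(pos)
        local hit = memo[name][pos]
        if hit ~= nil then return hit end
        local res = fn(pos) or false
        memo[name][pos] = res
        return res
      end
    end

    local S
    S = memoize("S", function(pos)
      if input:sub(pos, pos + 1) == "ab" then
        local rest = S(pos + 2)
        if rest then return rest end
      end
      if input:sub(pos, pos) == "c" then return pos + 1 end
      return false
    end)

    print(S(1))   -- one cache entry per rule per position visited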

PEG-based parsers do tend to require relatively many matching rules because
the lexing and parsing phases are combined. IIRC, a pure PEG that precisely
matches an IPv6 address requires a very large number of matching rules. One
way to decrease memory usage is to split parsing into separate phases, just
as with traditional parsing methods. So, for example, only fuzzily match
IPv6 addresses in your PEG (enough to disambiguate them from other syntax,
but not enough to match only valid IPv6 addresses), then check the IPv6
address nodes in a separate pass using other methods. A hand-written IPv6
parser in Lua, using a splitting function and a loop, is just a dozen or so
lines of code.
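
Something along these lines (a rough sketch: the fuzzy pattern and the validator are only illustrative, and the validator deliberately skips some IPv6 forms such as embedded IPv4 and zone IDs):

    local lpeg = require "lpeg"
    local R, S, C = lpeg.R, lpeg.S, lpeg.C

    -- phase 1: fuzzy match -- any run of hex digits, ':' and '.' is a
    -- candidate address; precision is deferred to phase 2
    local hexdig  = R("09", "af", "AF")
    local addrish = C((hexdig + S(":."))^1)

    -- phase 2: hand-written check using a split and a loop
    local function is_ipv6(s)
      local groups, compressed = 0, false
      for group in (s .. ":"):gmatch("([^:]*):") do
        if group == "" then
          compressed = true                      -- part of a "::"
        elseif group:match("^%x%x?%x?%x?$") then
          groups = groups + 1                    -- 1-4 hex digits
        else
          return false
        end
      end
      return groups == 8 or (compressed and groups < 8)
    end

    local candidate = addrish:match("2001:db8::ff00:42:8329")
    print(candidate, candidate and is_ipv6(candidate))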

Also, LPeg in particular makes it incredibly easy to programmatically
generate PEGs, which means it's just as easy to generate PEGs with too many,
possibly unnecessary, matching rules.
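
For example (a sketch): folding a list of keywords into one ordered choice takes a few lines, so it is tempting to generate big alternations where a character class or a second pass would do.

    local lpeg = require "lpeg"
    local P, R = lpeg.P, lpeg.R

    local keywords = { "and", "break", "do", "else", "elseif", "end" }

    local kw = P(false)
    for _, w in ipairs(keywords) do
      kw = kw + P(w)   -- one more alternative per keyword
    end
    kw = kw * -R("az", "AZ", "09")   -- don't match a prefix of an identifier

    print(kw:match("end "))      --> 4
    print(kw:match("android"))   --> nil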