lpeg could infer the validity of this grammar.

11 messages
Open this post in threaded view
|

lpeg could infer the validity of this grammar.

 This is my first post to Lua list, so let me say hello and thank you.Eliminating left recursion is a sometimes frustrating part of writing a PEG grammar. One way to do this rewrite is shown below. Sadly, lpeg doesn't allow this.```luarequire "lpeg"match, P, V, R, S = lpeg.match, lpeg.P, lpeg.V, lpeg.R, lpeg.Sspace = P" "digit = R"09"operand = S"+-*/"token = space + P"(" + digit^1 + P")" + operandtokenizer = token^1group = P { "groups"; groups = #token * V"group" + token ; group = P"(" * V"groups"^0 * P")";}---[[expr = P { "expr"; expr = token + ( V"group" + V"factor" + V"term")  group = P"(" * V"expr"^0 * P")"; factor = #token * V"expr" * (P"*" + P"/") * V"expr"; term = token; -- we do not reach this point. }--]] -- The following asserts are valid if expr is commented:test_s = "(1 + 2 / 3) + 4 * 5 + (6)"group_s = "(1 (2))"invalid_token =  "(1 + 2 / ; 3) + 4 * 5 + (6)"assert(match(tokenizer,test_s) == #test_s+1)assert(match(tokenizer,invalid_token) ~= #invalid_token+1)assert(match(group,group_s))```This program fails with "rule 'expr' may be left recursive".I assure you, this is not possible. This program is somewhat of a logical proof of that. The # predicate means that, if token is called by the rule, it will consume part of the string. Because we put it at the start of expr, it is given a *chance*to*fail* before any infinite recursion can take place. lpeg does need to follow the call chain and ensure that token is an inevitable call. This is static inference of the grammar with no runtime cost. It is moderately complex, we must assure that it's impossible for the call to expr to succeed on empty without first calling token. PEGs are deterministic, so this is guaranteed to be a possible analysis. This was designed so that it should succeed deterministically. The naive state machine will always check for token in expr before checking the grouped terms, and factor uses #token to assure that this first parse will succeed before expr is allowed to be called. This grammar will never recurse, on valid or invalid input. It will behave as specified. It's badly broken as a real expression parser, intended only to demonstrate the principle.  I am willing to accept a grammar that shows that it's impossible to statically infer whether #pattern propagation can or cannot be performed, as a counterargument. Otherwise, I would very much like to see lpeg support this inference.Thank you for your consideration. cheers,-Sam Putman.
Open this post in threaded view
|

Re: lpeg could infer the validity of this grammar.

 2014-12-14 21:39 GMT+02:00 Sam Putman <[hidden email]>: > expr = token + ( V"group" + V"factor" + V"term") -- semicolon needed here > group = P"(" * V"expr"^0 * P")"; > factor = #token * V"expr" * (P"*" + P"/") * V"expr"; > This program fails with "rule 'expr' may be left recursive". If `expr` is not `token` or `group`, it tries `factor`. The rule for `factor` invokes V"expr" before any input has been consumed.
Open this post in threaded view
|

Re: lpeg could infer the validity of this grammar.

 On Sun, Dec 14, 2014 at 12:56 PM, Dirk Laurie wrote:2014-12-14 21:39 GMT+02:00 Sam Putman <[hidden email]>: > expr = token + ( V"group" + V"factor" + V"term") -- semicolon needed here > group = P"(" * V"expr"^0 * P")"; > factor = #token * V"expr" * (P"*" + P"/") * V"expr"; > This program fails with "rule 'expr' may be left recursive". If `expr` is not `token` or `group`, it tries `factor`. The rule for `factor` invokes V"expr" before any input has been consumed.Hi Dirk,I do understand the cause for the failure to infer. The grammar is valid, but not valid lpeg, as lpeg stands.I'm not looking for help rewriting grammars, I can get around this particular hitch. Because factor calls #token, you and I know that the next call to token will succeed, if it is reached. So the call to expr, which calls token first, will succeed. So we can infer statically that it's safe to call factor. No loop will in fact occur.#token fails with an illegal character or end of string, so factor will not call expr in that case.
Open this post in threaded view
|

Re: lpeg could infer the validity of this grammar.

 On Sun, Dec 14, 2014 at 1:50 PM, Sam Putman wrote:On Sun, Dec 14, 2014 at 12:56 PM, Dirk Laurie wrote:2014-12-14 21:39 GMT+02:00 Sam Putman <[hidden email]>: > expr = token + ( V"group" + V"factor" + V"term") -- semicolon needed here > group = P"(" * V"expr"^0 * P")"; > factor = #token * V"expr" * (P"*" + P"/") * V"expr"; > This program fails with "rule 'expr' may be left recursive". If `expr` is not `token` or `group`, it tries `factor`. The rule for `factor` invokes V"expr" before any input has been consumed.I do understand the cause for the failure to infer. The grammar is valid, but not valid lpeg, as lpeg stands.To elaborate, I understand that #token returns the empty string, and it isn't always safe to call "expr" within "expr". When hand-writing recursive descent, it's nice when possible to do a few tokens of lookahead to ensure completion, then recur. This grammar attempts to mimic that idiom.
Open this post in threaded view
|

Re: lpeg could infer the validity of this grammar.

 It was thus said that the Great Sam Putman once stated: > On Sun, Dec 14, 2014 at 1:50 PM, Sam Putman <[hidden email]> wrote: > > On Sun, Dec 14, 2014 at 12:56 PM, Dirk Laurie <[hidden email]> > > wrote: > >> 2014-12-14 21:39 GMT+02:00 Sam Putman <[hidden email]>: > >> > >> > expr = token + ( V"group" + V"factor" + V"term") > >> -- semicolon needed here > >> > group = P"(" * V"expr"^0 * P")"; > >> > factor = #token * V"expr" * (P"*" + P"/") * V"expr"; > >> > >> > This program fails with "rule 'expr' may be left recursive". > >> > >> If `expr` is not `token` or `group`, it tries `factor`. > >> The rule for `factor` invokes V"expr" before any > >> input has been consumed. > > > > I do understand the cause for the failure to infer. The grammar is valid, > > but not valid lpeg, as lpeg stands. > > To elaborate, I understand that #token returns the empty string, and it > isn't always safe to call "expr" within "expr". When hand-writing recursive > descent, it's nice when possible to do a few tokens of lookahead to ensure > completion, then recur. This grammar attempts to mimic that idiom.   The rules I use for this type of thing look like: re = require "re" -- it's still part of LPeg.                   -- and I find it easier to follow the grammar parser = re.compile [[         expr    <- term   (termop   term)*         term    <- factor (factorop factor)*         factor  <- number / open expr close         space           <- %s*         number          <- space {'-'^-1 [0-9]^+1} space         termop          <- space {[+-]} space         factorop        <- space {[*/]} space         open            <- space '('    space         close           <- space ')'    space ]]   It should be straightforward to translate this to LPeg (seeing how re can do that).   -spc
Open this post in threaded view
|

Re: lpeg could infer the validity of this grammar.

 On Sun, Dec 14, 2014 at 6:14 PM, Sean Conner wrote:   The rules I use for this type of thing look like: re = require "re" -- it's still part of LPeg.                   -- and I find it easier to follow the grammar parser = re.compile [[         expr    <- term   (termop   term)*         term    <- factor (factorop factor)*         factor  <- number / open expr close         space           <- %s*         number          <- space {'-'^-1 [0-9]^+1} space         termop          <- space {[+-]} space         factorop        <- space {[*/]} space         open            <- space '('    space         close           <- space ')'    space ]]I'm glad you posted this! Someone, somewhere, is going to search for 'left recursion lpeg' and solve their problem. I've been merrily rewriting my grammar and am pleased to find it's working. My issue is that lpeg won't pass a valid grammar because empty-return propagation doesn't account for the predicate in #patt. I think this should change, the additional complexity to the library is at compile-time. The grammar, as written, will not loop, and the compiler can reason through the call chain, just like we do, to confirm that.
Open this post in threaded view
|

Re: lpeg could infer the validity of this grammar.

 2014-12-15 4:49 GMT+02:00 Sam Putman <[hidden email]>: > The grammar, as written, will not loop, and the compiler can reason through > the call chain, just like we do, to confirm that. In this case, it can. But a mechanism that can in general determine whether a given grammar will terminate for all input --- didn't Alan Turing prove that to be impossible?
Open this post in threaded view
|

Re: lpeg could infer the validity of this grammar.

 On Sun, Dec 14, 2014 at 8:24 PM, Dirk Laurie wrote:2014-12-15 4:49 GMT+02:00 Sam Putman <[hidden email]>: > The grammar, as written, will not loop, and the compiler can reason through > the call chain, just like we do, to confirm that. In this case, it can. But a mechanism that can in general determine whether a given grammar will terminate for all input --- didn't Alan Turing prove that to be impossible? ^_^  rule : patt / funcfunc = function() return func() endClearly we cannot infer all non-termination. I would accept, a la Turing the Great, a grammar for which this #patt business cannot be sorted into "may be left recursive" or "is not left recurisve".
Open this post in threaded view
|

Re: lpeg could infer the validity of this grammar.

 In reply to this post by Sam Putman It was thus said that the Great Sam Putman once stated: > On Sun, Dec 14, 2014 at 12:56 PM, Dirk Laurie <[hidden email]> wrote: > > > > 2014-12-14 21:39 GMT+02:00 Sam Putman <[hidden email]>: > > > > > expr = token + ( V"group" + V"factor" + V"term") > > -- semicolon needed here > > > group = P"(" * V"expr"^0 * P")"; > > > factor = #token * V"expr" * (P"*" + P"/") * V"expr"; > > > > > This program fails with "rule 'expr' may be left recursive". > > > > If `expr` is not `token` or `group`, it tries `factor`. > > The rule for `factor` invokes V"expr" before any > > input has been consumed. > > > > Hi Dirk, > > I do understand the cause for the failure to infer. The grammar is valid, > but not valid lpeg, as lpeg stands.   LPeg is a Lua implementation of Parsing Expresson Grammars, which are *NOT* (I repeat, *NOT*) the same as LR() or LL() parsers.  There's a bit more information about this on the Wikipedia: http://en.wikipedia.org/wiki/Parsing_expression_grammar#Indirect_left_recursion  -spc