[LPeg] How can I make a recursive pattern?

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[LPeg] How can I make a recursive pattern?

Soni "They/Them" L.
Sean Conner's incomplete solution uses lpeg.Carg(1) which doesn't work
recursively.

This is what I'm trying to parse: https://github.com/SoniEx2/MDXML

This is what I currently have:

----------------------------------
   --[[
     EOF/newlines/whitespace/inline comments
   --]]
   local eof = lpeg.P(-1)
   local nl = (lpeg.P "\r")^-1 * lpeg.P "\n" + lpeg.P "\\n" + eof -- \r
for winblows compat

   local ws = lpeg.S(" \t") + nl - nl * nl
   local inlineComment = lpeg.P("`") * (1 - (lpeg.S("`\n"))) ^ 0 *
lpeg.P("`")
   local wsc = ws + inlineComment -- comments count as whitespace

   --[[
     Escape sequences
   --]]

   local backslashEscaped
   = lpeg.P("\\ ") / " " -- escaped spaces
   + lpeg.P("\\\\") / "\\" -- escaped escape character
   + lpeg.P("\\#") / "#"
   + lpeg.P("\\>") / ">"
   + lpeg.P("\\`") / "`"
   + lpeg.P("\\n") -- \\n newlines count as backslash escaped
   + lpeg.P("\\") * lpeg.P(function(_, i)
       error("Unknown backslash escape at position " .. i)
     end)

   local content = lpeg.Cs((wsc / " " + (backslashEscaped + 1 -
nl*nl))^0) * nl * nl

   --[[
     Process tag/attribute/value/etc
     (the `c` in the argument names means "captured")
   --]]

   local tag = lpeg.P"#" * lpeg.C((1 - nl)^1) * nl / function(ctag)
     return content:match(ctag)
   end

   local attribute = lpeg.P"##" * lpeg.C((1 - nl)^1) * nl /
function(cattribute)
     return content:match(cattribute)
   end

   local value = lpeg.P"###" * lpeg.C((1 - nl)^1) * nl / function(cvalue)
     return content:match(cvalue)
   end

   local tagns = lpeg.P"####" * lpeg.C((1 - nl)^1) * nl / function(ctagns)
     return content:match(ctagns)
   end

   local attrns = lpeg.P"#####" * lpeg.C((1 - nl)^1) * nl /
function(cattrns)
     return content:match(cattrns)
   end

   --[[
     Recursive grammar.
   --]]

   local doc = lpeg.P { lpeg.V"entry",
     entry = ((lpeg.Carg(1) * lpeg.C(tag))
     / function(t,tag)
       local x = { [0] = tag }
       table.insert(t,x)
       return t
     end
     * (
       (lpeg.Carg(1) * lpeg.C(tagns))
       / function (t, ns)
         local x = t[#t]
         local tag = x[0]
         x[0] = {tag=tag, ns=ns}
       end
       )^0
     * (
       (lpeg.Carg(1) * lpeg.C(attribute) * lpeg.C(value))
       /  function(t,a,v)
         local x = t[#t]
         x[a] = v
         return t
       end
       )^0
     * (
       (lpeg.Carg(1) * lpeg.C(content))
       /  function(t,c)
         local x = t[#t]
         table.insert(x,c)
         return t
       end
       )^0)
   }
----------------------------------

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] How can I make a recursive pattern?

Patrick Donnelly
Soni,

On Wed, Jul 27, 2016 at 10:27 AM, Soni L. <[hidden email]> wrote:
> Sean Conner's incomplete solution uses lpeg.Carg(1) which doesn't work
> recursively.
>
> This is what I'm trying to parse: https://github.com/SoniEx2/MDXML
>
> This is what I currently have:
> [...]

Most people don't have time to grok your code. Can you give a smaller
example/explanation of what you're trying to do? It sounds like you
may want to use group and back captures to carry information in
recursive matching. Not sure though...

--
Patrick Donnelly

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] How can I make a recursive pattern?

Soni "They/Them" L.


On 27/07/16 12:34 PM, Patrick Donnelly wrote:

> Soni,
>
> On Wed, Jul 27, 2016 at 10:27 AM, Soni L. <[hidden email]> wrote:
>> Sean Conner's incomplete solution uses lpeg.Carg(1) which doesn't work
>> recursively.
>>
>> This is what I'm trying to parse: https://github.com/SoniEx2/MDXML
>>
>> This is what I currently have:
>> [...]
> Most people don't have time to grok your code. Can you give a smaller
> example/explanation of what you're trying to do? It sounds like you
> may want to use group and back captures to carry information in
> recursive matching. Not sure though...
>
Basically (really oversimplifying here) I have:

tag = # <stuff> \n
tagns = #### <stuff> \n
attr = ## <stuff> \n
val = ### <stuff> \n
atrrns = ##### <stuff> \n
content = > <stuff> [\n > <stuff>] \n \n

and I want to capture things like:

tag and {
   [0] = tagns and {tag=tag.stuff, ns=tagns.stuff} or tag.stuff,
   [attr.stuff] = attrns and {val=val.stuff, ns=attrns.stuff} or val.stuff,
   RECURSION_POINT
} or content.stuff

note that RECURSION_POINT repeats 0 or more times.

E.g. the following MDXML document:

##version
###1.1
##encoding
###utf-8

`This is a list of programming languages and programming language books, in MDXML`

#programming

> #languages
> > #language
> > > #name
> > > > Lua
> > >
> > > #link
> > > > http://www.lua.org/
> >
> > #language
> > > #name
> > > > Python
> > >
> > > #link
> > > > https://www.python.org/
>
> # books
> > #book
> > ##edition
> > ###3
> > > #name
> > > > Programming in Lua
> > >
> > > #ISBN
> > > > 859037985X

Should produce the following table: (The weird table structure is so
each line matches the equivalent line in the MDXML.)

{ [0] = 'programming',
   { [0] = 'languages',
     { [0] = 'language',
       { [0] = 'name',
         'Lua'
       },
       { [0] = 'link',
         'http://www.lua.org/'
       } },
     { [0] = 'language',
       { [0] = 'name',
         'Python'
         },
       { [0] = 'link',
         'https://www.python.org/'
       } } },
   { [0] = 'books',
     { [0] = 'book',
       ['edition'] =
       '3',
       { [0] = 'name',
         'Programming in Lua'
       },
       { [0] = 'ISBN',
         '859037985X'
       } } } }

And the following MDXML document:

##version
###1.1
##encoding
###utf-8

#root
 > #a
 > #b
 > #c
 > stuff

Should produce:

{ [0] = 'root',
   {[0] = 'a'},
   {[0] = 'b'},
   {[0] = 'c'},
   'stuff'
}

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] How can I make a recursive pattern?

William Ahern
In reply to this post by Soni "They/Them" L.

On Wed, Jul 27, 2016 at 11:27:35AM -0300, Soni L. wrote:
> Sean Conner's incomplete solution uses lpeg.Carg(1) which doesn't work
> recursively.
>
<snip>

The answer is lpeg.V and the table mode of lpeg.P.

See http://www.inf.puc-rio.br/~roberto/lpeg/#grammar


Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] How can I make a recursive pattern?

Soni "They/Them" L.


On 27/07/16 01:43 PM, William Ahern wrote:

> On Wed, Jul 27, 2016 at 11:27:35AM -0300, Soni L. wrote:
>> Sean Conner's incomplete solution uses lpeg.Carg(1) which doesn't work
>> recursively.
>>
> <snip>
>
> The answer is lpeg.V and the table mode of lpeg.P.
>
> See http://www.inf.puc-rio.br/~roberto/lpeg/#grammar
>
>
How would I use it here and get rid of lpeg.Carg(1)?

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] How can I make a recursive pattern?

Sean Conner
In reply to this post by Soni "They/Them" L.
It was thus said that the Great Soni L. once stated:
>
>
> On 27/07/16 12:34 PM, Patrick Donnelly wrote:
> >Soni,
> >
> >On Wed, Jul 27, 2016 at 10:27 AM, Soni L. <[hidden email]> wrote:
> >>Sean Conner's incomplete solution uses lpeg.Carg(1) which doesn't work
> >>recursively.

  At the time, I wasn't sure of the output format and I had assumed (badly,
I see) that you wanted something like:

        {
          key = value
        }

where both key and value are specified in the text you are parsing.  Doing
that in LPeg isn't straightforward and thus, lpeg.Carg() was used to pass in
a resulting table.  Given the example, it's both easy and hard.  Easy,
because collecting results into an array is easy in LPeg:

        a = lpeg.P"a"
        r = lpeg.Ct(lpeg.C(a)^0)
        x = r:match "aaaaaaaa"

Even recursively:

        nested = lpeg.P {
                "start",
                start = lpeg.Ct((lpeg.V"a" + lpeg.V"list")^0),
                a     = lpeg.C(lpeg.P"a"),
                list  = lpeg.P"("
                      * lpeg.V"start"
                      * lpeg.P")",
        }

        r = lpeg.Ct(nested)
        x = r:match "aaa(aaaa)aaa(a)aaa"

  Hard, because it appears you want to start with index [0], which makes it
... an interesting problem (there doesn't appear to be an easy way to get to
the table that lpeg.Ct() creates, which is why this is an interesting
problem).  Perhaps some combination of lpeg.Cg() with lpeg.P(true) divided
by a function that returns a new table and lpeg.Cb() but it's something
you'll have to experiment with.

  Or you could relax the [0] index requirement ...

  -spc (You're in luck ... work is slow today ... )


Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] How can I make a recursive pattern?

Sean Conner
It was thus said that the Great Sean Conner once stated:
>
>   Hard, because it appears you want to start with index [0], which makes it
> ... an interesting problem

  Actually, it was easier than I thought---a close reading of the LPeg
documentation showed that

        lpeg.Ct(lpeg.Cg(pattern,0))

does return the patter in index 0 of the table.  So it's not as hard as I
thought it would be.

  -spc (learn something new every day ... )




Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] How can I make a recursive pattern?

Soni "They/Them" L.
In reply to this post by Sean Conner


On 27/07/16 02:15 PM, Sean Conner wrote:

> It was thus said that the Great Soni L. once stated:
>>
>> On 27/07/16 12:34 PM, Patrick Donnelly wrote:
>>> Soni,
>>>
>>> On Wed, Jul 27, 2016 at 10:27 AM, Soni L. <[hidden email]> wrote:
>>>> Sean Conner's incomplete solution uses lpeg.Carg(1) which doesn't work
>>>> recursively.
>    At the time, I wasn't sure of the output format and I had assumed (badly,
> I see) that you wanted something like:
>
> {
>  key = value
> }
>
> where both key and value are specified in the text you are parsing.  Doing
> that in LPeg isn't straightforward and thus, lpeg.Carg() was used to pass in
> a resulting table.  Given the example, it's both easy and hard.  Easy,
> because collecting results into an array is easy in LPeg:
>
> a = lpeg.P"a"
> r = lpeg.Ct(lpeg.C(a)^0)
> x = r:match "aaaaaaaa"
>
> Even recursively:
>
> nested = lpeg.P {
>        "start",
>        start = lpeg.Ct((lpeg.V"a" + lpeg.V"list")^0),
>        a     = lpeg.C(lpeg.P"a"),
>        list  = lpeg.P"("
>              * lpeg.V"start"
>              * lpeg.P")",
> }
>
> r = lpeg.Ct(nested)
> x = r:match "aaa(aaaa)aaa(a)aaa"
>
>    Hard, because it appears you want to start with index [0], which makes it
> ... an interesting problem (there doesn't appear to be an easy way to get to
> the table that lpeg.Ct() creates, which is why this is an interesting
> problem).  Perhaps some combination of lpeg.Cg() with lpeg.P(true) divided
> by a function that returns a new table and lpeg.Cb() but it's something
> you'll have to experiment with.
>
>    Or you could relax the [0] index requirement ...
>
>    -spc (You're in luck ... work is slow today ... )
>
>
The [0] is only for the tag name.

Also I'm currently looking into something like this:

lpeg.Cg(tag, 0) * lpeg.Cg(lpeg.Ct(lpeg.Cg(lpeg.Cb(0), "tag")), 0)

This puts the tag name in a table like {tag=<tagname>}. I just need to
make it only do that if there's a namespace.

Only issue is that the tag namespace could come right after the tag name
or after all attributes (or, at least in theory, even in the middle of
content...).

In other words, this:

#Tag
 > Content

####TagNS

is equivalent to this:

#Tag
####TagNS
 > Content

no matter the content. That is, if there are more tags in the content,
they cannot cause any interference with how this is parsed.

(*Technically*, doing that is undefined. For all you know, that document
could set your computer on fire. But, you know, w/e.)

No idea how to handle that.

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] How can I make a recursive pattern?

Soni "They/Them" L.
I decided to go with a mix of LPeg and plain Lua. You can see what it
looks like here: https://bitbucket.org/SoniEx2/mdxml

It was much easier this way (probably because I don't know LPeg :P).

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] How can I make a recursive pattern?

franktom
In reply to this post by Soni "They/Them" L.
On Wed, Jul 27, 2016 at 01:45:33PM -0300, Soni L. wrote:

>
>
> On 27/07/16 01:43 PM, William Ahern wrote:
> >On Wed, Jul 27, 2016 at 11:27:35AM -0300, Soni L. wrote:
> >>Sean Conner's incomplete solution uses lpeg.Carg(1) which doesn't work
> >>recursively.
> >>
> ><snip>
> >
> >The answer is lpeg.V and the table mode of lpeg.P.
> >
> >See http://www.inf.puc-rio.br/~roberto/lpeg/#grammar
> >
> >
> How would I use it here and get rid of lpeg.Carg(1)?
>
> --
> Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.
>
>
when I read your answer,That's goods,But
I think there a better resolver?

Reply | Threaded
Open this post in threaded view
|

Re: [LPeg] How can I make a recursive pattern?

franktom
In reply to this post by Soni "They/Them" L.
On Wed, Jul 27, 2016 at 01:45:33PM -0300, Soni L. wrote:

>
>
> On 27/07/16 01:43 PM, William Ahern wrote:
> >On Wed, Jul 27, 2016 at 11:27:35AM -0300, Soni L. wrote:
> >>Sean Conner's incomplete solution uses lpeg.Carg(1) which doesn't work
> >>recursively.
> >>
> ><snip>
> >
> >The answer is lpeg.V and the table mode of lpeg.P.
> >
> >See http://www.inf.puc-rio.br/~roberto/lpeg/#grammar
> >
> >
> How would I use it here and get rid of lpeg.Carg(1)?
>
> --
> Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.
>
>
this is a test from ...