LPeg question about substitution captures with group captures

Sean Conner

  I'm working on a personal project [1] and for some media types, I'm using
mailcap files to specify external programs to view media types not directly
supported by the program I'm writing.  So I have a mailcap file:

application/x-foo; foo -t %t %s
application/x-bar; bar -t %t

  This, I can parse [2].  The first field is the MIME type, followed by the
command to run, but there are substitutions that need to happen before the
command is run.  The '%t' is replaced by the MIME type, and the '%s' is
replaced by the file; if '%s' is *NOT* specified, then the data is piped in
via stdin. This is where I'm having an issue.  I would like to have LPeg do
the substitutions but the part I'm having trouble with is indicating if '%s'
was, indeed, part of the command.  While I could check to see if '%s' exists
in the string before I do the substitution, I'd prefer if I didn't have to.

  My current attempt:

lpeg = require "lpeg"

char = lpeg.P"%s" * lpeg.Carg(1) / "%1" * lpeg.Cg(lpeg.Cc(false),'redirect')
     + lpeg.P"%t" * lpeg.Carg(2) / "%1"
     + lpeg.R" ~"
cmd  = lpeg.Cg(lpeg.Cc(true),'redirect')
     * lpeg.Cs(char^1)
     * lpeg.Cb'redirect'

print(cmd:match("foo -t %t %s",1,"/tmp/bar.foo","application/x-foo"))
print(cmd:match("bar -t %t",   1,"/tmp/foo.bar","application/x-bar"))

  This outputs:

foo -t application/x-foo /tmp/bar.foo   true
bar -t application/x-bar        true

I'd like the output to be:

foo -t application/x-foo /tmp/bar.foo   false
bar -t application/x-bar        true

Now, lpeg.Cg() states:

        An anonymous group serves to join values from several captures into
        a single capture. A named group has a different behavior. In most
        situations, a named group returns no values at all. Its values are
        only relevant for a following back capture or when used inside a
        table capture.

and lpeg.Cs():

        Creates a substitution capture, which captures the substring of the
        subject that matches patt, with substitutions. For any capture
        inside patt with a value, the substring that matched the capture is
        replaced by the capture value (which should be a string). The final
        captured value is the string resulting from all replacements.

  I'm using a named group to track if I need redirection or not, and since a
named group does not return a value, it shouldn't affect the substitution
capture (and it doesn't).  But the group capture in the char expression
seems to be ignored.

  What's going on here?  Am I misunderstanding the documentation?

  -spc

[1] A gopher client for those curious.

[2] There's more to the format but I don't want to bog down the issue
        more than I have to, and as I said, parsing the mailcap file isn't
        the issue.

Re: LPeg question about substitution captures with group captures

Sean Conner
It was thus said that the Great Sean Conner once stated:

>
>   My current attempt:
>
> lpeg = require "lpeg"
>
> char = lpeg.P"%s" * lpeg.Carg(1) / "%1" * lpeg.Cg(lpeg.Cc(false),'redirect')
>      + lpeg.P"%t" * lpeg.Carg(2) / "%1"
>      + lpeg.R" ~"
> cmd  = lpeg.Cg(lpeg.Cc(true),'redirect')
>      * lpeg.Cs(char^1)
>      * lpeg.Cb'redirect'
>
> print(cmd:match("foo -t %t %s",1,"/tmp/bar.foo","application/x-foo"))
> print(cmd:match("bar -t %t",   1,"/tmp/foo.bar","application/x-bar"))
>
>   This outputs:
>
> foo -t application/x-foo /tmp/bar.foo   true
> bar -t application/x-bar        true
>
> I'd like the output to be:
>
> foo -t application/x-foo /tmp/bar.foo   false
> bar -t application/x-bar        true
>
>   I'm using a named group to track if I need redirection or not, and since a
> named group does not return a value, it shouldn't affect the substitution
> capture (and it doesn't).  But the group capture in the char expression
> seems to be ignored.
>
>   What's going on here?  Am I misunderstanding the documentation?

  I think I'm misunderstanding the documentation.  lpeg.Cb() states:

        Creates a back capture. This pattern matches the empty string and
        produces the values produced by the most recent group capture named
        name (where name can be any Lua value).

        Most recent means the last complete outermost group capture with the
        given name. A Complete capture means that the entire pattern
        corresponding to the capture has matched. An Outermost capture means
        that the capture is not inside another complete capture.

        In the same way that LPeg does not specify when it evaluates
        captures, it does not specify whether it reuses values previously
        produced by the group or re-evaluates them.

  So even if I were to use lpeg.Cmt() to force evaluation of all nested
captures, I'm still not guaranteed to get what I want (I think; I tried and
no, it still didn't work).  I would like to hear from Roberto whether I'm
interpreting this correctly.

  -spc (I would still like to find an LPeg solution, but not hopeful ... )

Re: LPeg question about substitution captures with group captures

Andrew Gierth
In reply to this post by Sean Conner
>>>>> "Sean" == Sean Conner <[hidden email]> writes:

 Sean> I'd like the output to be:

 Sean> foo -t application/x-foo /tmp/bar.foo   false
 Sean> bar -t application/x-bar        true

My solution:

local lpeg = require "lpeg"

local P,R = lpeg.P, lpeg.R
local Carg,Cc,C,Cg,Ct,Cs = lpeg.Carg, lpeg.Cc, lpeg.C, lpeg.Cg, lpeg.Ct, lpeg.Cs

local char = P"%s" * Carg(1) * Cg(Cc(true),'noredirect')
           + P"%t" * Carg(2)
           + C(P"%")  -- I'm guessing a P"%%" * Cc"%" is missing here
           + C((R" ~" - P"%")^1)

local cmd  = Ct( char^1 )

t = cmd:match("foo -t %t %s",1,"/tmp/bar.foo","application/x-foo")
print(table.concat(t), not t.noredirect)
t = cmd:match("bar -t %t",   1,"/tmp/foo.bar","application/x-bar")
print(table.concat(t), not t.noredirect)

--
Andrew.

Re: LPeg question about substitution captures with group captures

Dirk Laurie-2
In reply to this post by Sean Conner
On Sat, 8 Dec 2018 at 08:17, Sean Conner <[hidden email]> wrote:
>
> It was thus said that the Great Sean Conner once stated:

>   So even if I were to use lpeg.Cmt() to force evaluation of all nested
> captures, I'm still not guaranteed to get what I want (I think; I tried and
> no, it still didn't work).  I would like to hear from Roberto whether I'm
> interpreting this correctly.

1. Is this a question about what Cb, Cg and Cmt are supposed to do or
a challenge to achieve your task?
2. Are you aware that Lua without Lpeg can do that task effortlessly?
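
For reference, here is a minimal sketch of what a plain-Lua version of the task might look like, using string.gsub with a replacement function. The helper name `expand` and its parameters are made up for illustration, and only '%s' and '%t' are handled, as in the original question.

```lua
-- Hypothetical helper: expand "%s" and "%t" in a mailcap command.
-- Returns the expanded command and a flag that is true when the data
-- must be piped in via stdin (that is, no "%s" appeared in the command).
local function expand(cmd, filename, mimetype)
  local redirect = true
  local result = cmd:gsub("%%([st])", function(c)
    if c == "s" then
      redirect = false
      return filename
    else
      return mimetype
    end
  end)
  return result, redirect
end

print(expand("foo -t %t %s", "/tmp/bar.foo", "application/x-foo"))
print(expand("bar -t %t",    "/tmp/foo.bar", "application/x-bar"))
```

The flag is kept in a local variable captured by the replacement function, so the string is still scanned only once.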

Re: LPeg question about substitution captures with group captures

Sean Conner
It was thus said that the Great Dirk Laurie once stated:

> On Sat, 8 Dec 2018 at 08:17, Sean Conner <[hidden email]> wrote:
> >
> > It was thus said that the Great Sean Conner once stated:
>
> >   So even if I were to use lpeg.Cmt() to force evaluation of all nested
> > captures, I'm still not guaranteed to get what I want (I think; I tried and
> > no, it still didn't work).  I would like to hear from Roberto whether I'm
> > interpreting this correctly.
>
> 1. Is this a question about what Cb, Cg and Cmt are supposed to do or
> a challenge to achieve your task?

  Yes and maybe, respectively.

> 2. Are you aware that Lua without Lpeg can do that task effortlessly?

  Support for mailcap has to deal with the following examples:

        /usr/bin/foo %s %t %{opt}
        /usr/bin/foo %s %t
        /usr/bin/foo %t %{foo} %{bar}

"%s" is replaced with a filename if it exists; otherwise the data is to be
piped in via stdin.  The "%{tag}" stuff relates to content type options, as
explained in the man page:

        If the command field contains "%{" followed by a parameter name and
        a closing "}", then all those characters will be replaced by the
        value of the named parameter, if any, from the Content-type header.

  I was trying to keep things simple, which is why I didn't include that
bit.  I have found a way to get this to work using LPeg:

lpeg = require "lpeg"

char = lpeg.P"%s" * lpeg.Carg(1) * lpeg.Carg(2)
     / function(s,f) s.redirect = false return f end
     + lpeg.P"%t" * lpeg.Carg(3) / "%1"
     + lpeg.P"%{" * lpeg.Carg(4) * lpeg.C(lpeg.R"az"^1) * lpeg.P"}"
     / function(tab,opt)
         return tab[opt] or ""
       end
     + lpeg.R" ~"
cmd  = lpeg.Carg(1) / function(s) s.redirect = true end
     * lpeg.Cs(char^1) * lpeg.Carg(1)
     / function(c,s) return c,s.redirect end

print(cmd:match("foo -t %t %s",1,{},"/tmp/bar.foo","application/x-foo",{}))
print(cmd:match("bar -t %t %{opt} %{beta}",   1,{},"/tmp/foo.bar",
        "application/x-bar",{ opt='alpha' }))

  The first extra parameter is a table used for storing stateful information
(in this case, just a flag); the second parameter is the file, third is the
type and the final one is a list of options (the parsing of the name is
simplified for this example; the actual pattern includes both upper and
lower case letters, digits, and "-_.").  I can live with the stateful table
(I'm kind of 'meh' about it actually).

  Personally, I prefer LPeg as I find it easier to read than the Lua
patterns.  For instance, I found it easy to add support for the '%{opt}'
substitution and it's still one pass over the data.  Also, I'm already using
LPeg for parsing of URLs, parsing the mailcap file itself, gopher index
files, the mimetype value, sanitizing text strings [1] and converting HTML
entities to UTF-8 [2] so I'm already using it extensively.

  But hey, I'm interested in seeing an alternative ...

  -spc (Not sure I'll use it though ... )

[1] Basically, removing control codes and escape sequences from text
        files.  Dumping raw text from unknown sources to a terminal is
        downright dangerous!

[2] I have encountered gopher documents with HTML entities, since quite
        a bit of gopher content is mirrored from the web.  The LPeg code for
        that is quite short:

        local char = lpeg.P"&#" * lpeg.C(lpeg.R"09"^1)             * lpeg.P";" / utf8.char
                   + lpeg.P"&"  * lpeg.C(lpeg.R("az","AZ","09")^1) * lpeg.P";" / ENTITIES
                   + lpeg.P(1)
                   
        return lpeg.Cs(char^0)

        The ENTITIES table, however, is a bit longer ...
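
To make the entity-decoding fragment in footnote [2] concrete, here is a small self-contained sketch. The ENTITIES table below is a made-up four-entry subset (the real one is not shown in the thread), and the code assumes Lua 5.3 or later for utf8.char. A named entity that is missing from the table produces no capture value, so the substitution capture leaves the original text in place.

```lua
local lpeg = require "lpeg"

-- Hypothetical subset; the actual ENTITIES table is much longer.
local ENTITIES = { amp = "&", lt = "<", gt = ">", quot = '"' }

-- Numeric entities go through utf8.char (the digit string is coerced to
-- an integer); named entities are looked up in ENTITIES; anything else
-- is copied through unchanged.
local char = lpeg.P"&#" * lpeg.C(lpeg.R"09"^1)             * lpeg.P";" / utf8.char
           + lpeg.P"&"  * lpeg.C(lpeg.R("az","AZ","09")^1) * lpeg.P";" / ENTITIES
           + lpeg.P(1)

local decode = lpeg.Cs(char^0)

print(decode:match("a &lt; b &amp;&amp; c &#65; &bogus;"))
```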

Re: LPeg question about substitution captures with group captures

Dirk Laurie-2
On Sat, 8 Dec 2018 at 12:37, Sean Conner <[hidden email]> wrote:

>   Personally, I prefer LPeg as I find it easier to read than the Lua
> patterns.

There is nothing that forces you to write a Lua pattern as a single
daunting string literal. You can compose it as table.concat{first,
second, third}, defining the components separately. As you would in a
decently written piece of Lpeg code.
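
As an illustration of the composition idea, here is a rough sketch that builds a Lua pattern for the '%{name}' substitution from separately named parts. The variable names and the `options` table are invented for this example; the character set for the name follows the description given earlier in the thread (letters, digits, and "-_.").

```lua
-- Hypothetical components of a pattern matching "%{name}" in a command.
local open  = "%%{"          -- a literal "%{"
local name  = "([%w%-_.]+)"  -- the captured parameter name
local close = "}"            -- a literal "}"
local param = table.concat { open, name, close }

-- Made-up Content-type parameters for the demonstration.
local options = { opt = "alpha" }

-- Unknown parameters are replaced by the empty string.
local cmd = ("bar %{opt} %{beta}"):gsub(param, function(key)
  return options[key] or ""
end)

print(cmd)
```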

Re: LPeg question about substitution captures with group captures

Sean Conner
In reply to this post by Andrew Gierth
It was thus said that the Great Andrew Gierth once stated:

> >>>>> "Sean" == Sean Conner <[hidden email]> writes:
>
>  Sean> I'd like the output to be:
>
>  Sean> foo -t application/x-foo /tmp/bar.foo   false
>  Sean> bar -t application/x-bar        true
>
> My solution:
>
> local lpeg = require "lpeg"
>
> local P,R = lpeg.P, lpeg.R
> local Carg,Cc,C,Cg,Ct,Cs = lpeg.Carg, lpeg.Cc, lpeg.C, lpeg.Cg, lpeg.Ct, lpeg.Cs
>
> local char = P"%s" * Carg(1) * Cg(Cc(true),'noredirect')
>            + P"%t" * Carg(2)
>            + C(P"%")  -- I'm guessing a P"%%" * Cc"%" is missing here
>            + C((R" ~" - P"%")^1)
>
> local cmd  = Ct( char^1 )
>
> t = cmd:match("foo -t %t %s",1,"/tmp/bar.foo","application/x-foo")
> print(table.concat(t), not t.noredirect)
> t = cmd:match("bar -t %t",   1,"/tmp/foo.bar","application/x-bar")
> print(table.concat(t), not t.noredirect)

  That's a nice solution and it lends itself to an easy way to support
redirection if required:

        t = cmd:match(cmdstring,1,filename,type)
        if not t.noredirect then
          table.insert(t,string.format("< %s",filename))
        end
        print(table.concat(t),not t.noredirect)

  The only change I'd make is to remove the double negation aspect of it.

        local char = P"%s" * Carg(1) * Cg(Cc(false),'redirect')
                   + ...
        local cmd = Ct(Cg(Cc(true),'redirect') * char^1)

  Unless that, too, is technically undefined per the LPeg spec (I hope
not---I use that idiom [1] quite often).

  -spc (Because I was taught don't use no double negatives ... )

[1]     Of setting a table field to a defined value before parsing the rest
        of the string.
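
Filling in the elided alternatives with those from Andrew's post, the variant without the double negation might look like the sketch below. The wrapper function `expand` is hypothetical. It relies on a later named group overwriting the field set by an earlier one inside the same table capture; whether the LPeg documentation guarantees that is exactly the open question raised above, so treat this as a sketch under that assumption.

```lua
local lpeg = require "lpeg"
local P, R = lpeg.P, lpeg.R
local Carg, Cc, C, Cg, Ct = lpeg.Carg, lpeg.Cc, lpeg.C, lpeg.Cg, lpeg.Ct

-- "%s" contributes the filename and overwrites the preset 'redirect' field.
local char = P"%s" * Carg(1) * Cg(Cc(false), 'redirect')
           + P"%t" * Carg(2)
           + C(P"%")
           + C((R" ~" - P"%")^1)

-- The leading Cg presets t.redirect to true before any "%s" is seen.
local cmd = Ct(Cg(Cc(true), 'redirect') * char^1)

-- Hypothetical wrapper: returns the expanded command and the redirect flag.
local function expand(cmdstring, filename, mimetype)
  local t = cmd:match(cmdstring, 1, filename, mimetype)
  if not t then return nil end
  return table.concat(t), t.redirect
end

print(expand("foo -t %t %s", "/tmp/bar.foo", "application/x-foo"))
print(expand("bar -t %t",    "/tmp/foo.bar", "application/x-bar"))
```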

Re: LPeg question about substitution captures with group captures

Gabriel Bertilson
In reply to this post by Sean Conner
Cs is apparently a barrier that blocks outside access to all its
captures. They are only accessible to patterns inside of Cs, not those
outside. Minimal testcase:

local lpeg = require 'lpeg'
setmetatable(_ENV, {__index = lpeg})

local patt1 = Cg(Cc('test!'), 'not inside Cs') * Cb 'not inside Cs'
local patt2 = Cs(Cg(Cc('test!'), 'inside Cs')) * Cb 'inside Cs'

print(patt1:match '') -- no error
print(patt2:match '') -- error!

So you can't do a substitution over the whole "char" pattern and
access the capture named "redirect". But if Cs is put at a lower level
of the pattern, you can:

lpeg = require "lpeg"

char = lpeg.Cs(lpeg.P"%s" * lpeg.Carg(1) / "%1")
     * lpeg.Cg(lpeg.Cc(false),'redirect')
     + lpeg.Cs(lpeg.P"%t" * lpeg.Carg(2) / "%1")
     + lpeg.C(lpeg.R" ~")
cmd  = lpeg.Cg(lpeg.Cc(true),'redirect')
     * lpeg.Cf(char^1, function (a, b) return a .. (b or "") end)
     * lpeg.Cb'redirect'

print(cmd:match("foo -t %t %s",1,"/tmp/bar.foo","application/x-foo"))
print(cmd:match("bar -t %t",   1,"/tmp/foo.bar","application/x-bar"))

It's a messy solution because it requires concatenating all the
captures from "char^1" (inefficient because it creates a bunch of
intermediate string objects!). I don't know if this is better or worse
than the solutions already posted.

— Gabriel


Re: LPeg question about substitution captures with group captures

Gabriel Bertilson
Never mind, this doesn't work after all. cmd:match("foo -t %t %s",1,"/tmp/bar.foo","application/x-foo") returns "foo -t application/x-foo /tmp/bar.foo", true.

— Gabriel




Re: LPeg question about substitution captures with group captures

Andrew Gierth
In reply to this post by Sean Conner
>>>>> "Sean" == Sean Conner <[hidden email]> writes:

 Sean> The only change I'd make is to remove the double negation aspect
 Sean> of it.

You could do that by just renaming the table field to 'filename_used'
or something of that ilk.

--
Andrew.

Re: LPeg question about substitution captures with group captures

Roberto Ierusalimschy
In reply to this post by Sean Conner
> [...]
>
>   So even if I were to use lpeg.Cmt() to force evaluation of all nested
> captures, I'm still not guaranteed to get what I want (I think; I tried and
> no, it still didn't work).  I would like to hear from Roberto whether I'm
> interpreting this correctly.

I am not sure I understood exactly what your question is.

-- Roberto

Re: LPeg question about substitution captures with group captures

Sean Conner
It was thus said that the Great Roberto Ierusalimschy once stated:
> > [...]
> >
> >   So even if I were to use lpeg.Cmt() to force evaluation of all nested
> > captures, I'm still not guaranteed to get what I want (I think; I tried and
> > no, it still didn't work).  I would like to hear from Roberto whether I'm
> > interpreting this correctly.
>
> I am not sure I understood exactly what your question is.

  The question is the expected behavior of lpeg.Cg() in the following code:

lpeg = require "lpeg"

char = lpeg.P"%s" * lpeg.Carg(1) / "%1" * lpeg.Cg(lpeg.Cc(false),'redirect')
     + lpeg.P"%t" * lpeg.Carg(2) / "%1"
     + lpeg.R" ~"
cmd  = lpeg.Cg(lpeg.Cc(true),'redirect')
     * lpeg.Cs(char^1)
     * lpeg.Cb'redirect'

print(cmd:match("foo -t %t %s",1,"/tmp/bar.foo","application/x-foo"))

  I was expecting the match to return false as the second capture, and it
was as if the call to lpeg.Cg() in the char expression was being dropped.
The description of lpeg.Cb():

        Creates a back capture. This pattern matches the empty string and
        produces the values produced by the most recent group capture named
        name (where name can be any Lua value).

        Most recent means the last complete outermost group capture with the
        given name. A Complete capture means that the entire pattern
        corresponding to the capture has matched. An Outermost capture means
        that the capture is not inside another complete capture.

        In the same way that LPeg does not specify when it evaluates
        captures, it does not specify whether it reuses values previously
        produced by the group or re-evaluates them.

seems to indicate that indeed, the use of lpeg.Cg() within the context of
lpeg.Cs() means it is ignored when using lpeg.Cb() to retrieve the value (if
I read everything right), and that even using lpeg.Cmt() (which forces
evaluations of all captures at that time) won't work either.

  -spc

Re: LPeg question about substitution captures with group captures

Roberto Ierusalimschy
> The description of lpeg.Cb():
>
>         Creates a back capture. This pattern matches the empty string and
>         produces the values produced by the most recent group capture named
>         name (where name can be any Lua value).
>
>         Most recent means the last complete outermost group capture with the
>         given name. A Complete capture means that the entire pattern
>         corresponding to the capture has matched. An Outermost capture means
>         that the capture is not inside another complete capture.
>
>         In the same way that LPeg does not specify when it evaluates
>         captures, it does not specify whether it reuses values previously
>         produced by the group or re-evaluates them.
>
> seems to indicate that indeed, the use of lpeg.Cg() within the context of
> lpeg.Cs() means it is ignored when using lpeg.Cb() to retrieve the value (if
> I read everything right), and that even using lpeg.Cmt() (which forces
> evaluations of all captures at that time) won't work either.

If I understand you correctly, I guess the answer to your question
is the definition of Outermost capture:

    An Outermost capture means that the capture is not inside another
    complete capture.

So, if a group capture is inside lpeg.Cs() (or any other capture!), it is
not an outermost capture. Therefore, it is not considered as "the most
recent group capture named name".

(Informally, Cb and Cg must be siblings; Cb cannot be an uncle of Cg.)

-- Roberto
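
The sibling/uncle rule can be seen in a small sketch, along the lines of Gabriel's test case earlier in the thread. The pattern names are invented for illustration, and pcall is used so the script keeps running when the back reference cannot be resolved.

```lua
local lpeg = require "lpeg"
local Cg, Cc, Cb, Cs = lpeg.Cg, lpeg.Cc, lpeg.Cb, lpeg.Cs

-- Siblings: Cg and Cb sit at the same nesting level, so Cb finds the group.
local siblings = Cg(Cc("x"), "tag") * Cb("tag")
print(siblings:match(""))

-- "Uncle": the group is nested inside a completed Cs, so it is not an
-- outermost capture and the outer Cb cannot see it. The match raises an
-- error, which pcall turns into a false status.
local uncle = Cs(Cg(Cc("x"), "tag")) * Cb("tag")
local ok = pcall(uncle.match, uncle, "")
print(ok)
```

In the original code the Cb'redirect' therefore skips over everything inside Cs(char^1) and finds only the outer Cg(Cc(true),'redirect'), which is why the flag always comes back as true.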