LPeg: inconsistencies regarding empty captures

LPeg: inconsistencies regarding empty captures

Pierre-Yves Gérardy
/number skips captures with no values, whereas /string counts them. Example:

    > (Cg(Cc"foo", "z") * Cb"z" / 1):match"" --> "foo"

    > (Cg(Cc"foo", "z") * Cb"z" / "%1"):match""
    stdin:2: no values in capture index 1

    > (Cg(Cc"foo", "z") * Cb"z" / "%2"):match"" --> "foo"

In a similar vein, while Cf complains if the first capture is empty, a
Cg whose only captures are empty will behave as if there were no
captures in it, and produce its whole match...

    > Cg(Cc""/{}*1):match"R" --> "R"

but

    > Cf(Cc""/{}*C(1),print):match"R"
    stdin:1: no initial value for fold capture

It would be nice if all captures could behave identically. Personally,
I'd rather have them skip these empty captures in all circumstances...

-- Pierre-Yves

Re: LPeg: inconsistencies regarding empty captures

Roberto Ierusalimschy
> /number skips captures with no values, whereas /string counts them. Example:
>
>     > (Cg(Cc"foo", "z") * Cb"z" / 1):match"" --> "foo"
>
>     > (Cg(Cc"foo", "z") * Cb"z" / "%1"):match""
>     stdin:2: no values in capture index 1
>
>     > (Cg(Cc"foo", "z") * Cb"z" / "%2"):match"" --> "foo"

This is really weird, although compatible with the documentation :). One
counts values, the other counts captures. The difference is not only for
captures with no values:

  ((m.Cg(m.Cc'a' * m.Cc'b') * m.Cg(m.Cc'c' * m.Cc'd')) / 2):match('')
    --> b

  ((m.Cg(m.Cc'a' * m.Cc'b') * m.Cg(m.Cc'c' * m.Cc'd')) / "%2"):match('')
    --> c


> In a similar vein, while Cf complains if the first capture is empty, a
> Cg whose only captures are empty will behave as if there were no
> captures in it, and produce its whole match...
>
>     > Cg(Cc""/{}*1):match"R" --> "R"
>
> but
>
>     > Cf(Cc""/{}*C(1),print):match"R"
>     stdin:1: no initial value for fold capture

This seems to be a different case. Cg works like other captures that
take a variable number of captures (e.g., function captures and
table captures): when there are no values, they use the entire match as
a single value. Cf also works with a variable number of captures, but,
unlike the other captures, it handles its first value in a very special
way. So, it makes sense for it to complain when that value is missing.
Moreover, for Cg (and Ct and /function), the behavior of using the
entire match is quite convenient. For Cf, it seems useless.
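
A minimal sketch of the contrast described above, assuming the `lpeg` module is installed and behaves like the 0.12 release discussed in this thread. The pattern names `upper` and `concat` are made up for illustration. A function capture with no nested captures receives the whole match, while a fold with no nested captures has nothing to seed its accumulator with:

```lua
local lpeg = require "lpeg"
local P, C, Cf = lpeg.P, lpeg.C, lpeg.Cf

-- Function capture with no nested captures: the function is called
-- with the whole match as its single argument.
local upper = P"abc" / string.upper
print(upper:match("abc"))   --> ABC

-- Fold with two nested captures: the first one seeds the accumulator,
-- the second one is folded into it.
local concat = Cf(C(1) * C(1), function(acc, v) return acc .. v end)
print(concat:match("xy"))   --> xy

-- Fold with no nested captures: there is no initial value, so the
-- match raises an error instead of falling back to the whole match.
local ok, err = pcall(lpeg.match, Cf(P"abc", print), "abc")
print(ok, err)
```

The exact error text may differ between LPeg versions, so the sketch only prints it rather than relying on it.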

-- Roberto

Re: LPeg: inconsistencies regarding empty captures

Dirk Laurie-2
2013/5/8 Roberto Ierusalimschy <[hidden email]>:
>> /number skips captures with no values, whereas /string counts them.
>
> This is really weird, although compatible with the documentation :). One
> counts values, the other counts captures.

I'm really scared of saying something is a bug nowadays.
So I'll confine myself to asking whether the following behaviour
is reproducible on other systems.

$ lua -l lpeg
Lua 5.2.2  Copyright (C) 1994-2013 Lua.org, PUC-Rio
> =lpeg.version()
0.12
> local Cc=lpeg.Cc; return (Cc(nil)*Cc()/1):match""
Segmentation fault (core dumped)

Re: LPeg: inconsistencies regarding empty captures

Roberto Ierusalimschy
> I'm really scared of saying something is a bug nowadays.
> So I'll confine myself to asking whether the following behaviour
> is reproducible on other systems.
>
> $ lua -l lpeg
> Lua 5.2.2  Copyright (C) 1994-2013 Lua.org, PUC-Rio
> > =lpeg.version()
> 0.12
> > local Cc=lpeg.Cc; return (Cc(nil)*Cc()/1):match""
> Segmentation fault (core dumped)

It seems to be a bug (but it only happens with Lua 5.2).

-- Roberto

Re: LPeg: inconsistencies regarding empty captures

Roberto Ierusalimschy
> > I'm really scared of saying something is a bug nowadays.
> > So I'll confine myself to asking whether the following behaviour
> > is reproducible on other systems.
> >
> > $ lua -l lpeg
> > Lua 5.2.2  Copyright (C) 1994-2013 Lua.org, PUC-Rio
> > > =lpeg.version()
> > 0.12
> > > local Cc=lpeg.Cc; return (Cc(nil)*Cc()/1):match""
> > Segmentation fault (core dumped)
>
> It seems to be a bug (but it only happens with Lua 5.2).

And it seems to happen with an even simpler example:

> print(lpeg.Cc(nil):match"")

The constant 'nil' does not create a 'uvalue' for the pattern, so
when LPeg tries to access it, there is no table there. In Lua 5.1,
all userdata have an 'fenv', so there is no bug.

-- Roberto

Re: LPeg: inconsistencies regarding empty captures

Pierre-Yves Gérardy
In reply to this post by Roberto Ierusalimschy
On Wed, May 8, 2013 at 4:23 PM, Roberto Ierusalimschy
<[hidden email]> wrote:

>> /number skips captures with no values, whereas /string counts them. Example:
>>
>>     > (Cg(Cc"foo", "z") * Cb"z" / 1):match"" --> "foo"
>>
>>     > (Cg(Cc"foo", "z") * Cb"z" / "%1"):match""
>>     stdin:2: no values in capture index 1
>>
>>     > (Cg(Cc"foo", "z") * Cb"z" / "%2"):match"" --> "foo"
>
> This is really weird, although compatible with the documentation :).

The tests also rely on this behavior :).

> This seems to be a different case. Cg works like other captures that
> take a variable number of captures (e.g., function captures and
> table captures): when there are no values, they use the entire match as
> a single value. Cf also works with a variable number of captures, but,
> unlike the other captures, it handles its first value in a very special
> way. So, it makes sense for it to complain when that value is missing.
> Moreover, for Cg (and Ct and /function), the behavior of using the
> entire match is quite convenient. For Cf, it seems useless.

In some cases, a subsequent capture may serve as the accumulator just as
well: the first value is not always special. For example:

function min(array) return fold(array,function(a,b) return a<b and a or b end) end
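
The `fold` helper used in that one-liner is not defined in this message. A self-contained sketch in plain Lua, assuming `fold` is a left fold that seeds the accumulator with the first array element (a hypothetical definition, not part of LPeg), could look like this:

```lua
-- Hypothetical helper: left fold seeded with the first element.
local function fold(array, f)
  local acc = array[1]
  for i = 2, #array do
    acc = f(acc, array[i])
  end
  return acc
end

local function min(array)
  return fold(array, function(a, b) return a < b and a or b end)
end

print(min({3, 1, 2}))  --> 1
print(min({1, 2, 3}))  --> 1
```

Since the folding function is symmetric in its arguments, the result is the same whichever element happens to come first, which is the sense in which the first value is not special here.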

I'd rather have fold behave like the others, if only for the
simplification of the documentation.
Something like this:

static int foldcap (CapState *cs) {
      int n;
      lua_State *L = cs->L;
      int idx = cs->cap->idx;
      if (isfullcap(cs->cap++) ||  /* no nested captures? */
-         isclosecap(cs->cap) ||  /* no nested captures (large subject)? */
+         isclosecap(cs->cap)      /* no nested captures (large subject)? */
-         (n = pushcapture(cs)) == 0)  /* nested captures with no values? */
        return luaL_error(L, "no initial value for fold capture");
+      do {
+        n = pushcapture(cs);  /* get the first value */
+      } while (n == 0 && !isclosecap(cs->cap));
+      if (n == 0)
+        lua_pushlstring(cs->L, cs->cap->s, cs->cap->siz - 1);
+        /*? return luaL_error(L, "no value in fold capture"); */
      if (n > 1)
        lua_pop(L, n - 1);  /* leave only one result for accumulator */
      while (!isclosecap(cs->cap)) {
        lua_pushvalue(L, updatecache(cs, idx));  /* get folding function */
        lua_insert(L, -2);  /* put it before accumulator */
        n = pushcapture(cs);  /* get next capture's values */
        lua_call(L, n + 1, 1);  /* call folding function */
      }
      cs->cap++;  /* skip close entry */
      return 1;  /* only accumulator left on the stack */
    }

There remains the case where the first capture returns more than one value.
It could also make sense to apply the function in that case, rather
than to truncate the list.

If the user wanted to truncate the first capture, they could use /1.
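
For reference, a small sketch of the current truncating behavior, assuming the `lpeg` module is installed and that its fold capture matches the `foldcap` code quoted above. The names `add` and `p` are illustrative only. The first nested capture is a group producing two values, and only the first of them seeds the accumulator:

```lua
local lpeg = require "lpeg"
local Cc, Cf, Cg = lpeg.Cc, lpeg.Cf, lpeg.Cg

local function add(a, b) return a + b end

-- The group yields 1 and 2; the fold keeps 1 as the accumulator and
-- drops 2, then calls add(1, 10) for the following capture.
local p = Cf(Cg(Cc(1) * Cc(2)) * Cc(10), add)
print(p:match(""))  --> 11
```

Under the alternative suggested above, the dropped value would instead be passed to the folding function.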

-- Pierre-Yves