A bug in string.gmatch and string.gsub?

Re: A bug in string.gmatch and string.gsub?

Dirk Laurie-2
2013/4/29 TNHarris <[hidden email]>:

> On Monday, April 29, 2013 03:10:53 PM Dirk Laurie wrote:
>> Besides `sed`, there's also Python:
>> >>> import re
>> >>> re.subn(re.compile("a*"),"ITEM",";a;")
>>
>> ('ITEM;ITEM;ITEM', 3)
>>
>> Many Lua users come from a background in which they know sed and Python.
>> That tends to influence what things they find intuitive.
>
> However, Perl and PHP do the same thing as Lua.

I know no PHP, but I'll grant you Perl. It's the best-documented
language I've ever come across. The exact point is discussed in its
reference manual, and it says:

   The higher-level loops preserve an additional state between iterations:
   whether the last match was zero-length.  To break the loop, the following
   match after a zero-length match is prohibited to have a length of zero.

I.e. in Perl, the way it is implemented in Lua is actually documented to be
the way it is. One can't argue with that.

But in Lua it is not so documented; it is no more part of Lua than the
pre-5.2.2 treatment of out-of-range index values to table.remove and
table.insert was part of Lua. In Lua it is merely the way that lstrlib.c
happens to implement it. One can argue with that.
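
For illustration, the two behaviours contrasted above can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not the code of Perl, Lua or Python: the function name `gsub_sketch` and the flag `allow_empty_after_match` are made up for this example, and it drives the scan by hand with anchored `match` calls so that it does not depend on how any particular version of `re.sub` treats empty matches. It also assumes a greedy pattern such as `a*`.

```python
import re

def gsub_sketch(pattern, repl, text, allow_empty_after_match):
    """Replace every match of pattern in text, scanning left to right.

    Hypothetical helper for illustration only.
    allow_empty_after_match=True  : an empty match is accepted even when it
                                    starts where the previous match ended.
    allow_empty_after_match=False : such an empty match is skipped.
    """
    rx = re.compile(pattern)
    out, count, pos, last_end = [], 0, 0, -1
    while pos <= len(text):
        m = rx.match(text, pos)              # anchored attempt at pos
        if m is not None:
            empty = (m.end() == pos)
            if not empty or allow_empty_after_match or pos != last_end:
                out.append(repl)
                count += 1
                last_end = m.end()
            if not empty:
                pos = m.end()                # resume after a non-empty match
                continue
        # No match, or an empty one: copy one character so the scan advances.
        if pos < len(text):
            out.append(text[pos])
        pos += 1
    return "".join(out), count

print(gsub_sketch("a*", "ITEM", ";a;", True))   # ('ITEM;ITEMITEM;ITEM', 4)
print(gsub_sketch("a*", "ITEM", ";a;", False))  # ('ITEM;ITEM;ITEM', 3)
```

With the flag off, the sketch reproduces the `re.subn` result quoted earlier; with the flag on, it gives the four-replacement result that, as far as I understand it, is the Lua 5.2 and Perl behaviour under discussion.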

> And LPEG...
>> re = require"re"
>> print(re.gsub(";a;","'a'*","ITEM"))
> /home/tnharris/share/lua/5.2/re.lua:230: loop body may accept empty string

I don't think I can grant you LPEG. LPEG simply says a pattern inside
a loop is not allowed to match the empty string. That's already so at
the pure LPEG level: `(lpeg.P"a"^0)^2` is illegal. There is no gotcha
here.  Again, one can't argue with that.

Re: A bug in string.gmatch and string.gsub?

Dirk Laurie-2
In reply to this post by Dirk Laurie-2
2013/4/29 Dirk Laurie <[hidden email]>:

> Still, if it is an implementation detail, it can be changed without
> changing the language. That possibility is demonstrated in the attached
> version of `lstrlib.c`.

The modified code does not actually in all cases do what it is supposed
to, and I do not understand the Lua source code well enough to fix the
mistake.

Moreover, I failed to change the header comments, or to change the
filename before attaching it. These are unpardonable sins.

I therefore unreservedly withdraw any implied support for this code
file. Please rename or destroy any copy that may have been made of it.

Re: A bug in string.gmatch and string.gsub?

Tom N Harris
In reply to this post by Dirk Laurie-2
On Tuesday, April 30, 2013 02:32:58 AM Dirk Laurie wrote:
> I.e. in Perl, the way it is implemented in Lua is actually documented to be
> the way it is. One can't argue with that.
>
> But in Lua it is not so documented; it is no more part of Lua than the
> pre-5.2.2 treatment of out-of-range index values to table.remove and
> table.insert was part of Lua. In Lua it is merely the way that lstrlib.c
> happens to implement it. One can argue with that.
>

Yes, I was pointing out that if you're going to apply the principle of least
surprise with respect to how regular expressions behave in other languages,
the majority opinion seems to be with Perl, which Lua follows. I also checked
Node.js and Ruby and they're the same.

On the other hand, Lua does not advertise regular expressions but just string
patterns. So breaking with tradition would not be unwarranted.

--
tom <[hidden email]>

Re: A bug in string.gmatch and string.gsub?

Dirk Laurie-2
2013/4/30 TNHarris <[hidden email]>:

> Yes, I was pointing out that if you're going to apply the principle of least
> surprise with respect to how regular expressions behave in other languages,
> the majority opinion seems to be with Perl, which Lua follows. I also checked
> Node.js and Ruby and they're the same.

Not how regular expressions behave, but how iterated search/replace behaves.
The distinction is very clear in LPEG: iterated search/replace might loop
infinitely, thus it is not allowed if the pattern can match empty strings.
The Perl manual argues that "long experience has shown that many
programming tasks may be significantly simplified by using repeated
subexpressions that may match zero-length substrings" and therefore
"Perl allows such constructs, _by forcefully breaking the infinite loop._"

The Perl way is motivated by one consideration only: to avoid an infinite
loop when zero-length matches are possible.
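
To make the infinite-loop concern concrete, here is a small Python sketch of an iterated search that has no guard at all against zero-length matches. The helper name `naive_matches` and the `max_steps` cap are assumptions added purely so the example terminates; nothing here is taken from Perl's or Lua's source.

```python
import re

def naive_matches(pattern, text, max_steps=10):
    """Collect (start, end) spans by repeatedly searching from the end of
    the previous match, with no special handling of empty matches.

    Hypothetical helper: max_steps exists only to stop the demonstration.
    """
    rx = re.compile(pattern)
    spans = []
    pos = 0
    for _ in range(max_steps):
        m = rx.search(text, pos)
        if m is None:
            break
        spans.append(m.span())
        pos = m.end()        # makes no progress when the match is empty
    return spans

print(naive_matches("a+", ";a;"))      # [(1, 2)]
print(naive_matches("a*", ";a;", 5))   # [(0, 0), (0, 0), (0, 0), (0, 0), (0, 0)]
```

With `a+` the scan ends on its own, but with `a*` it finds the same empty match at position 0 until the cap stops it. Some rule is needed to force progress, and the disagreement in this thread is only about which rule.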

When it comes to choosing between Perl and Python as a model, on the
basis of what is more intuitive and natural, I'd go for Python 99% of the time.

Dirk

-----

sub e {
    $D =~ s/(.*)U$/U$1/;
    $D =~ s/U(.)/$1U/;
    $D =~ s/(.*)V$/V$1/; $D =~ s/V(.)/$1V/;
    $D =~ s/(.*)V$/V$1/; $D =~ s/V(.)/$1V/;
    $D =~ s/(.*)([UV].*[UV])(.*)/$3$2$1/;
    $c=&v(53);
    $D =~ s/(.{$c})(.*)(.)/$2$1$3/;
    if ($k) {
        $D =~ s/(.{$k})(.*)(.)/$2$1$3/;
        return;
    }
    $c=&v(&v(0));
    $c>52?&e:$c;
}
