gsub bug? 2 results from anchored gsub

Daurnimator
I want to ensure that a string always ends in a single "/".
If it has more than one, the extras should be removed.
If it has none, a "/" should be appended.

"/*$" should match all the '/' at the end of the string, and replace
them with a single "/".
I got an unexpected result:

    > ("d//"):gsub("/*$", "/")
    d// 2

This result suggests that an empty string is being matched
between the last "/" and the end of the string.
It matches the "//" and replaces it with "/", but then it also
matches the empty string at the end and inserts an extra "/".
Passing 'print' as the replacement confirms this:

    > ("d//"):gsub("/*$", print)
    //

    d// 2

Is this a bug in string.gsub?
It seems odd to me that you could get 2 replacements for an anchored match.
Though as far as I can see, a strict reading of the manual doesn't disallow it.


Daurn.
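
One way to see exactly where each match begins and ends is to add position captures to the pattern. The following is a minimal sketch; the helper name `match_spans` is made up for illustration, and how many spans come back may depend on the Lua release in use.

```lua
-- Hypothetical helper: record the start and end position of every match
-- that gsub makes for "/*$" by wrapping the pattern in position captures.
local function match_spans(s)
  local spans = {}
  -- "()" captures the current position; the callback returns nothing,
  -- so gsub keeps the original text for each match.
  s:gsub("()/*()$", function(first, past_end)
    spans[#spans + 1] = { first, past_end }
  end)
  return spans
end

for _, span in ipairs(match_spans("d//")) do
  -- Equal numbers mean the match was the empty string.
  print(span[1], span[2])
end
```

A span whose two numbers are equal is an empty match, which is the second replacement reported in the post above.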

Re: gsub bug? 2 results from anchored gsub

Soni "They/Them" L.


On 26/07/15 11:40 PM, Daurnimator wrote:

> I want to ensure that a string always ends in a single "/".
> If it has more than one, the extras should be removed
> If it has none, a "/" should be appended.
>
> "/*$" should match all the '/' at the end of the string, and replace
> them with a single "/".
> I got an unexpected result:
>
>      > ("d//"):gsub("/*$", "/")
>      d// 2
>
> This result suggests that there is an empty string being matched
> between the last "/" and the end of the string.
> It's matching the // and replacing that with "/"; but then it gets
> confused and matches the empty string at the end, and ends up
> inserting an extra /
> Using 'print' as the match confirms:
>
>      > ("d//"):gsub("/*$", print)
>      //
>
>      d// 2
>
> Is this a bug in string.gsub?
> It seems odd to me that you could get 2 replacements for an anchored match.
> Though as far as I can see, a strict reading of the manual doesn't disallow it.
>
>
> Daurn.
>
$ doesn't consume the end of the string?

You'll probably find this issue in most pattern matchers?

--
Disclaimer: these emails are public and can be accessed from <TODO: get a non-DHCP IP and put it here>. If you do not agree with this, DO NOT REPLY.


Re: gsub bug? 2 results from anchored gsub

Sean Conner
In reply to this post by Daurnimator
It was thus said that the Great Daurnimator once stated:

> I want to ensure that a string always ends in a single "/".
> If it has more than one, the extras should be removed
> If it has none, a "/" should be appended.
>
> "/*$" should match all the '/' at the end of the string, and replace
> them with a single "/".
> I got an unexpected result:
>
>     > ("d//"):gsub("/*$", "/")
>     d// 2
>
> This result suggests that there is an empty string being matched
> between the last "/" and the end of the string.
> It's matching the // and replacing that with "/"; but then it gets
> confused and matches the empty string at the end, and ends up
> inserting an extra /
> Using 'print' as the match confirms:
>
>     > ("d//"):gsub("/*$", print)
>     //
>
>     d// 2
>
> Is this a bug in string.gsub?

  No.  You're telling Lua that you want to match zero or more '/' followed
by the end of string.  Going through "d//", it first finds "//", which is
zero or more '/', and replaces it with a slash.  It then finds "END OF LINE"
[1], which is zero or more '/', and replaces it with a slash.  It's working
as intended.

  -spc (You may have to switch to LPEG ... )

[1] Thus sayeth the Master Control Program.

Re: gsub bug? 2 results from anchored gsub

Tim Hill

> On Jul 27, 2015, at 12:28 AM, Sean Conner <[hidden email]> wrote:
>
>>
>>     > ("d//"):gsub("/*$", "/")
>>     d// 2
>>
>> This result suggests that there is an empty string being matched
>> between the last "/" and the end of the string.
>> It's matching the // and replacing that with "/"; but then it gets
>> confused and matches the empty string at the end, and ends up
>> inserting an extra /
>> Using 'print' as the match confirms:
>>
>>     > ("d//"):gsub("/*$", print)
>>     //
>>
>>     d// 2
>>
>> Is this a bug in string.gsub?
>
>  No.  You're telling Lua that you want to match zero or more '/' followed
> by the end of string.  Going through "d//", it first finds "//", which is
> zero or more '/', and replaces it with a slash.  It then finds "END OF LINE"
> [1], which is zero or more '/', and replaces it with a slash.  It's working
> as intended.
>
>  -spc (You may have to switch to LPEG ... )
>
> [1] Thus sayeth the Master Control Program.
>

Hmm .. my vote goes with the OP. Matches are greedy so the first match should be on "//" AND the end of the string. I found this interesting:

    ("d//"):gsub("/+$", "/")
    d/ 1

-Tim



Re: gsub bug? 2 results from anchored gsub

Dirk Laurie-2
2015-07-27 10:52 GMT+02:00 Tim Hill <[hidden email]>:

>
>> On Jul 27, 2015, at 12:28 AM, Sean Conner <[hidden email]> wrote:
>>
>>>
>>>     > ("d//"):gsub("/*$", "/")
>>>     d// 2
>>>
>>> This result suggests that there is an empty string being matched
>>> between the last "/" and the end of the string.
>>> It's matching the // and replacing that with "/"; but then it gets
>>> confused and matches the empty string at the end, and ends up
>>> inserting an extra /
>>> Using 'print' as the match confirms:
>>>
>>>     > ("d//"):gsub("/*$", print)
>>>     //
>>>
>>>     d// 2
>>>
>>> Is this a bug in string.gsub?
>>
>>  No.  You're telling Lua that you want to match zero or more '/' followed
>> by the end of string.  Going through "d//", it first finds "//", which is
>> zero or more '/', and replaces it with a slash.  It then finds "END OF LINE"
>> [1], which is zero or more '/', and replaces it with a slash.  It's working
>> as intended.
>>
>>  -spc (You may have to switch to LPEG ... )
>>
>> [1]   Thus sayeth the Master Control Program.
>>
>

I raised a similar point some time ago [1,2]. Roberto made three
contributions to the thread (the quoted lines are me arguing futilely):

~~~
Please stop calling "bug" something that does not behave as you
wanted or imagined.
~~~
I may be wrong, but it seems that the two rules can be stated like that:

1) Do not match two empty strings in the same position. (current Lua rule)

2) Do not match an empty string in the same position of another match
(not necessarily empty). (sed rule)

Is rule 2 really more intuitive in general or it just happen to do what
you want in this particular case?
~~~
| It has the advantage of making `split` trivial instead of requiring the
| sort of thing that takes the Lua Wiki 300 lines to explain.

True, but that does not make it more intuitive; it makes it more useful
in one particular case. Are there other scenarios where it is more (or
less) useful?
~~~

So I can't see him agreeing this time round either.

> Hmm .. my vote goes with the OP. Matches are greedy so the
> first match should be on "//" AND the end of the string. I found
> this interesting:
>
>     ("d//"):gsub("/+$", "/")
>     d/ 1

That's what the OP should have written. Matches involving *
are almost always buggy. If you modify the OP's example by
removing the $, the mistake becomes glaringly obvious.

    > ("d//"):gsub("/*", "/")
    /d//    3

[1] http://lua-users.org/lists/lua-l/2013-04/msg00812.html
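
The two rules quoted above can be compared with a small scanner written in plain Lua. This is only a sketch under stated assumptions: `scan_replace` and its `rule` parameter are hypothetical names, the pattern must not begin with "^", and the replacement is treated as a literal string. It is not the code of string.gsub itself; it relies on string.find, where a leading "^" together with an init position anchors the attempt at that position.

```lua
-- Sketch of a gsub-like scan. Assumptions: `pat` has no leading "^",
-- `repl` is a literal string, and `rule` is 1 or 2 as numbered above.
local function scan_replace(s, pat, repl, rule)
  local out, count = {}, 0
  local pos, last_end = 1, nil
  while true do
    -- "^" with an init argument anchors the attempt at `pos`.
    local first, last = s:find("^" .. pat, pos)
    local is_empty = first ~= nil and last < first
    -- Rule 2 rejects an empty match where the previous match ended.
    local rejected = rule == 2 and is_empty and pos == last_end
    if first and not rejected then
      count = count + 1
      out[#out + 1] = repl
      last_end = last + 1
    end
    if first and not is_empty then
      pos = last + 1                     -- skip over the matched text
    elseif pos <= #s then
      out[#out + 1] = s:sub(pos, pos)    -- copy one char and move on
      pos = pos + 1
    else
      break
    end
  end
  return table.concat(out), count
end

print(scan_replace("d//", "/*$", "/", 1))  --> d//   2
print(scan_replace("d//", "/*$", "/", 2))  --> d/    1
print(scan_replace("d//", "/*", "/", 1))   --> /d//  3
```

Under rule 1 the sketch reproduces both results shown in this thread; under rule 2 the empty match at the end of the string is skipped because the previous match ended at that same position.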

Re: gsub bug? 2 results from anchored gsub

Matthew Wild
In reply to this post by Daurnimator
On 27 July 2015 at 03:40, Daurnimator <[hidden email]> wrote:

> I want to ensure that a string always ends in a single "/".
> If it has more than one, the extras should be removed
> If it has none, a "/" should be appended.
>
> "/*$" should match all the '/' at the end of the string, and replace
> them with a single "/".
> I got an unexpected result:
>
>     > ("d//"):gsub("/*$", "/")
>     d// 2

FWIW, because of this, my usual approach (e.g. in path/URL
normalization) is s:gsub("/+$", "").."/"

Regards,
Matthew
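
The strip-then-append approach can be wrapped in a small function. This is a sketch; the name `with_trailing_slash` is invented here. The last line shows why "/+$" with a "/" replacement is not enough on its own for the original requirement: a string with no trailing slash is left untouched.

```lua
-- Hypothetical helper wrapping the strip-then-append approach.
local function with_trailing_slash(s)
  -- The parentheses make explicit that only gsub's first result
  -- (the string, not the match count) is used.
  return (s:gsub("/+$", "")) .. "/"
end

print(with_trailing_slash("d//"))  --> d/
print(with_trailing_slash("d"))    --> d/
print(with_trailing_slash(""))     --> /
print(("d"):gsub("/+$", "/"))      --> d   0
```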

Re: gsub bug? 2 results from anchored gsub

Parke
In reply to this post by Sean Conner
On Mon, Jul 27, 2015 at 12:28 AM, Sean Conner <[hidden email]> wrote:

> (You may have to switch to LPEG ... )

Perhaps not.

print ( ('d//'):gsub ( '([^/])/*$', '%1/' ) )

print ( ('d//'):gsub ( '/*$', '/', 1 ) )

-Parke
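
The two one-liners above agree on most inputs but differ on a few edge cases. A short sketch to compare them, with `via_capture` and `via_limit` as made-up names for the two variants:

```lua
-- Hypothetical wrappers around the two patterns; the parentheses
-- keep only the resulting string and drop the match count.
local function via_capture(s) return (s:gsub("([^/])/*$", "%1/")) end
local function via_limit(s)   return (s:gsub("/*$", "/", 1)) end

for _, s in ipairs({ "d//", "d", "a/b//", "//", "" }) do
  print(("%q -> %q, %q"):format(s, via_capture(s), via_limit(s)))
end
```

The capture variant needs a non-slash character before the trailing slashes, so it leaves "//" and the empty string unchanged, while the variant limited to one replacement turns both into "/".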

Re: gsub bug? 2 results from anchored gsub

Soni "They/Them" L.
In reply to this post by Soni "They/Them" L.


On 27/07/15 12:05 AM, Soni L. wrote:

>
>
> On 26/07/15 11:40 PM, Daurnimator wrote:
>> I want to ensure that a string always ends in a single "/".
>> If it has more than one, the extras should be removed
>> If it has none, a "/" should be appended.
>>
>> "/*$" should match all the '/' at the end of the string, and replace
>> them with a single "/".
>> I got an unexpected result:
>>
>>      > ("d//"):gsub("/*$", "/")
>>      d// 2
>>
>> This result suggests that there is an empty string being matched
>> between the last "/" and the end of the string.
>> It's matching the // and replacing that with "/"; but then it gets
>> confused and matches the empty string at the end, and ends up
>> inserting an extra /
>> Using 'print' as the match confirms:
>>
>>      > ("d//"):gsub("/*$", print)
>>      //
>>
>>      d// 2
>>
>> Is this a bug in string.gsub?
>> It seems odd to me that you could get 2 replacements for an anchored
>> match.
>> Though as far as I can see, a strict reading of the manual doesn't
>> disallow it.
>>
>>
>> Daurn.
>>
> $ doesn't consume the end of the string?
>
> You'll probably find this issue in most pattern matchers?
>
I've been writing a pattern matcher lately, so let's look at what it'd
do (a bit simplified to be easier to read):

Pattern: /*$ ->
Root[GreedyZeroOrMore["/"], EndOfString]

Matcher:

Cursor position: 0
d//
^
Matched /*, cursor position: 0
d//
^
Doesn't match $. Put char on buffer, increment cursor position and repeat.

Cursor position: 1
d//
 ^
Matched /*, cursor position: 3
d//
   ^
Matched $, cursor position: 3
End of pattern, put replacement on buffer (in this case "/"). Repeat.

Cursor position: 3
d//
   ^
Matched /*, cursor position: 3
d//
   ^
Matched $, cursor position: 3
End of pattern, put replacement on buffer (in this case "/"). Start
cursor position == end cursor position, so advance cursor.

Cursor position: 4
d//
    ^
End of string, return buffer.

So you end up with 2 matches and "d//".
