possible doc/behavior error of patterns

possible doc/behavior error of patterns

szbnwer@gmail.com
hi folks! :)


i get:
```
hippi@vas:~$ lua5.3
Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> print(('aaa x bbb'):match'(.-) x (.*)')
aaa     bbb
```
(lj 2.0.5 does this as well)

i would expect that the 1st capture should be empty without a `^` at
the beginning of the pattern, as the manuals (up to lua 5.4) say:
"a single character class followed by '-', which also matches zero or
more repetitions of characters in the class. Unlike '*', these
repetition items will always match the shortest possible sequence;"

am i wrong here?

(((actually it came from here:
https://github.com/moonjit/moonjit/blob/v2.1/src/jit/p.lua#L88 )))


thx for any info and all the bests to all of u! :)

Re: possible doc/behavior error of patterns

Roberto Ierusalimschy
> ```
> hippi@vas:~$ lua5.3
> Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> > print(('aaa x bbb'):match'(.-) x (.*)')
> aaa bbb
> ```
> (lj 2.0.5 does this as well)
>
> i would expect that the 1st capture should be empty without a `^` at
> the beginning of the pattern, as the manuals (up to lua 5.4) say:
> "a single character class followed by '-', which also matches zero or
> more repetitions of characters in the class. Unlike '*', these
> repetition items will always match the shortest possible sequence;"

string.match looks for the *first* match. The resulting match starts
at position 1, while a match with an empty capture (no 'a's) starts at
position 4; therefore it is not the first one.

-- Roberto
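
The positions mentioned above can be checked with `string.find`, which returns the start and end of the match before the captures. This is only a small sketch: the variable names `s` and `p` are made up for illustration, and the optional third argument of `find` is used to start the search at position 4.

```lua
local s = 'aaa x bbb'
local p = '(.-) x (.*)'

-- Default search: the first position where the whole pattern can match is 1,
-- so (.-) has to grow to 'aaa' to reach the ' x ' that follows it.
print(s:find(p))     --> 1  9  aaa  bbb

-- Starting the search at position 4 gives the match with an empty first
-- capture; it exists, but it is not the leftmost match.
print(s:find(p, 4))  --> 4  9    bbb
```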

Re: possible doc/behavior error of patterns

Jonathan Goble
In reply to this post by szbnwer@gmail.com
On Fri, Nov 22, 2019 at 5:06 AM [hidden email] <[hidden email]> wrote:

>
> hi folks! :)
>
>
> i get:
> ```
> hippi@vas:~$ lua5.3
> Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> > print(('aaa x bbb'):match'(.-) x (.*)')
> aaa     bbb
> ```
> (lj 2.0.5 does this as well)
>
> i would expect that the 1st capture should be empty without a `^` at
> the beginning of the pattern, as the manuals (up to lua 5.4) say:
> "a single character class followed by '-', which also matches zero or
> more repetitions of characters in the class. Unlike '*', these
> repetition items will always match the shortest possible sequence;"
>
> am i wrong here?

Regardless of shortest versus longest match, Lua always attempts
matches from left to right. In this case it first attempts a match
from the start of the string and succeeds, so it does not try any
other starting points. So the match returned will always begin with
the leftmost character that can start a match, and this takes priority
over shortest/longest.

Re: possible doc/behavior error of patterns

Philippe Verdy
In reply to this post by Roberto Ierusalimschy
Yes, but the construct (.-) can still match as many characters as needed to reach the first required space.
All the leading 'a' characters are then part of the match of (.-).
The difference from (.*) shows up when the rest of the pattern could match at more than one place: '.-' starts with zero repetitions and adds one character at a time, stopping as soon as the rest of the pattern matches, whereas '.*' first takes the longest possible run and then gives back one character at a time until the rest of the pattern matches.
Both still match zero or more characters, and both retry when what follows fails; they differ only in the order in which they try the candidate lengths ('*' tries the longest first, '-' the shortest first). Which one is faster depends on the pattern and the input. Basically, '-' stops at the first occurrence of what follows and '*' at the last one; when what follows can match at only one place, '-' and '*' give the same result.
Here the fixed subpattern to match is " x ", and it occurs only once in the input "aaa x bbb", so (.-) and (.*) before it are equivalent for this input.

You would see however a difference with the input "aaa x bbb x ccc":
- with '(.-) x (.*)', the first capture would be 'aaa' and the second one would be 'bbb x ccc'
- with '(.*) x (.*)', the first capture would be 'aaa x bbb' and the second one would be 'ccc'
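
The two cases above can be run directly. This is a small sketch with made-up variable names; the last line is an extra illustration (not from the thread) showing that '-' also keeps growing when the rest of the pattern fails at a shorter length.

```lua
local s = 'aaa x bbb x ccc'

-- '-' stops at the first ' x ' that lets the rest of the pattern match.
local a1, a2 = s:match('(.-) x (.*)')
print(a1, a2)  --> aaa          bbb x ccc

-- '*' takes as much as it can and ends up at the last ' x '.
local b1, b2 = s:match('(.*) x (.*)')
print(b1, b2)  --> aaa x bbb    ccc

-- '-' is lazy, not fixed: here it has to grow past '1' because '%d$'
-- only matches the digit at the very end of the string.
local c1 = ('a1b2'):match('(.-)%d$')
print(c1)      --> a1b
```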


On Fri, Nov 22, 2019 at 18:27, Roberto Ierusalimschy <[hidden email]> wrote:
> > ```
> > hippi@vas:~$ lua5.3
> > Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> > > print(('aaa x bbb'):match'(.-) x (.*)')
> > aaa     bbb
> > ```
> > (lj 2.0.5 does this as well)
> >
> > i would expect that the 1st capture should be empty without a `^` at
> > the beginning of the pattern, as the manuals (up to lua 5.4) say:
> > "a single character class followed by '-', which also matches zero or
> > more repetitions of characters in the class. Unlike '*', these
> > repetition items will always match the shortest possible sequence;"
>
> string.match looks for the *first* match. The resulting match starts
> at position 1, while a match with an empty capture (no 'a's) starts at
> position 4; therefore it is not the first one.
>
> -- Roberto

Re: possible doc/behavior error of patterns

szbnwer@gmail.com
In reply to this post by Roberto Ierusalimschy
ahh, fine, thx, clearly got it! :D

bests! :)