regex question

John Belmonte-2
Hello,

I'm trying to make a regex that splits a string of elements delimited by
some character.  The following doesn't work.  I can think of other ways but
I'd like to know why this doesn't work, and also why strfind is returning an
apparently illegal result (start greater than end).

    > print(strfind("hello:there", "(.-)%:?"))
    1       0


-John



Re: regex question

Reuben Thomas-4
> I'm trying to make a regex that splits a string of elements delimited by
> some character.  The following doesn't work.  I can think of other ways but
> I'd like to know why this doesn't work, and also why strfind is returning an
> apparently illegal result (start greater than end).
>
>     > print(strfind("hello:there", "(.-)%:?"))
>     1       0

This seemed like a nice brainteaser with which to start the day, so I had a
play. My conclusion is that this is a perfectly reasonable result:

The minimum way in which to match (.-) is by matching the empty string at
the start of "hello:there"; then, an optional : also matches the empty string.

The return value "1,0" indicates that the string found starts at position
one, and is of length zero (a return of 1,1 would indicate that "h" had been
matched).
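
For anyone following along, here is a small check of the reading above. It is only a sketch and assumes Lua 5.1 or later, where strfind is spelled string.find; the variable names are mine.

```lua
-- Assumes Lua 5.1+, where strfind is available as string.find.
local s = "hello:there"

-- (.-) takes as few characters as possible (none), and the optional
-- colon is then allowed to match nothing as well.
local first, last, capture = string.find(s, "(.-)%:?")
print(first, last, capture)

-- The span first..last is inclusive, so its length is last - first + 1.
print(last - first + 1)

-- Feeding the same indices to string.sub gives back the empty string.
print(string.sub(s, first, last) == "")
```

So 1,0 is just the inclusive way of writing "an empty match sitting at position 1".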

Not what you want, therefore, but perfectly legal. How about

> print(strfind("hello:there" .. ":", "(.-)%:"))
1       6      hello

i.e. tack an extra delimiter on to the end to make sure that you do get one,
and make the delimiter mandatory. This negates the effect of the -
"overriding" the ?.

Still, this suggests that - isn't really minimal matching. Does this count
as a bug?
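
For reference, the trailing-delimiter idea can be wrapped up as a small helper, and the same snippet shows what "-" does next to "*". This is a sketch under assumptions: Lua 5.1 or later (string.gmatch and the # operator), a delimiter that is a single punctuation character, and split is a hypothetical name, not a library function.

```lua
-- Sketch only: assumes Lua 5.1+ and a single punctuation character
-- as the delimiter. split is a hypothetical helper.
local function split(s, delim)
  local parts = {}
  -- "%" in front of a punctuation character makes it match literally.
  local pattern = "(.-)%" .. delim
  -- Appending one delimiter guarantees the last element is terminated.
  for piece in string.gmatch(s .. delim, pattern) do
    parts[#parts + 1] = piece
  end
  return parts
end

local parts = split("hello:there", ":")
print(parts[1], parts[2])

-- "-" stops at the first delimiter it can; "*" runs to the last one.
local _, _, lazy = string.find("a:b:c", "(.-)%:")
local _, _, greedy = string.find("a:b:c", "(.*)%:")
print(lazy, greedy)
```

With a mandatory delimiter after it, (.-) has to grow until it reaches the first colon, which is how it ends up capturing a whole element.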

-- 
http://sc3d.org/rrt/ | free, a.  already paid for (Peyton Jones)


Re: regex question

Reuben Thomas-4
> Still, this suggests that - isn't really minimal matching. Does this count
> as a bug?

...and I was doing so well! Ignore this last bit.

-- 
http://sc3d.org/rrt/ | Si hoc legere scis nimium eruditionis habes.


Re: regex question

John Belmonte-2
In reply to this post by Reuben Thomas-4
It would have been nice if strfind was defined as returning (start, end+1)
instead of (start, end).  I'm always having to add the +1 in my code anyway.
Using (x, x-1) to mean zero-length is questionable in my opinion.

thanks,
-John


Reuben Thomas wrote:
> The return value "1,0" indicates that the string found starts at position
> one, and is of length zero (a return of 1,1 would indicate that "h" had been
> matched).



Re: regex question

Diego Nehab-3
Hi,

> It would have been nice if strfind was defined as returning (start, end+1)
> instead of (start, end).  I'm always having to add the +1 in my code anyway.
> Using (x, x-1) to mean zero-length is questionable in my opinion.
> thanks,
> -John

I prefer it the way it is. First, it is compatible with strsub. I believe
that, if it changed, the use of negative indexes (very useful) would
become less natural.
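
A quick illustration of the compatibility point, as a sketch that assumes Lua 5.1 or later, where strfind and strsub are string.find and string.sub:

```lua
-- Assumes Lua 5.1+ (string.find / string.sub in place of strfind / strsub).
local s = "hello:there"

-- Plain-text search (fourth argument true), no pattern magic involved.
local first, last = string.find(s, "there", 1, true)
print(first, last)

-- The pair goes straight back into string.sub with no adjustment.
local found = string.sub(s, first, last)
print(found)

-- Negative indexes count from the end, and -1 is the last character,
-- which only reads naturally because the end index is inclusive.
local tail = string.sub(s, -5, -1)
print(tail)
```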

Besides, once one overcomes his C 0-based indexing tendency, it is 
pretty natural to use loops like:

  for (i=s; i<=e; i++)

instead of 

  for (i=s; i<e+1; i++)

From that you can also see why the empty string start and end points make
sense.
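
The same loop written in Lua, as a rough sketch (assumes Lua 5.1 or later; chars_in_span is a made-up helper name). Lua's numeric for is inclusive at both ends, so the pair returned by strfind can be used as is:

```lua
-- Hypothetical helper: collects the characters of s in the inclusive
-- span first..last. Assumes Lua 5.1+ for the # operator.
local function chars_in_span(s, first, last)
  local out = {}
  for i = first, last do          -- runs for i = first, first+1, ..., last
    out[#out + 1] = string.sub(s, i, i)
  end
  return out
end

local s = "hello:there"
local word = chars_in_span(s, 1, 6)    -- the span matched by "(.-)%:"
local empty = chars_in_span(s, 1, 0)   -- the span of an empty match

print(#word, table.concat(word))
print(#empty)
```

When the end is one less than the start, the loop body never runs, so an empty match needs no special case.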

Regards,
Diego.