Pattern matching proposal: %B to match balanced string with specified escape

29 messages

Jonathan Goble
I find that %b() is a great feature of Lua, but it lacks one crucial
feature: handling the case where one or both of the args may appear in
the string prefixed by an escape character, and thus shouldn't be
counted. One solution to this is gsub'ing the escape sequences and
then matching, but this can be inefficient when dealing with very
large strings.
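A minimal sketch of that gsub workaround in stock Lua. It assumes the escape character is punctuation, so `"%" .. esc` is a literal in a pattern, and that NUL bytes never act as delimiters. The name `match_balanced_masked` is hypothetical. Each escape pair is overwritten with two NUL bytes in a same-length copy, `%b` runs on the copy, and the match positions are then used to slice the original. The full-length copy made by `gsub` is the overhead the proposal wants to avoid on very large strings.

```lua
-- Hypothetical helper: emulate "balanced match with escape" in stock Lua.
-- Assumes esc is punctuation and "\0" is never one of the delimiters.
local function match_balanced_masked(s, open, close, esc, init)
  -- Replace every escape pair (escape char + next char) with two NUL
  -- bytes, so the masked copy has exactly the same length as s.
  local masked = s:gsub("%" .. esc .. ".", "\0\0")
  -- Run the stock balanced matcher on the masked copy.
  local i, j = masked:find("%b" .. open .. close, init)
  if not i then return nil end
  -- Positions are identical in both strings, so slice the original.
  return s:sub(i, j)
end

local str1 = "test(testing^(test...)hello)end"
print(match_balanced_masked(str1, "(", ")", "^"))  --> (testing^(test...)
```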

My suggestion is to add the token %B, which would perform the same
thing as %b, except it would recognize an escape character specified
in the arguments to the token, and upon encountering the escape
character, the following character would be ignored. I doubt it would
handle all use cases, but it almost certainly would handle a large
percentage.

This would be very easy to implement; in fact, I just did so this
evening with zero experience coding in C. [1] (I've confirmed that
this compiles in Visual Studio 2015 and runs on Windows 10, and that
the new feature works, but it has not otherwise been tested, including
for regressions).

The format in that commit is "%B123", where 1 is the beginning
character, 3 is the ending character, and 2 is the escape character. A
simple example in an interactive session:

> str1 = "test(testing^(test...)hello)end"
> str1:match"%b()"
(testing^(test...)hello)
> str1:match"%B(^)"
(testing^(test...)
> str2 = [[test(testing\(test...)hello)end]]
> str2:match[[%B(\)]]
(testing\(test...)
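A pure-Lua sketch of the semantics described above. It assumes the argument order of the `%B123` token (open, escape, close). `balanced_escaped` is a hypothetical name, and this is not the C code from the linked commit. Like stock `%b`, it tests for the closing delimiter before the opening one. Like `string.match`, it retries from the next opening delimiter when a candidate never balances. The order of checks inside the actual patch is an assumption here.

```lua
-- Sketch of %B<open><esc><close>: like %b, but the character after an
-- escape is skipped. Hypothetical helper, not the linked C implementation.
local function balanced_escaped(s, open, esc, close, init)
  for start = init or 1, #s do
    if s:sub(start, start) == open then
      local depth, i = 1, start + 1
      while i <= #s do
        local c = s:sub(i, i)
        if c == esc then
          i = i + 1                 -- skip the escaped character, whatever it is
        elseif c == close then      -- close is tested first, as stock %b does
          depth = depth - 1
          if depth == 0 then return s:sub(start, i) end
        elseif c == open then
          depth = depth + 1
        end
        i = i + 1
      end
      -- never balanced from here: try the next start, as string.match would
    end
  end
  return nil
end

local str1 = "test(testing^(test...)hello)end"
print(balanced_escaped(str1, "(", "^", ")"))  --> (testing^(test...)
```

Because the escape test comes first, an escaped escape character (for example `\\`) consumes both characters, so the character after it is counted normally.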

What do people think about this? Is this a good idea, bad idea,
somewhere in between? I'm especially interested in hearing Roberto's
opinion.

[1] https://github.com/jcgoble3/lua-testing/commit/ee90b07d7a8c8900e0a1c29cced1b3bf4576e8fc

Re: Pattern matching proposal: %B to match balanced string with specified escape

Dirk Laurie-2
2016-01-05 6:41 GMT+02:00 Jonathan Goble <[hidden email]>:

> My suggestion is to add the token %B, which would perform the same
> thing as %b, except it would recognize an escape character specified
> in the arguments to the token, and upon encountering the escape
> character, the following character would be ignored.

At present %B is a synonym for B, and I doubt that anyone relies
on that behaviour. So I'm +1.
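A quick check of that observation in stock Lua. The manual only defines `%x` for non-alphanumeric `x`, so this is reference-implementation behavior rather than a documented guarantee. That is part of why repurposing `%B` is unlikely to break existing code.

```lua
-- Stock Lua: "B" is not a defined class letter, so %B falls through to a
-- literal comparison and matches only an uppercase "B".
print(("abBa"):match("%B"))  --> B
print(("abba"):match("%B"))  --> nil
```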

Re: Pattern matching proposal: %B to match balanced string with specified escape

Rena
In reply to this post by Jonathan Goble
On Mon, Jan 4, 2016 at 11:41 PM, Jonathan Goble <[hidden email]> wrote:
> My suggestion is to add the token %B, which would perform the same
> thing as %b, except it would recognize an escape character specified
> in the arguments to the token, and upon encountering the escape
> character, the following character would be ignored.

I like this idea.

I also wonder about using %b with the same character twice, like %b"". The manual says it expects two different characters, but it'd be nice if we could do this.

--
Sent from my Game Boy.

Re: Pattern matching proposal: %B to match balanced string with specified escape

Dirk Laurie-2
2016-01-05 9:50 GMT+02:00 Rena <[hidden email]>:

> I also wonder about using %b with the same character twice, like: %b"" - the
> manual says it expects two different characters, but it'd be nice if we
> could do this.

I don't know why the manual says that. It works perfectly well with
the same character twice.

> string.match('abc"def"ghi"jkl"mno','%b""')
"def"
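This is a hedged reading of the reference implementation's `matchbalance` in `lstrlib.c`. At each character, it tests for the closing delimiter before the opening one. With identical delimiters, the next quote therefore always closes the match, and quotes never nest. The snippet runs in stock Lua. It also shows the gap that %B aims to fill: an escaped quote still ends the match.

```lua
-- Runs in stock Lua. With identical delimiters, the next quote always closes
-- the match, because the closing character is tested before the opening one.
local s = 'abc"def"ghi"jkl"mno'
print(s:match('%b""'))                  --> "def"
for q in s:gmatch('%b""') do
  print(q)                              -- prints "def", then "jkl"
end

-- The limitation %B targets: an escaped quote still ends the match.
local lit = [[say "a \"b\" c"]]
print(lit:match('%b""'))                --> "a \"
```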

Re: Pattern matching proposal: %B to match balanced string with specified escape

Jonathan Goble
On Tue, Jan 5, 2016 at 3:29 AM, Dirk Laurie <[hidden email]> wrote:

> 2016-01-05 9:50 GMT+02:00 Rena <[hidden email]>:
>
>> I also wonder about using %b with the same character twice, like: %b"" - the
>> manual says it expects two different characters, but it'd be nice if we
>> could do this.
>
> I don't know why the manual says that. It works perfectly well with
> the same character twice.
>
>> string.match('abc"def"ghi"jkl"mno','%b""')
> "def"

And this proposal would make that even more powerful:

> str = [[char str[] = "this \"is\" a test"]]
> str
char str[] = "this \"is\" a test"
> str:match[[%B"\"]]
"this \"is\" a test"

Poof: instant matching of string literals with escape handling, in a
five-character pattern. How cool is that? :-)

Re: Pattern matching proposal: %B to match balanced string with specified escape

Soni "They/Them" L.


On 05/01/16 07:51 AM, Jonathan Goble wrote:

> On Tue, Jan 5, 2016 at 3:29 AM, Dirk Laurie <[hidden email]> wrote:
>> 2016-01-05 9:50 GMT+02:00 Rena <[hidden email]>:
>>
>>> I also wonder about using %b with the same character twice, like: %b"" - the
>>> manual says it expects two different characters, but it'd be nice if we
>>> could do this.
>> I don't know why the manual says that. It works perfectly well with
>> the same character twice.
>>
>>> string.match('abc"def"ghi"jkl"mno','%b""')
>> "def"
> And this proposal would make that even more powerful:
>
>> str = [[char str[] = "this \"is\" a test"]]
>> str
> char str[] = "this \"is\" a test"
>> str:match[[%B"\"]]
> "this \"is\" a test"
>
> Poof: instant matching of string literals with escape handling, in a
> five-character pattern. How cool is that? :-)
>
Meh, there's a lib for it
https://github.com/SoniEx2/Stuff/blob/master/lua/String.lua

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Re: Pattern matching proposal: %B to match balanced string with specified escape

Jonathan Goble
On Tue, Jan 5, 2016 at 8:55 AM, Soni L. <[hidden email]> wrote:

> On 05/01/16 07:51 AM, Jonathan Goble wrote:
>> And this proposal would make that even more powerful:
>>
>>> str = [[char str[] = "this \"is\" a test"]]
>>> str
>> char str[] = "this \"is\" a test"
>>> str:match[[%B"\"]]
>> "this \"is\" a test"
>>
>> Poof: instant matching of string literals with escape handling, in a
>> five-character pattern. How cool is that? :-)
>>
> Meh, there's a lib for it
> https://github.com/SoniEx2/Stuff/blob/master/lua/String.lua

I hardly think that 273 lines of Lua can be compared to essentially 10
lines of C. It's pretty obvious which would be faster.

Also, your solution is for a single, very specialized use case, while
mine is a more general solution covering many different use cases.

Both can coexist here; the presence of your module (a great one from
the looks of it, I should add) should not in any way hold this up.

Re: Pattern matching proposal: %B to match balanced string with specified escape

Dirk Laurie-2
In reply to this post by Soni "They/Them" L.
2016-01-05 15:55 GMT+02:00 Soni L. <[hidden email]>:

> Meh, there's a lib for it
> https://github.com/SoniEx2/Stuff/blob/master/lua/String.lua

Meh too.

Re: Pattern matching proposal: %B to match balanced string with specified escape

Pierre-Yves Gérardy
In reply to this post by Jonathan Goble
On Tue, Jan 5, 2016 at 5:41 AM, Jonathan Goble <[hidden email]> wrote:
> My suggestion is to add the token %B, which would perform the same
> thing as %b, except it would recognize an escape character specified
> in the arguments to the token, and upon encountering the escape
> character, the following character would be ignored. I doubt it would
> handle all use cases, but it almost certainly would handle a large
> percentage.

Does it handle escaping the escape character?

(' - "foo\\"bar" - '):match([[%B"\"]])
(' - "foo\\\"bar" - '):match([[%B"\"]])

—Pierre-Yves

> [1] https://github.com/jcgoble3/lua-testing/commit/ee90b07d7a8c8900e0a1c29cced1b3bf4576e8fc
>

Re: Pattern matching proposal: %B to match balanced string with specified escape

Jonathan Goble
On Tue, Jan 5, 2016 at 4:44 PM, Pierre-Yves Gérardy <[hidden email]> wrote:

> On Tue, Jan 5, 2016 at 5:41 AM, Jonathan Goble <[hidden email]> wrote:
>> My suggestion is to add the token %B, which would perform the same
>> thing as %b, except it would recognize an escape character specified
>> in the arguments to the token, and upon encountering the escape
>> character, the following character would be ignored. I doubt it would
>> handle all use cases, but it almost certainly would handle a large
>> percentage.
>
> Does it handle escaping the escape character?
>
> (' - "foo\\"bar" - '):match([["\"]]
> (' - "foo\\\"bar" - '):match([["\"]]
>
> —Pierre-Yves

Yes; the matcher simply iterates over the string one character at a
time, except that when it encounters an escape character, the next
character, whatever it is, is ignored.

So your examples, tweaked to use raw strings for clarity:

> ([[ - "foo\"bar" - ]]):match([[%B"\"]])
"foo\"bar"  -- \ escapes the "
> ([[ - "foo\\"bar" - ]]):match([[%B"\"]])
"foo\\"  -- first \ escapes the second \, and " is then matched
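The rule described above can be sketched in pure Lua for the string-literal case, where the opening and closing delimiters are the same. Under that rule, an escape character makes the next character inert, even when that character is another escape. `match_quoted` is a hypothetical helper written for illustration, not code from the linked commit. It assumes the escape character differs from the quote character.

```lua
-- Hypothetical helper: find a quoted literal, honoring escapes the way the
-- proposal describes (an escape makes the next character inert).
local function match_quoted(s, quote, esc, init)
  local start = s:find(quote, init or 1, true)   -- plain find of an opening quote
  while start do
    local i = start + 1
    while i <= #s do
      local c = s:sub(i, i)
      if c == esc then
        i = i + 2                -- skip escape + escaped char (even another esc)
      elseif c == quote then
        return s:sub(start, i)   -- first unescaped quote closes the literal
      else
        i = i + 1
      end
    end
    start = s:find(quote, start + 1, true)       -- unterminated: try next quote
  end
  return nil
end

print(match_quoted([[ - "foo\\"bar" - ]], '"', "\\"))  --> "foo\\"
```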

Re: Pattern matching proposal: %B to match balanced string with specified escape

Jonathan Goble
In reply to this post by Jonathan Goble
On Mon, Jan 4, 2016 at 11:41 PM, Jonathan Goble <[hidden email]> wrote:

> I find that %b() is a great feature of Lua, but it lacks one crucial
> feature: handling the case where one or both of the args may appear in
> the string prefixed by an escape character, and thus shouldn't be
> counted. One solution to this is gsub'ing the escape sequences and
> then matching, but this can be inefficient when dealing with very
> large strings.
>
> My suggestion is to add the token %B, which would perform the same
> thing as %b, except it would recognize an escape character specified
> in the arguments to the token, and upon encountering the escape
> character, the following character would be ignored. I doubt it would
> handle all use cases, but it almost certainly would handle a large
> percentage.

Roberto? Luiz? I'm interested in hearing your "official" opinions on
this proposal.

Re: Pattern matching proposal: %B to match balanced string with specified escape

Roberto Ierusalimschy
> > My suggestion is to add the token %B, which would perform the same
> > thing as %b, except it would recognize an escape character specified
> > in the arguments to the token, and upon encountering the escape
> > character, the following character would be ignored. I doubt it would
> > handle all use cases, but it almost certainly would handle a large
> > percentage.
>
> Roberto? Luiz? I'm interested in hearing your "official" opinions on
> this proposal.

We don't have any "official" opinion about that.

-- Roberto

Re: Pattern matching proposal: %B to match balanced string with specified escape

Jonathan Goble
On Mon, Jan 11, 2016 at 6:35 AM, Roberto Ierusalimschy
<[hidden email]> wrote:

>> > My suggestion is to add the token %B, which would perform the same
>> > thing as %b, except it would recognize an escape character specified
>> > in the arguments to the token, and upon encountering the escape
>> > character, the following character would be ignored. I doubt it would
>> > handle all use cases, but it almost certainly would handle a large
>> > percentage.
>>
>> Roberto? Luiz? I'm interested in hearing your "official" opinions on
>> this proposal.
>
> We don't have any "official" opinion about that.

OK. But what are the chances of this being added to stock Lua in the
next version?

Re: Pattern matching proposal: %B to match balanced string with specified escape

Matthew Wild
On 11 January 2016 at 21:06, Jonathan Goble <[hidden email]> wrote:

> On Mon, Jan 11, 2016 at 6:35 AM, Roberto Ierusalimschy
> <[hidden email]> wrote:
>>> > My suggestion is to add the token %B, which would perform the same
>>> > thing as %b, except it would recognize an escape character specified
>>> > in the arguments to the token, and upon encountering the escape
>>> > character, the following character would be ignored. I doubt it would
>>> > handle all use cases, but it almost certainly would handle a large
>>> > percentage.
>>>
>>> Roberto? Luiz? I'm interested in hearing your "official" opinions on
>>> this proposal.
>>
>> We don't have any "official" opinion about that.
>
> OK. But what are the chances of this being added to stock Lua in the
> next version?

Based on the number of patches enthusiastically posted to this list,
and the number that ever get in - I'd say the answer is "very low".

I'm glad it is that way, or Lua would be Python. It doesn't mean your
patch isn't useful, or clever. Personally almost every time I use %b,
I wish it had a way to handle escaping. But Lua simply isn't an
all-features-included language.

Regards,
Matthew

Re: Pattern matching proposal: %B to match balanced string with specified escape

Dirk Laurie-2
2016-01-11 23:42 GMT+02:00 Matthew Wild <[hidden email]>:

>> OK. But what are the chances of this being added to stock Lua in the
>> next version?
>
> Based on the number of patches enthusiastically posted to this list,
> and the number that ever get in - I'd say the answer is "very low".

Jonathan's suggestion is merely a change to a library, not
to core Lua. In such a case, there is an alternative route to
respectability: contribute a module to LuaRocks. Let's say that
the module 'jgstring' returns a function that monkey-patches
the string library, saving the original functions in string.orig.

Once that is installed, it is a matter of

   require"jgstring"()
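A sketch of what such a monkey-patching module could look like. The module name 'jgstring' comes from the message above, but its contents here are assumptions. The replacement functions are placeholders that only delegate to the saved originals; a real module would supply %B-aware implementations.

```lua
-- Hypothetical contents of jgstring.lua. The replacements are placeholders
-- that delegate to the saved originals.
local replacements = {}
for _, name in ipairs({ "find", "match", "gmatch", "gsub" }) do
  replacements[name] = function(...)
    return string.orig[name](...)
  end
end

local function install()
  if string.orig then return end          -- already patched: do nothing
  string.orig = {}
  for name, fn in pairs(replacements) do
    string.orig[name] = string[name]      -- keep the original under string.orig
    string[name] = fn                     -- method calls like s:match() see this
  end
end

-- jgstring.lua would end with `return install`, so that a caller writes
-- require"jgstring"(); here the function is called directly.
install()
print(("hello"):match("l+"))  --> ll
```

The `string.orig` guard makes repeated calls harmless. The trade-off is that every library in the process sees the patched functions, which is the objection raised in the reply that follows.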

Re: Pattern matching proposal: %B to match balanced string with specified escape

Nagaev Boris
On Tue, Jan 12, 2016 at 9:19 AM, Dirk Laurie <[hidden email]> wrote:

> 2016-01-11 23:42 GMT+02:00 Matthew Wild <[hidden email]>:
>
>>> OK. But what are the chances of this being added to stock Lua in the
>>> next version?
>>
>> Based on the number of patches enthusiastically posted to this list,
>> and the number that ever get in - I'd say the answer is "very low".
>
> Jonathan's suggestion is merely a change to a library, not
> to core Lua. In such a case, there is an alternative route to
> respectability: contribute a module to LuaRocks. Let's say that
> the module 'jgstring' returns a function that monkey-patches
> the string library, saving the original functions in string.orig.
>
> Once that is installed, it is a matter of
>
>    require"jgstring"()
>

I would prefer if that module returned a new string.match function
without changing any globals.
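A sketch of that alternative, using the same hypothetical module name. Nothing global is modified, so other code keeps stock semantics. Only `match` is shown, and only a pattern that is exactly `%B` plus three characters is special-cased, using a masking approach. The sketch assumes the escape is punctuation and NUL is never a delimiter. Supporting `%B` inside larger patterns would need changes to the C matcher itself.

```lua
-- Hypothetical "jgstring" module body that changes no globals.
local M = {}

-- Whole-pattern %B only: blank out escape pairs in a same-length copy,
-- run stock %b on it, and slice the original at the same positions.
local function balanced_escaped(s, open, esc, close, init)
  local masked = s:gsub("%" .. esc .. ".", "\0\0")
  local i, j = masked:find("%b" .. open .. close, init)
  if i then return s:sub(i, j) end
  return nil
end

function M.match(s, pattern, init)
  local open, esc, close = pattern:match("^%%B(.)(.)(.)$")
  if open then
    return balanced_escaped(s, open, esc, close, init)
  end
  return string.match(s, pattern, init)  -- anything else: stock behavior
end

-- jgstring.lua would end with `return M`; a caller would then write
-- local jgstring = require "jgstring"
local jgstring = M
print(jgstring.match("f(a^)b)c", "%B(^)"))         --> (a^)b)
print(jgstring.match("key=value", "(%w+)=(%w+)"))  --> key	value
```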


--


Best regards,
Boris Nagaev

Re: Pattern matching proposal: %B to match balanced string with specified escape

Jonathan Goble
On Tue, Jan 12, 2016 at 2:58 AM, Nagaev Boris <[hidden email]> wrote:

> On Tue, Jan 12, 2016 at 9:19 AM, Dirk Laurie <[hidden email]> wrote:
>> 2016-01-11 23:42 GMT+02:00 Matthew Wild <[hidden email]>:
>>
>>>> OK. But what are the chances of this being added to stock Lua in the
>>>> next version?
>>>
>>> Based on the number of patches enthusiastically posted to this list,
>>> and the number that ever get in - I'd say the answer is "very low".
>>
>> Jonathan's suggestion is merely a change to a library, not
>> to core Lua. In such a case, there is an alternative route to
>> respectability: contribute a module to LuaRocks. Let's say that
>> the module 'jgstring' returns a function that monkey-patches
>> the string library, saving the original functions in string.orig.
>>
>> Once that is installed, it is a matter of
>>
>>    require"jgstring"()
>>
>
> I would prefer if that module returned new function string.match
> without changing any globals.

The catch here is that in order to create a module containing my new
functions (find, gmatch, and gsub in addition to match), I would (I
assume) have to copy and paste all of the pattern-matching code, roughly
500 lines, into my new module just to change about 10 to 15 lines.
That does not seem like a reasonable plan of attack.

The only reasonable way to do this, unless I'm missing something, is
for the change to be made in stock Lua, which is not asking that much
(given the simplicity of implementation and the lack of any backwards
compatibility issues).

The only alternative I can see is for me to maintain a lightly
modified fork of Lua on GitHub that makes my changes available,
possibly by defining a symbol like LUA_CUSTOM_BALANCE. I'm considering
doing this anyway, but at this stage, I'd really rather not be
maintaining a fork.

Re: Pattern matching proposal: %B to match balanced string with specified escape

Nagaev Boris
On Tue, Jan 12, 2016 at 11:13 AM, Jonathan Goble <[hidden email]> wrote:

> On Tue, Jan 12, 2016 at 2:58 AM, Nagaev Boris <[hidden email]> wrote:
>> On Tue, Jan 12, 2016 at 9:19 AM, Dirk Laurie <[hidden email]> wrote:
>>> 2016-01-11 23:42 GMT+02:00 Matthew Wild <[hidden email]>:
>>>
>>>>> OK. But what are the chances of this being added to stock Lua in the
>>>>> next version?
>>>>
>>>> Based on the number of patches enthusiastically posted to this list,
>>>> and the number that ever get in - I'd say the answer is "very low".
>>>
>>> Jonathan's suggestion is merely a change to a library, not
>>> to core Lua. In such a case, there is an alternative route to
>>> respectability: contribute a module to LuaRocks. Let's say that
>>> the module 'jgstring' returns a function that monkey-patches
>>> the string library, saving the original functions in string.orig.
>>>
>>> Once that is installed, it is a matter of
>>>
>>>    require"jgstring"()
>>>
>>
>> I would prefer if that module returned new function string.match
>> without changing any globals.
>
> The catch here is that in order to create a module containing my new
> functions (find, gmatch, and gsub in addition to match), I (assume)
> would have to copy-and-paste all of the pattern matching code, roughly
> 500 lines of code, into my new module, just to change about 10 to 15
> lines. That does not seem like a reasonable plan of attack.
>
> The only reasonable way to do this, unless I'm missing something, is
> for the change to be made in stock Lua, which is not asking that much
> (given the simplicity of implementation and the lack of any backwards
> compatibility issues).
>
> The only alternative I can see is for me to maintain a lightly
> modified fork of Lua on GitHub that makes my changes available,
> possibly by defining a symbol like LUA_CUSTOM_BALANCE. I'm considering
> doing this anyway, but at this stage, I'd really rather not be
> maintaining a fork.
>

Maintaining a rock is simpler than maintaining a fork. And it is worth
making a rock even if this were added to stock Lua, because a rock can
be installed in Lua 5.1.

--


Best regards,
Boris Nagaev

Re: Pattern matching proposal: %B to match balanced string with specified escape

Jonathan Goble
On Tue, Jan 12, 2016 at 3:25 AM, Nagaev Boris <[hidden email]> wrote:

> On Tue, Jan 12, 2016 at 11:13 AM, Jonathan Goble <[hidden email]> wrote:
>> The catch here is that in order to create a module containing my new
>> functions (find, gmatch, and gsub in addition to match), I (assume)
>> would have to copy-and-paste all of the pattern matching code, roughly
>> 500 lines of code, into my new module, just to change about 10 to 15
>> lines. That does not seem like a reasonable plan of attack.
>>
>> The only reasonable way to do this, unless I'm missing something, is
>> for the change to be made in stock Lua, which is not asking that much
>> (given the simplicity of implementation and the lack of any backwards
>> compatibility issues).
>>
>> The only alternative I can see is for me to maintain a lightly
>> modified fork of Lua on GitHub that makes my changes available,
>> possibly by defining a symbol like LUA_CUSTOM_BALANCE. I'm considering
>> doing this anyway, but at this stage, I'd really rather not be
>> maintaining a fork.
>>
>
> Maintaining a rock is simpler than maintaining a fork. And it worth to
> make a rock even if it was added to stock Lua, because a rock can be
> installed in Lua 5.1.

Not quite, because again, I'd have to copy the entire pattern-matching
system. While I can see the benefit of making a rock, said rock would
contain large amounts of duplicated code that would need to be updated
for almost every release of Lua, same as a fork.

I'll consider it, though.

Re: Pattern matching proposal: %B to match balanced string with specified escape

Nagaev Boris
On Tue, Jan 12, 2016 at 11:40 AM, Jonathan Goble <[hidden email]> wrote:

> On Tue, Jan 12, 2016 at 3:25 AM, Nagaev Boris <[hidden email]> wrote:
>> On Tue, Jan 12, 2016 at 11:13 AM, Jonathan Goble <[hidden email]> wrote:
>>> The catch here is that in order to create a module containing my new
>>> functions (find, gmatch, and gsub in addition to match), I (assume)
>>> would have to copy-and-paste all of the pattern matching code, roughly
>>> 500 lines of code, into my new module, just to change about 10 to 15
>>> lines. That does not seem like a reasonable plan of attack.
>>>
>>> The only reasonable way to do this, unless I'm missing something, is
>>> for the change to be made in stock Lua, which is not asking that much
>>> (given the simplicity of implementation and the lack of any backwards
>>> compatibility issues).
>>>
>>> The only alternative I can see is for me to maintain a lightly
>>> modified fork of Lua on GitHub that makes my changes available,
>>> possibly by defining a symbol like LUA_CUSTOM_BALANCE. I'm considering
>>> doing this anyway, but at this stage, I'd really rather not be
>>> maintaining a fork.
>>>
>>
>> Maintaining a rock is simpler than maintaining a fork. And it worth to
>> make a rock even if it was added to stock Lua, because a rock can be
>> installed in Lua 5.1.
>
> Not quite, because again, I'd have to copy the entire pattern-matching
> system. While I can see the benefit of making a rock, said rock would
> contain large amounts of duplicated code that would need to be updated
> for almost every release of Lua, same as a fork.
>
> I'll consider it, though.
>

For a fork, you'll have to copy the entire Lua :)

In the case of a rock, you can maintain a patch and a script: the
script extracts the pattern-matching code from the Lua sources and applies the patch.

--


Best regards,
Boris Nagaev
