Lua 5.2 string patterns do not respect lctype

Lua 5.2 string patterns do not respect lctype

Dirk Laurie-2
The test for alpha in lstrlib.c uses `isalpha`, not `lislalpha`.
I suppose this is a feature, not a bug, so that the definition
of "letter" can be locale-dependent.

Re: Lua 5.2 string patterns do not respect lctype

Roberto Ierusalimschy
> The test for alpha in lstrlib.c uses `isalpha`, not `lislalpha`.
> I suppose this is a feature, not a bug, so that the definition
> of "letter" can be locale-dependent.

Yes, it is a feature. (The same is valid for other tests, too.)

-- Roberto

Re: Lua 5.2 string patterns do not respect lctype

Dirk Laurie-2
2013/4/11 Roberto Ierusalimschy <[hidden email]>:
>> The test for alpha in lstrlib.c uses `isalpha`, not `lislalpha`.
>> I suppose this is a feature, not a bug, so that the definition
>> of "letter" can be locale-dependent.
>
> Yes, it is a feature. (The same is valid for other tests, too.)

It is tricky to capture the longest valid Lua identifier from the start of
a string.

- The pattern "^([%a_][%w_]*)" is locale-dependent.

- The pattern "^([A-Za-z_][A-Za-z0-9_]*)" works only with the original
  lctype.c.

If we reduce the requirement to merely testing whether a given string
is a valid name, then

- 'load("local "..str) and str' almost works, but gives false positives
  if the rest of "str" completes a valid chunk.

- 'load("local "..str) and load("local "..str.."=nil") and str' reduces
  the false positives to the point where I have not yet found any, but
  I have no doubt that the clever people on this list will find an
  exception.

Much neater, especially to people who have already been willing to patch
lctype.c, would be to add two lines to the function match_class in
lstrlib.c, defining new class types that return lislalpha(c) and
lislalnum(c). Half the alphabet is still available. But wait, that would
require `lctype.h` to be included, breaking the convention that
Lua-callable C functions should use only the official API.

Any better ideas?

Re: Lua 5.2 string patterns do not respect lctype

Peter Cawley
On Fri, Apr 12, 2013 at 10:53 AM, Dirk Laurie <[hidden email]> wrote:
> - 'load("local "..str) and load("local "..str.."=nil") and str' reduces
>   the false positives to the point where I have not yet found any, but
>   I have no doubt that the clever people on this list will find an
>   exception.

How about:

str="some_identifier --" 
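A minimal sketch of why this string gets through the two-step test, assuming Lua 5.2 or later (the `loadstring or load` line is only there so it also runs on 5.1). The names `compile` and `looks_like_name` are hypothetical, chosen for illustration. The trailing `--` starts a comment, so the appended `=nil` is swallowed and both chunks still compile.

```lua
-- Hypothetical helper wrapping the two-step load test quoted above.
-- loadstring exists on Lua 5.1; on 5.2+ load accepts a string chunk.
local compile = loadstring or load

local function looks_like_name(str)
  -- Both chunks must compile for str to be accepted.
  return (compile("local " .. str) and compile("local " .. str .. "=nil")) and str
end

print(looks_like_name("some_identifier"))     -- accepted, as intended
print(looks_like_name("end"))                 -- rejected: reserved word
print(looks_like_name("some_identifier --"))  -- accepted: "--=nil" is a comment
```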

Re: Lua 5.2 string patterns do not respect lctype

Roberto Ierusalimschy
In reply to this post by Dirk Laurie-2
> It is tricky to capture the longest valid Lua identifier from the start of
> a string.
>
> - The pattern "^([%a_][%w_]*)" is locale-dependent.
>
> - The pattern "^([A-Za-z_][A-Za-z0-9_]*)" works only with the original
>   lctype.c.
>
> [...]
>
> Any better ideas?

1) Spell it out:

   [abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_][abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_0-9]*

2) do  os.setlocale("C")  before the match.

3) Ignore EBCDIC and assume that [A-Za-z_] matches all letters.
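
Option 2 could be sketched roughly as follows. This is one possible way to do it, not something taken from the thread: the function name `leading_name_c_locale` is hypothetical, and restricting the change to the "ctype" category and restoring the previous setting afterwards are added assumptions so that the rest of the program is not affected.

```lua
-- Hypothetical helper: match a leading name with the ctype category
-- temporarily set to "C", so that %a and %w mean ASCII letters and digits.
local function leading_name_c_locale(s)
  local saved = os.setlocale(nil, "ctype")  -- nil queries the current setting
  os.setlocale("C", "ctype")
  local name = s:match("^([%a_][%w_]*)")
  if saved then
    os.setlocale(saved, "ctype")            -- put the previous locale back
  end
  return name
end

print(leading_name_c_locale("foo_1 = 2"))
```

Note that this only captures something shaped like a name; it does not reject reserved words such as "end".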


-- Roberto

Re: Lua 5.2 string patterns do not respect lctype

Rena
On Fri, Apr 12, 2013 at 12:20 PM, Roberto Ierusalimschy <[hidden email]> wrote:
>> It is tricky to capture the longest valid Lua identifier from the start of
>> a string.
>>
>> - The pattern "^([%a_][%w_]*)" is locale-dependent.
>>
>> - The pattern "^([A-Za-z_][A-Za-z0-9_]*)" works only with the original
>>   lctype.c.
>>
>> [...]
>>
>> Any better ideas?
>
> 1) Spell it out:
>
>    [abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_][abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_0-9]*
>
> 2) do  os.setlocale("C")  before the match.
>
> 3) Ignore EBCDIC and assume that [A-Za-z_] matches all letters.
>
> -- Roberto


That makes me feel like it'd be useful to have classes that expand to
"abcdefghijklmnopqrstuvwxyz" and "ABCDEFGHIJKLMNOPQRSTUVWXYZ".
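
Lua patterns have no user-defined classes, but something close can be approximated today by keeping the two alphabets in ordinary strings and concatenating them into the pattern. A rough sketch follows; `LOWER`, `UPPER`, `NAME` and `leading_name` are hypothetical names, not part of any library.

```lua
-- The two "classes", spelled out once as plain strings.
local LOWER = "abcdefghijklmnopqrstuvwxyz"
local UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

-- The spelled-out pattern, assembled from the pieces above.
local NAME = "^([" .. LOWER .. UPPER .. "_][" .. LOWER .. UPPER .. "_0-9]*)"

local function leading_name(s)
  -- Parentheses drop any extra return values from match.
  return (s:match(NAME))
end

print(leading_name("foo_1 = 2"))
```

Because every letter is listed explicitly, the match does not depend on the current locale.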

--
Sent from my Game Boy.