Does character class %s ever match hardspace?

Dirk Laurie
The manual says:

> The definitions of letter, space, and other character groups depend on
> the current locale.

Is there a standard locale in which character class %s matches the
hardspace character?
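For code that must not depend on the locale at all, one option (a sketch, not something proposed in the thread) is to spell out the six characters that `%s` matches in the "C" locale, codes 9 to 13 and 32, as an explicit set. `ASCII_SPACE` and `trim` are illustrative names, not part of Lua's standard library.

```lua
-- Locale-independent whitespace set: tab, newline, vertical tab,
-- form feed, carriage return and space (codes 9-13 and 32), i.e. what
-- %s matches in the "C" locale. A set of literal characters is compared
-- byte by byte, so os.setlocale does not affect it, unlike %s.
local ASCII_SPACE = "[ \t\n\v\f\r]"

-- Illustrative helper: strip leading and trailing ASCII whitespace.
-- A lone byte 160 (or any other non-ASCII byte) is left untouched.
local function trim(s)
  return (s:gsub("^" .. ASCII_SPACE .. "+", ""):gsub(ASCII_SPACE .. "+$", ""))
end

print(trim("\t  hello world \n"))  --> hello world
```

The trade-off is that such a set never adapts to the locale, which is exactly the point when input is known to be ASCII or UTF-8.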


Re: Does character class %s ever match hardspace?

Egor Skriptunoff
On Mon, Oct 22, 2018 at 12:33 PM Dirk Laurie wrote:
> Is there a standard locale in which character class %s matches the
> hardspace character?


Yes, of course.
For example, the Russian Windows locale:
 
   Lua 5.3.5  Copyright (C) 1994-2018 Lua.org, PUC-Rio
   > os.setlocale""
   Russian_Russia.1251
   > for code = 0, 255 do
   >> if string.char(code):match"%s" then
   >> print(code)
   >> end
   >> end
   9
   10
   11
   12
   13
   32
   160
   >
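Egor's loop generalizes into a small function. This is a sketch: `class_members` is a hypothetical helper name, and it switches only the "ctype" category (the one the C library's character-classification functions consult, which is what `%s` relies on), restoring the previous setting afterwards.

```lua
-- Hypothetical helper: list the byte codes 0..255 that match a Lua
-- character class (such as "%s") when LC_CTYPE is set to `locale`.
-- Returns nil plus a message if the locale is not available.
local function class_members(class, locale)
  local saved = os.setlocale(nil, "ctype")       -- query current setting
  if os.setlocale(locale, "ctype") == nil then   -- nil means "could not set"
    return nil, "locale not available: " .. tostring(locale)
  end
  local codes = {}
  for code = 0, 255 do
    if string.char(code):match(class) then
      codes[#codes + 1] = code
    end
  end
  os.setlocale(saved, "ctype")                   -- restore previous setting
  return codes
end

print(table.concat(class_members("%s", "C"), " "))  --> 9 10 11 12 13 32
```

Locale names are platform-specific: on Windows, `class_members("%s", "Russian_Russia.1251")` should reproduce the list above, while on other systems names such as "en_US.UTF-8" may or may not be installed.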

 
 

Re: Does character class %s ever match hardspace?

Gé Weijers
On my MacOS machine the following character codes match '%s':

"C" locale: 9 10 11 12 13 32

"en_US.UTF-8" locale: 9 10 11 12 13 32 133 160

133 == NEXT LINE (NEL)
160 == NO-BREAK SPACE (NBSP)
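These extra codes are byte values tested one at a time. Read as Unicode code points, 133 and 160 are U+0085 and U+00A0, but in UTF-8 each of those code points is encoded as two bytes, which is what Dirk's follow-up question is about. The sketch below uses Lua 5.3's `utf8.char` to show the encodings; `utf8_hex` is an illustrative name, and why the macOS C library classifies the lone bytes as spaces is not established in this thread.

```lua
-- Illustrative helper: UTF-8 encode a code point and return its bytes
-- as space-separated hex.
local function utf8_hex(cp)
  local s = utf8.char(cp)
  local hex = {}
  for i = 1, #s do
    hex[i] = string.format("%02X", s:byte(i))
  end
  return table.concat(hex, " ")
end

for _, cp in ipairs{133, 160} do
  print(string.format("%d = U+%04X, UTF-8 bytes: %s", cp, cp, utf8_hex(cp)))
end
--> 133 = U+0085, UTF-8 bytes: C2 85
--> 160 = U+00A0, UTF-8 bytes: C2 A0
```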



On Mon, Oct 22, 2018 at 1:03 PM Egor Skriptunoff <[hidden email]> wrote:
> Yes, of course.
> For example, Russian Windows locale.

Re: Does character class %s ever match hardspace?

Dirk Laurie
On Thu, 25 Oct 2018 at 03:58, Gé Weijers <[hidden email]> wrote:
>
> On my MacOS machine the following character codes match '%s':
>
> "C" locale: 9 10 11 12 13 32
>
> "en_US.UTF-8" locale: 9 10 11 12 13 32 133 160
>
> 133 == NEXT LINE (NEL)
> 160 == NO-BREAK SPACE (NBSP)

Thanks to you and Egor. Egor's example I understand: it is an 8-bit character
set. I find your example a little surprising, though. It's not that
way on Ubuntu. Are you using Lua 5.3? Surely single characters in the
range 128-255 are not legal UTF-8?

$ lua
Lua 5.3.5  Copyright (C) 1994-2018 Lua.org, PUC-Rio
> os.setlocale"en_US.UTF-8"
en_US.UTF-8
> for k=0,255 do if string.char(k):match"%s" then io.write(k,' ') end end
9 10 11 12 13 32 >
> utf8.len"The\160quick brown fox"
nil    4
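The `nil 4` result shows that the lone byte 160 at position 4 is not valid UTF-8; a UTF-8 no-break space is the two-byte sequence `"\u{A0}"` (C2 A0), which a single-byte class such as `%s` cannot match as a unit. If text is known to be UTF-8, one locale-independent approach (a sketch; `words` is a hypothetical helper) is to treat U+00A0 as a separator explicitly:

```lua
-- UTF-8 encoding of U+00A0 NO-BREAK SPACE: the bytes C2 A0.
local NBSP = utf8.char(0xA0)

-- Hypothetical helper: split UTF-8 text into words, treating ASCII
-- whitespace (codes 9-13 and 32) and U+00A0 as separators, without
-- relying on the current locale.
local function words(s)
  s = s:gsub(NBSP, " ")  -- NBSP contains no pattern magic characters
  local out = {}
  for w in s:gmatch("[^ \t\n\v\f\r]+") do
    out[#out + 1] = w
  end
  return out
end

print(#words("The\u{A0}quick brown fox"))  --> 4
```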


Re: Does character class %s ever match hardspace?

Gé Weijers
> On Oct 24, 2018, at 22:11, Dirk Laurie <[hidden email]> wrote:
> Thanks to you and Egor. Egor's I understand: it is an 8-bit character
> set. I find your example a little surprising, though. It's not that
> way on Ubuntu. Are you using Lua 5.3? Surely single characters in the
> range 128-255 are not legal UTF-8?
>

I’m using 5.3. Different libc, I guess.