The Lua utf8 library (Was: Issues: Character 160 ...)


Dirk Laurie-2
2018-07-11 0:31 GMT+02:00 Gregg Reynolds <[hidden email]>:

> On Tue, Jul 10, 2018, 5:17 PM Sean Conner <[hidden email]> wrote:
>> It was thus said that the Great Gregg Reynolds once stated:
>> > On Tue, Jul 10, 2018, 4:44 PM Dirk Laurie <[hidden email]> wrote:
>> > > I am merely asking for extra functions along the lines of what the
>> > > utf8 library already does.
>> > > E.g. Sam's examples:
>> > >
>> > > > s1 = "Hélène"
>> > > > s2 = "Hélène"
>>
>>   They look similar, but they are constructed differently.
>>
>> > FYI these look identical on Android.
>> > > If you really do not understand what I mean, I can elaborate.
>> > Please do.
>> > What does "len" mean? Number of Unicode chars or number of bytes?
>>   The number of Unicode code points.  The second one has a letter 'e'
>> followed by a combining accent (I'm not sure which accent is the combining
>> one), thus the different number of Unicode code points.
> Ok, we have "codepoints", "chars", bytes, and heaven knows what else. Is a
> Unicode "codepoint" a byte? No. Is "Unicode codepoint" even meaningful?

The Lua manual refers to Unicode twice.

"The UTF-8 encoding of a Unicode character can be inserted in a
literal string with the escape sequence \u{XXX} (note the mandatory
enclosing brackets), where XXX is a sequence of one or more
hexadecimal digits representing the character code point."

"This library does not provide any support for Unicode other than the
handling of the encoding. Any operation that needs the meaning of a
character, such as character classification, is outside its scope."

OK, that's Unicode out of the way. I am not talking about it.

From the point of view of the utf8 library, UTF-8 is a reversible way
of mapping a certain subset of strings (which I here call "codons",
borrowing a term from DNA theory) onto a certain subset of 32-bit
integers. Everything else about UTF-8, including its relation to
Unicode and its representation as glyphs, is totally irrelevant.

The two basic functions of the UTF-8 library are

    utf8.char       -- maps from one or more valid integers to a
                    -- concatenation of codons
    utf8.codepoint  -- maps from a valid concatenation of codons to
                    -- one or more integers
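
A minimal sketch of that round trip, assuming Lua 5.3 or later. The code points are an assumed reading of the two "Hélène" spellings quoted above (precomposed letters versus a plain 'e' followed by a combining accent); the thread itself never shows them at the byte level.

```lua
-- Precomposed spelling: U+00E9 and U+00E8 are single code points.
local s1 = utf8.char(0x48, 0xE9, 0x6C, 0xE8, 0x6E, 0x65)
-- Decomposed spelling (assumed): 'e' followed by U+0301 or U+0300.
local s2 = utf8.char(0x48, 0x65, 0x301, 0x6C, 0x65, 0x300, 0x6E, 0x65)

print(utf8.len(s1), #s1)   -- number of codons, number of bytes
print(utf8.len(s2), #s2)

-- utf8.codepoint inverts utf8.char over the whole string.
local cps = { utf8.codepoint(s1, 1, -1) }
assert(utf8.char(table.unpack(cps)) == s1)
```

Both strings would normally render the same way, yet the library counts 6 codons in the first and 8 in the second, which is all it claims to know about them.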

From the point of view of the string library, encoding is a reversible
way of mapping one-byte strings (commonly called "characters") onto
the integers 0 to 255. Everything else about strings, including their
representation as glyphs, is totally irrelevant.

The two basic functions of the string library are

    string.char  -- maps from one or more valid integers to a
                 -- concatenation of characters
    string.byte  -- maps from a concatenation of characters to one or
                 -- more integers
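
For comparison, a small sketch of the same pair in the string library, again assuming Lua 5.3 or later. It also shows how the two libraries see different units in the same data.

```lua
local s = string.char(72, 105)          -- the two-character string "Hi"
local b1, b2 = string.byte(s, 1, -1)    -- back to the integers 72 and 105
assert(b1 == 72 and b2 == 105)

-- On UTF-8 data the string library sees bytes, the utf8 library sees codons.
local e = utf8.char(0xE9)               -- one codon, stored as two bytes
print(string.byte(e, 1, -1))            -- the individual byte values
print(utf8.codepoint(e))                -- the single integer 0xE9
```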

There is an obvious analogy between codons and characters, already
exploited in the names of the functions utf8.char and utf8.len. The
analogy defines what the (presently non-existent) functions utf8.find,
utf8.sub, utf8.match, utf8.reverse, utf8.rep, utf8.gsub and
utf8.gmatch should mean.
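
To make the analogy concrete, here is a rough sketch of two of those functions written on top of the existing utf8 library (Lua 5.3 or later). The names utf8_sub and utf8_reverse are hypothetical stand-ins, not part of any Lua release, and the index handling only approximates what string.sub does.

```lua
-- Hypothetical utf8.sub: like string.sub, but i and j count codons.
local function utf8_sub(s, i, j)
  local len = assert(utf8.len(s), "invalid UTF-8")
  i, j = i or 1, j or -1
  if i < 0 then i = math.max(len + i + 1, 1) elseif i == 0 then i = 1 end
  if j < 0 then j = len + j + 1 elseif j > len then j = len end
  if i > j then return "" end
  -- utf8.offset(s, n) is the byte position where the n-th codon starts;
  -- for n == len + 1 it is #s + 1, one past the end of the string.
  return s:sub(utf8.offset(s, i), utf8.offset(s, j + 1) - 1)
end

-- Hypothetical utf8.reverse: reverses the order of codons, not of bytes.
local function utf8_reverse(s)
  local parts = {}
  for _, cp in utf8.codes(s) do
    parts[#parts + 1] = utf8.char(cp)
  end
  for k = 1, #parts // 2 do
    parts[k], parts[#parts - k + 1] = parts[#parts - k + 1], parts[k]
  end
  return table.concat(parts)
end

local s = "H\u{E9}l\u{E8}ne"
print(utf8_sub(s, 2, 3))
print(utf8_reverse(s))
```

Reversing a string that contains combining accents would attach them to the wrong letters, which is exactly the kind of Unicode knowledge this sketch leaves out.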

[1] http://lua-users.org/wiki/ZenOfLua
[2] Most modern systems have a way of graphically representing codons,
and even some pairs of codons, as a sequence of glyphs. In many cases
(including the one that sparked off the original thread) the mapping
from glyphs to codons is not unique. This, too, is irrelevant.


Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Hisham
On 11 July 2018 at 03:43, Dirk Laurie <[hidden email]> wrote:
> There is an obvious analogy between codons and characters,already
> exploited in the names of the functions utf8.char and utf8.len. The
> analogy defines what the (presently non-existent) functions utf8.find,
> utf8.sub, utf8.match, utf8.reverse, utf8.rep, utf8.gsub and
> utf8.gmatch should mean.

A few years ago I went through the exercise of reworking the core
pattern matching function of the Lua string library (which powers
string.match, string.gsub, string.gmatch) to work on UTF-8 codepoints
instead of bytes. My goal was to see if it was a small enough addition
to have a shot at being asked for inclusion in the library. I believe
I did get it working, if my memory serves me right. Patch follows
attached.

In the end, what turned me off about the idea was that the predefined
character classes such as %a and %d would be either unavailable or
misleading/incompatible — they couldn't be Unicode-based because we're
not supporting Unicode (just UTF-8), but they aren't ASCII-based
either, because the ones in string.match are affected by setlocale.

Ultimately, the problem is: you would expect utf8.match("name:
%a*%d+", "name: Hélène123") to work, but that doesn't seem feasible to
do without adding Unicode knowledge.
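
A short illustration of the gap, assuming stock Lua 5.3 or later in the "C" locale. It uses the existing string.match (subject first, pattern second) since utf8.match does not exist; utf8.charpattern is the one pattern the utf8 library does provide, and it can split a string into codons but says nothing about which of them are letters.

```lua
os.setlocale("C")
local s = "name: H\u{E9}l\u{E8}ne123"

-- %a is byte-based: it stops at the first byte of the encoded é.
print(s:match("name: (%a*)"))
-- So the pattern from the example cannot match the whole name.
print(s:match("name: %a*%d+"))

-- utf8.charpattern matches exactly one UTF-8 byte sequence.
local n = 0
for _ in s:gmatch(utf8.charpattern) do
  n = n + 1
end
print(n, utf8.len(s))
```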

-- Hisham

Attachment: lua-5.3.0-work2-utf8patterns.patch (12K)

Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Dirk Laurie-2
2018-07-11 15:10 GMT+02:00 Hisham <[hidden email]>:

> On 11 July 2018 at 03:43, Dirk Laurie <[hidden email]> wrote:
>> There is an obvious analogy between codons and characters,already
>> exploited in the names of the functions utf8.char and utf8.len. The
>> analogy defines what the (presently non-existent) functions utf8.find,
>> utf8.sub, utf8.match, utf8.reverse, utf8.rep, utf8.gsub and
>> utf8.gmatch should mean.
>
> A few years ago I went through the exercise of reworking the core
> pattern matching function of the Lua string library (which powers
> string.match, string.gsub, string.gmatch) to work on UTF-8 codepoints
> instead of bytes. My goal was to see if it was a small enough addition
> to have a shot at being asked for inclusion in the library. I believe
> I did get it working, if my memory serves me right. Patch follows
> attached.
>
> In the end, what turned me off about the idea was that the predefined
> character classes such as %a and %d would be either unavailable or
> misleading/incompatible — they couldn't be Unicode-based because we're
> not supporting Unicode (just UTF-8), but they aren't ASCII-based
> either, because the ones in string.match are affected by setlocale.
>
> Ultimately, the problem is: you would expect utf8.match("name:
> %a*%d+", "name: Hélène123") to work, but that doesn't seem feasible to
> do without adding Unicode knowledge.

I would not like to tamper with existing classes, but one could introduce
definable character classes, almost like having a metatable for patterns.

Suppose we had this function:

string.class(lc,test)
  lc is a character class not currently defined e.g. "%y"
  test(str) is a function that returns the matching substring if str
     starts with a substring of that class, otherwise nil

Then the user can add whatever Unicode knowledge is needed
without clutter.
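
A very rough sketch of the idea in plain Lua 5.3 or later. Nothing here hooks into the real pattern engine: define_class, the classes table and match_word are hypothetical helpers that only imitate what a registered "%y" class might do for a single run of characters.

```lua
-- Hypothetical registry: class name -> test(str), which returns the
-- matching prefix of str or nil.
local classes = {}
local function define_class(lc, test)
  classes[lc] = test
end

-- Example class %y: one multi-byte UTF-8 sequence (anything above ASCII).
define_class("%y", function(str)
  return str:match("^[\xC2-\xF4][\x80-\xBF]*")
end)

-- Consume the longest prefix made of ASCII letters (%a) or %y items.
local function match_word(str)
  local pos = 1
  while pos <= #str do
    local rest = str:sub(pos)
    local m = rest:match("^%a") or classes["%y"](rest)
    if not m then break end
    pos = pos + #m
  end
  return str:sub(1, pos - 1)
end

print(match_word("H\u{E9}l\u{E8}ne123"))
```

A real implementation would need the test function to be called from inside the matcher, so that %y could take part in repetitions and captures like any built-in class.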


Re: The Lua utf8 library (Was: Issues: Character 160 ...)

steve donovan
In reply to this post by Hisham
On Wed, Jul 11, 2018 at 3:10 PM, Hisham <[hidden email]> wrote:
> Ultimately, the problem is: you would expect utf8.match("name:
> %a*%d+", "name: Hélène123") to work, but that doesn't seem feasible to
> do without adding Unicode knowledge.

Which is a _heavy_ task, given the number of human scripts in common use!

By the way, always been curious how non-English Lua people cope with
the existing limitations of Lua patterns?

Assume 'ASCII' punctuation and work around that?


Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Roberto Ierusalimschy
> On Wed, Jul 11, 2018 at 3:10 PM, Hisham <[hidden email]> wrote:
> > Ultimately, the problem is: you would expect utf8.match("name:
> > %a*%d+", "name: Hélène123") to work, but that doesn't seem feasible to
> > do without adding Unicode knowledge.
>
> Which is a _heavy_ task, given the number of human scripts in common use!
>
> By the way, always been curious how non-English Lua people cope with
> the existing limitations of Lua patterns?
>
> Assume 'ASCII' punctuation and work around that?

Mainly. Either your text has all kinds of stuff, and then you need real
Unicode support, or else everything outside ASCII can be assumed to be
letters (accented letters and c-cedilla).
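
One way to spell that assumption in an ordinary Lua pattern (a sketch for Lua 5.3 or later, not necessarily how anyone in this thread writes it): treat every byte from 128 to 255 as a letter alongside %a.

```lua
local s = "name: H\u{E9}l\u{E8}ne123"

-- [%a\128-\255] accepts ASCII letters plus every non-ASCII byte, so the
-- bytes of any UTF-8 (or Latin-1) accented letter count as "letters".
local letters, digits = s:match("name: ([%a\128-\255]*)(%d+)")
print(letters, digits)
```

This breaks down as soon as the text contains non-ASCII punctuation or spaces, such as the character 160 that started the original thread.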

-- Roberto


Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Hisham
On 11 July 2018 at 11:45, Roberto Ierusalimschy <[hidden email]> wrote:

>> On Wed, Jul 11, 2018 at 3:10 PM, Hisham <[hidden email]> wrote:
>> > Ultimately, the problem is: you would expect utf8.match("name:
>> > %a*%d+", "name: Hélène123") to work, but that doesn't seem feasible to
>> > do without adding Unicode knowledge.
>>
>> Which is a _heavy_ task, given the number of human scripts in common use!
>>
>> By the way, always been curious how non-English Lua people cope with
>> the existing limitations of Lua patterns?
>>
>> Assume 'ASCII' punctuation and work around that?
>
> Mainly. Either your text have all kinds of stuff, and then you need real
> Unicode support, or else everything outside ASCII can be assumed to be
> letters (accented letters and c-cedilla).

Additionally, regional non-UTF-8 locales are not entirely gone, and
Lua supports those via os.setlocale(), based on the C library
setlocale().

$ export HELENE=$(echo "Hélène" | iconv --from-code=UTF-8 --to-code=ISO88591)

$ echo "$HELENE" | hexdump -C
00000000  48 e9 6c e8 6e 65 0a   |H.l.ne.|

$ echo "Hélène" | hexdump -C
00000000  48 c3 a9 6c c3 a8 6e 65  0a   |H..l..ne.|

$ lua
Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> print(#os.getenv("HELENE"))
6
> helene = "Hélène"
> print(#helene)
8
> print(helene:match("H%a+e"))
nil
> print(os.getenv("HELENE"):match("H%a+e"))
nil
> print(os.getenv("HELENE"):upper())
HéLèNE

$ LC_ALL=pt_BR.iso88591 lua
Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> print(os.getenv("HELENE"):match("H%a+e"))
Hélène
> print(os.getenv("HELENE"):upper())
HÉLÈNE

As recently as last year we have dealt with (but not resolved!) bug
reports in LuaFileSystem related to handling filenames, with
os.setlocale() having an effect on the behavior:

https://github.com/keplerproject/luafilesystem/pull/57#issuecomment-282027816

I've also gotten recent reports from people using Cyrillic with
non-UTF-8 locales, but I can't recall where.

I recall seeing projects here in Brazil resorting to non-UTF-8
(particularly in cases where it's a local project where you only need
to support one language). I would never recommend doing this, but
these days you either use ASCII or have to resort to the full bazooka
of Unicode, so it doesn't surprise me to see people taking the ugly
shortcut of regional locales when having to deal with things like
sorting, etc.

-- Hisham


Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Gregg Reynolds-2
In reply to this post by Dirk Laurie-2


On Wed, Jul 11, 2018, 1:43 AM Dirk Laurie <[hidden email]> wrote:
> ...
> From the point of view of the utf8 library, UTF-8 is a reversible way
> of mapping a certain subset of strings (which I here call "codons",
> borrowing a term from DNA theory) onto a certain subset of 32-bit
> integers.

Not even wrong. https://en.m.wikipedia.org/wiki/Not_even_wrong. Utf8 has nothing to do with "a certain subset of 32 bit integers".

If you're talking about utf8, but you're not talking about Unicode, then what are you talking about? I'm not against it, I just don't see what you're after.



Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Jay Carlson
On 2018-07-11, at 4:58 PM, Gregg Reynolds <[hidden email]> wrote:

> On Wed, Jul 11, 2018, 1:43 AM Dirk Laurie <[hidden email]> wrote:
> ...
> From the point of view of the utf8 library, UTF-8 is a reversible way
> of mapping a certain subset of strings (which I here call "codons",
> borrowing a term from DNA theory) onto a certain subset of 32-bit
> integers.
>
> Not even wrong. https://en.m.wikipedia.org/wiki/Not_even_wrong. Utf8 has nothing to do with "a certain subset of 32 bit integers".

What's wrong with the claim?

--
Jay

Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Gregg Reynolds-2


On Wed, Jul 11, 2018, 4:35 PM Jay Carlson <[hidden email]> wrote:
> On 2018-07-11, at 4:58 PM, Gregg Reynolds <[hidden email]> wrote:
>
>> On Wed, Jul 11, 2018, 1:43 AM Dirk Laurie <[hidden email]> wrote:
>> ...
>>> From the point of view of the utf8 library, UTF-8 is a reversible way
>>> of mapping a certain subset of strings (which I here call "codons",
>>> borrowing a term from DNA theory) onto a certain subset of 32-bit
>>> integers.
>>
>> Not even wrong. https://en.m.wikipedia.org/wiki/Not_even_wrong. Utf8 has nothing to do with "a certain subset of 32 bit integers".
>
> What's wrong with the claim?
>
> --
> Jay

Depends on what you mean by "the claim". In any case utf8 is a well-defined variable-width mapping. Some stuff ends up as 32 bits, some doesn't. Nothing magical about 32. Also it has nothing to do with strings.

Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Coda Highland
In reply to this post by Gregg Reynolds-2
On Wed, Jul 11, 2018 at 3:58 PM, Gregg Reynolds <[hidden email]> wrote:

>
>
> On Wed, Jul 11, 2018, 1:43 AM Dirk Laurie <[hidden email]> wrote:
> ...
>>
>> From the point of view of the utf8 library, UTF-8 is a reversible way
>> of mapping a certain subset of strings (which I here call "codons",
>> borrowing a term from DNA theory) onto a certain subset of 32-bit
>> integers.
>
>
> Not even wrong. https://en.m.wikipedia.org/wiki/Not_even_wrong. Utf8 has
> nothing to do with "a certain subset of 32 bit integers".
>
> If you're talking about utf8, but you're not talking about Unicode, then
> what are you talking about? I'm not against it, I just don't see what you're
> after.

UTF-8 = Unicode Transformation Format, 8 bit. The transformation
methodology is independent of the character set it represents. The
CANONICAL APPLICATION of this transformation format is to represent
Unicode characters, but it can be considered to be a variable-length
integer representation scheme. There's nothing wrong with discussing
the manipulation of data encoded in this format without having to drag
in the concept of a character set.
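
Seen purely as a variable-length integer scheme, it can be sketched in Lua 5.3 or later like this. The four integers are arbitrary examples and hex_bytes is a small helper written for this illustration.

```lua
-- Helper (not a library function): render a string as hex byte values.
local function hex_bytes(s)
  return (s:gsub(".", function(c)
    return string.format("%02X ", c:byte())
  end))
end

-- Larger integers take more bytes; no character set is involved.
for _, n in ipairs({ 0x41, 0xE9, 0x20AC, 0x1F600 }) do
  local enc = utf8.char(n)
  print(string.format("%7d -> %d byte(s): %s", n, #enc, hex_bytes(enc)))
end
```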

/s/ Adam


Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Gregg Reynolds-2


On Wed, Jul 11, 2018, 4:53 PM Coda Highland <[hidden email]> wrote:
> On Wed, Jul 11, 2018 at 3:58 PM, Gregg Reynolds <[hidden email]> wrote:
>> On Wed, Jul 11, 2018, 1:43 AM Dirk Laurie <[hidden email]> wrote:
>> ...
>>> From the point of view of the utf8 library, UTF-8 is a reversible way
>>> of mapping a certain subset of strings (which I here call "codons",
>>> borrowing a term from DNA theory) onto a certain subset of 32-bit
>>> integers.
>>
>> Not even wrong. https://en.m.wikipedia.org/wiki/Not_even_wrong. Utf8 has
>> nothing to do with "a certain subset of 32 bit integers".
>>
>> If you're talking about utf8, but you're not talking about Unicode, then
>> what are you talking about? I'm not against it, I just don't see what you're
>> after.
>
> UTF-8 = Unicode Transformation Format, 8 bit. The transformation
> methodology is independent of the character set it represents. The
> CANONICAL APPLICATION of this transformation format is to represent
> Unicode characters, but it can be considered to be a variable-length
> integer representation scheme. There's nothing wrong with discussing
> the manipulation of data encoded in this format without having to drag
> in the concept of a character set.
>
> /s/ Adam

Agreed. Then again, the only reason we have it is because we needed to deal with lots o' chars.

Anyway my point was there's nothing special about 32 bits in utf8.


Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Gregg Reynolds-2


On Wed, Jul 11, 2018, 4:57 PM Gregg Reynolds <[hidden email]> wrote:
> On Wed, Jul 11, 2018, 4:53 PM Coda Highland <[hidden email]> wrote:
>> On Wed, Jul 11, 2018 at 3:58 PM, Gregg Reynolds <[hidden email]> wrote:
>>> On Wed, Jul 11, 2018, 1:43 AM Dirk Laurie <[hidden email]> wrote:
>>> ...
>>>> From the point of view of the utf8 library, UTF-8 is a reversible way
>>>> of mapping a certain subset of strings (which I here call "codons",
>>>> borrowing a term from DNA theory) onto a certain subset of 32-bit
>>>> integers.
>>>
>>> Not even wrong. https://en.m.wikipedia.org/wiki/Not_even_wrong. Utf8 has
>>> nothing to do with "a certain subset of 32 bit integers".
>>>
>>> If you're talking about utf8, but you're not talking about Unicode, then
>>> what are you talking about? I'm not against it, I just don't see what you're
>>> after.
>>
>> UTF-8 = Unicode Transformation Format, 8 bit. The transformation
>> methodology is independent of the character set it represents. The
>> CANONICAL APPLICATION of this transformation format is to represent
>> Unicode characters, but it can be considered to be a variable-length
>> integer representation scheme. There's nothing wrong with discussing
>> the manipulation of data encoded in this format without having to drag
>> in the concept of a character set.
>>
>> /s/ Adam
>
> Agreed. Then again, the only reason we have it is because we needed to deal with lots o' chars.
>
> Anyway my point was there's nothing special about 32 bits in utf8.

P.s. to the OP's point, it has nothing to do with strings.


Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Dirk Laurie-2
In reply to this post by Gregg Reynolds-2
2018-07-11 22:58 GMT+02:00 Gregg Reynolds <[hidden email]>:

>
>
> On Wed, Jul 11, 2018, 1:43 AM Dirk Laurie <[hidden email]> wrote:
> ...
>>
>> From the point of view of the utf8 library, UTF-8 is a reversible way
>> of mapping a certain subset of strings (which I here call "codons",
>> borrowing a term from DNA theory) onto a certain subset of 32-bit
>> integers.
>
>
> Not even wrong. https://en.m.wikipedia.org/wiki/Not_even_wrong. Utf8 has
> nothing to do with "a certain subset of 32 bit integers".

My bad. I should have said "Lua integers". The actual size depends
on luaconf.h, and 32 bits is not in fact the default.

> If you're talking about utf8, but you're not talking about Unicode, then
> what are you talking about? I'm not against it, I just don't see what you're
> after.

I am talking about utf8, not about UTF-8, and certainly not about Unicode.

Definitions:

Unicode: a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. [1]

UTF-8: a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes. [1]

utf8: a library in Lua 5.3 that provides basic support for UTF-8 encoding, but no support for Unicode other than the handling of the encoding. Any operation that needs the meaning of a character, such as character classification, is outside its scope. [2]
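
The figure 1,112,064 in the UTF-8 definition can be checked with a line of arithmetic: the range [0, 0x10FFFF] minus the surrogate block U+D800 to U+DFFF, which is reserved for UTF-16. The variable names are only for illustration.

```lua
local total      = 0x10FFFF + 1          -- all integers in [0, 0x10FFFF]
local surrogates = 0xDFFF - 0xD800 + 1   -- code points reserved for UTF-16
print(total, surrogates, total - surrogates)
```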

I started this thread in order to make the point that certain other functions in the string library, in addition to utf8.len and utf8.char, could also be generalized to the very restricted setting in which the utf8 library operates.


[BUG?] Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Viacheslav Usov
In reply to this post by Gregg Reynolds-2
On Wed, Jul 11, 2018 at 10:59 PM Gregg Reynolds <[hidden email]> wrote:
> On Wed, Jul 11, 2018, 1:43 AM Dirk Laurie <[hidden email]> wrote:
> ...
>> From the point of view of the utf8 library, UTF-8 is a reversible way
>> of mapping a certain subset of strings (which I here call "codons",
>> borrowing a term from DNA theory) onto a certain subset of 32-bit
>> integers.
>
> Not even wrong. https://en.m.wikipedia.org/wiki/Not_even_wrong. Utf8 has nothing to do with "a certain subset of 32 bit integers".

Part of the claim that you are trying to refute was "UTF-8 is a reversible way of mapping X onto a certain subset of 32-bit integers." That part is certainly true. The set of all Unicode codepoints is isomorphic with a certain subset of 32 bit integers, [0, 0x10ffff] to be exact, and the whole point of any Unicode encoding, including UTF-8, by definition, is a reversible mapping onto the set of Unicode codepoints.

Another part was "from the point of view of the utf8". utf8 uses int (mixed with unsigned int) internally (see utf8_decode) to represent Unicode codepoints. On most modern platforms, int is a 32-bit integer, where the entire statement is correct as it stands. On some platforms, it is longer than 32 bits, but in this case the statement "a certain subset of 32-bit integers" trivially applies. The Lua integer that utf8 uses externally has either 32 or 64 bits, so "a certain subset of 32 bit integers" is still correct.

utf8 will work correctly if int has at least 22 bits, assuming a signed two's-complement representation. While one could argue that some platforms might have int longer than 21 bits but shorter than 32, I am afraid that, practically, utf8 is broken if int is shorter than 32 bits. I do not think this is intentional, so it probably needs to be fixed.
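
The arithmetic behind the 21 and 22 bit figures can be checked from Lua itself (5.3 or later). This only verifies the numeric range and the round trip through the library; it says nothing about how utf8_decode is written in C.

```lua
local max_cp = 0x10FFFF

-- The largest code point needs 21 value bits; a signed int therefore
-- needs at least 22 bits to hold it.
assert(max_cp >= 2^20 and max_cp < 2^21)

-- It encodes to four bytes and decodes back to the same Lua integer.
local enc = utf8.char(max_cp)
print(#enc, utf8.codepoint(enc) == max_cp)
```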

Cheers,
V.

Re: [BUG?] Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Gregg Reynolds-2


On Thu, Jul 12, 2018, 6:00 AM Viacheslav Usov <[hidden email]> wrote:
> On Wed, Jul 11, 2018 at 10:59 PM Gregg Reynolds <[hidden email]> wrote:
>> On Wed, Jul 11, 2018, 1:43 AM Dirk Laurie <[hidden email]> wrote:
>> ...
>>> From the point of view of the utf8 library, UTF-8 is a reversible way
>>> of mapping a certain subset of strings (which I here call "codons",
>>> borrowing a term from DNA theory) onto a certain subset of 32-bit
>>> integers.
>>
>> Not even wrong. https://en.m.wikipedia.org/wiki/Not_even_wrong. Utf8 has nothing to do with "a certain subset of 32 bit integers".
>
> Part of the claim that you are trying to refute was "UTF-8 is a reversible way of mapping X onto a certain subset of 32-bit integers." That part is certainly true. The set of all Unicode codepoints is isomorphic with a certain subset of 32 bit integers, [0, 0x10ffff] to be exact, and the whole point of any Unicode encoding, including UTF-8, by definition, is a reversible mapping onto the set of Unicode codepoints.

What can I say? I am an incorrigibly pedantic weenie, after all. Heh. There are no "32 bit integers" (altho there are base 2 representations of ints with 32 places.) UTF-8 is not a mapping from octet seqs to UTF-32 bitstrings. It's just another representation of ints, mutually isomorphic with any other.

Pedantic? Sure, but Unicode itself is very fastidious about this kinda stuff. Codepoints are numbers (abstract); bit patterns are code units.  Unicode expresses codepoints in hex notation, not code units (i.e. they are not "32 bit integers"). Etc.

If we had no legacy encodings we prolly would not need this kinda fastidiousness, but since we do the precision is helpful.
> ...
>
> While one could argue that some platforms might have int longer than 21 bits but shorter than 32, I am afraid that, practically, utf8 is broken if int is shorter than 32 bits. I do not think this is intentional, so it probably needs to be fixed.

Nice catch!

Re: [BUG?] Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Peter Aronoff
Gregg Reynolds <[hidden email]> wrote:
> What can I say? I am an incorrigibly pedantic weenie, after all.

Please stop or take it off-list then.

Thanks.
--
We have not been faced with the need to satisfy someone else's
requirements, and for this freedom we are grateful.
    Dennis Ritchie and Ken Thompson, The UNIX Time-Sharing System


Re: [BUG?] Re: The Lua utf8 library (Was: Issues: Character 160 ...)

Roberto Ierusalimschy
> Gregg Reynolds <[hidden email]> wrote:
> > What can I say? I am an incorrigibly pedantic weenie, after all.
>
> Please stop or take it off-list then.

After ~80 messages in this thread and its predecessor, I guess everybody
said what they had to say about UTF-8, Unicode, and numerals. We really
could stop now.

Thanks,

-- Roberto