[Re:] UTF-8, Unicode and all that

[As usual, apologies for not staying inside the bounds of the mailing
list threading system, as I read via the lua-l Digest...]


There's a blog post, written back in the Old Ages (2003), that goes
some way towards explaining the prehistory of character sets and,
briefly, what the Unicode standard(s) were attempting to achieve (at
least, up to that point in time...).  It also spends quite a bit of
time on encoding schemes, including the myriad old-style code pages,
WordStar's 7-bit wonderland, UCS-2, UCS-4, and the "popular new" UTF-8.

The style is quite light, but the article's contents are carefully
written to help sort out possible confusion.

I've found it useful for untangling a few of the concepts being
thrown around, and others may find it useful as well.  The article
does not deep-dive into how to implement full language support; it
mainly lays out the case for why UTF-8 is valuable when generating
web pages, and how to manage your interaction with the client's
browser in this area.

[The article only scratches the surface when mentioning typesetting,
whereas I acknowledge that glyphs, ligatures, and other things are
present in the Lua discussion thread... I may be underestimating the
conceptual level of the current discussion, but I keep the blog post
bookmarked so that, whenever I see a "character" (perhaps a wide
character, wchar_t, in some C libraries), it reminds me of the
probable benefits of UTF-8 relative to alternative encodings.]
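
[As a small illustration of my own (not taken from the article), here
is a rough Python sketch comparing how many bytes a few sample code
points occupy in UTF-8, UTF-16 and UTF-32.  The sample characters and
the helper name encoded_sizes are arbitrary choices made for this
example; the "-le" codec variants are used only so that no byte-order
mark is counted.]

```python
# Illustrative sketch only: compare the storage cost of a few code
# points in UTF-8, UTF-16 and UTF-32.  The samples are arbitrary.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")  # "-le": no BOM added


def encoded_sizes(ch):
    """Return the byte length of ch in each encoding in ENCODINGS."""
    return tuple(len(ch.encode(enc)) for enc in ENCODINGS)


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

Plain ASCII text stays at one byte per character in UTF-8 (and is
byte-for-byte identical to its ASCII form), whereas UTF-32 always
spends four bytes and UTF-16 needs a surrogate pair for code points
above U+FFFF.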

The article is:

         "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"

and its URL is:



programmer, Grouse Software