On Thu, May 4, 2017 at 8:39 PM, nobody <[hidden email]> wrote:
> On 2017-05-01 21:41, Andrew Starks wrote:
>> There is one single language understood by every culture that is
>> doing scientific work: math.
> Using math symbols in code requires Unicode. (Sure, you can write their
> names, but then you're back at the problem of different languages.)
I was using the language of math as an analogy. In the same way that
math is the accepted notation for conveying mathematical concepts,
ASCII text is the accepted format for source code. Once you go beyond
that and into other encodings, things get ambiguous.
>> What problem does it solve? Is support for UTF-8 useful for
>> automated script processing or some sort of DSL application?
> Unicode is extremely useful in combination with custom mixfix
> operators/notations. Without that, not so much. Lua does not even
> permit defining custom operators. (Which is fine, it just makes
> Unicode support much less useful.)
> For DSL-purposes (and I'd count math as a DSL), Lua's syntax is already
> extremely flexible. Liberally sprinkling everything with `__call`, you
> can write `x 'op'` or `x 'op' (y)`, where 'op' can be any string
> (including Unicode). (The mandatory parentheses are pretty annoying but
> not absolutely terrible.) And if you want warts-free custom syntax,
> there's also LPEG and/or ltokenp.
> So while I know from experience that Unicode support plus mixfix
> definitions can be absolutely awesome, Lua has neither custom operators
> nor custom mixfix notations, and they're not compatible with what Lua is
> / how Lua works. So adding Unicode support would add _some_
> flexibility/convenience, but not very much. Given the complexity, it's
> probably not worth it.
> -- nobody
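A minimal sketch of the `x 'op' (y)` trick described in the quoted message above, assuming Lua 5.1 or later. The operator table, the `wrap` helper and the operator names `plus` and `∘` are hypothetical and only for illustration. Lua strings are plain byte strings, so a UTF-8 operator name such as `'∘'` works as a table key as long as the file is saved as UTF-8:

```lua
-- Sketch of the `x 'op' (y)` trick via the __call metamethod.
-- The operator table, `wrap`, and the names "plus" and "∘" are hypothetical.
-- Save this file as UTF-8; Lua treats '∘' as an ordinary byte string.

local ops = {
  plus = function(a, b) return a + b end,
  -- function composition under a Unicode name
  ["∘"] = function(f, g)
    return function(...) return f(g(...)) end
  end,
}

local Value = {}  -- shared metatable for wrapped values

local function wrap(v)
  return setmetatable({ v = v }, Value)
end

-- `x 'op'` is sugar for x("op"): look up the operator and return a
-- function waiting for the right operand, so `x 'op' (y)` is x("op")(y).
Value.__call = function(self, opname)
  local op = assert(ops[opname], "unknown operator: " .. tostring(opname))
  return function(rhs)
    -- accept both wrapped and plain right-hand operands
    local r = rhs
    if getmetatable(rhs) == Value then r = rhs.v end
    return wrap(op(self.v, r))
  end
end

local s = wrap(2) 'plus' (wrap(3))
print(s.v)  --> 5

local double = wrap(function(n) return n * 2 end)
local inc = wrap(function(n) return n + 1 end)
local h = double '∘' (inc)
print(h.v(4))  --> 10
```

Chaining works because each result is itself a wrapped value, e.g. `wrap(1) 'plus' (2) 'plus' (3)`. The parentheses around the right operand are the mandatory ones mentioned above: Lua only allows string literals and table constructors as call arguments without parentheses, so `x 'op' y` is not a call on `y`.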
> The lion's share of the TTF fonts out there have extremely
> limited Unicode coverage.
> A few free fonts that look good and have pretty broad Unicode coverage
> (both are available in most (all?) Linux package management systems):
> * Arial (one of the Microsoft core fonts); 
Sadly, Arial Unicode MS is not in the core fonts. :-(
Google’s Noto Sans (and now Noto Serif) are font families with the intent of “no missing glyphs, ever.”
> * DejaVu font family (open source)
“U+4e00 CJK Unified Ideographs (0/0) (0/0) (0/0)”
There are 0 out of 0 CJK Unified Ideographs? Meaning they didn’t want all the missing Han glyphs to count against their coverage ratio, so they declared the Han glyphs out of scope. Adding insult to injury :-), the precomposed Korean syllables are right out too:
“U+ac00 Hangul Syllables (0/0) (0/0) (0/0)”
But these days you don’t need to have a single font file with coverage of everything (like Arial Unicode). Font substitution really does seem to work.
Perhaps we should gauge progress by how many of the scripts at the bottom of the English Wikipedia home page are missing.