Would Lua support variable names with non-ASCII characters?


qtiuto
LuaJIT does support this feature.
And for UTF-8, all we need to do is change the following in llex.c (lines 558-571):

    if (lislalpha(ls->current)) {  /* identifier or reserved word? */
      TString *ts;
      do {
        save_and_next(ls);
      } while (lislalnum(ls->current));
      ts = luaX_newstring(ls, luaZ_buffer(ls->buff),
                              luaZ_bufflen(ls->buff));
      seminfo->ts = ts;
      if (isreserved(ts))  /* reserved word? */
        return ts->extra - 1 + FIRST_RESERVED;
      else {
        return TK_NAME;
      }
    }

to

    if (lislalpha(ls->current) || (ls->current & 0x80)) {  /* identifier or reserved word? */
      TString *ts;
      do {
        save_and_next(ls);
      } while (lislalnum(ls->current) || (ls->current & 0x80));
      ts = luaX_newstring(ls, luaZ_buffer(ls->buff),
                              luaZ_bufflen(ls->buff));
      seminfo->ts = ts;
      if (isreserved(ts))  /* reserved word? */
        return ts->extra - 1 + FIRST_RESERVED;
      else {
        return TK_NAME;
      }
    }

It's very easy. Will Lua 5.4 support it?
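
To see what the modified test accepts, here is a minimal standalone sketch of the same scanning loop. It is not code from llex.c: peek, ident_start, ident_part and scan_identifier are hypothetical names, and EOZ is redefined locally as -1 on the assumption that it matches Lua's end-of-stream marker. One caveat the sketch makes explicit: if the current character is held in an int and end of stream is -1, then `c & 0x80` is also non-zero at end of stream, so the patched loop condition probably wants an explicit EOZ check.

```c
#include <ctype.h>
#include <stddef.h>

#define EOZ (-1)  /* assumed end-of-stream marker, mirroring Lua's convention */

/* Hypothetical helper: byte at position i as 0..255, or EOZ past the end. */
static int peek(const char *src, size_t len, size_t i) {
  return i < len ? (unsigned char)src[i] : EOZ;
}

/* A letter, an underscore, or any byte with the high bit set may start a name. */
static int ident_start(int c) {
  return c != EOZ && (isalpha(c) || c == '_' || (c & 0x80));
}

/* Same as above, plus digits, for the rest of the name. */
static int ident_part(int c) {
  return c != EOZ && (isalnum(c) || c == '_' || (c & 0x80));
}

/* Returns the length in bytes of the identifier at the start of src,
   or 0 if src does not start with an identifier. */
size_t scan_identifier(const char *src, size_t len) {
  size_t i = 0;
  if (!ident_start(peek(src, len, i)))
    return 0;
  do {
    i++;
  } while (ident_part(peek(src, len, i)));
  return i;
}
```

Given the UTF-8 bytes of "größe = 1" as input, scan_identifier returns 7: every byte of the two-byte sequences has its high bit set and is accepted without being decoded or validated, which is exactly why the change is so small.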

Re: Would Lua support variable names with non-ASCII characters?

Luiz Henrique de Figueiredo
> And for UTF-8, all we need to do is change the following in llex.c (lines 558-571)

Or change lctype.c. See http://lua-users.org/lists/lua-l/2009-10/msg00104.html

> It's very easy. Will Lua 5.4 support it?

Probably not. See the thread above.
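
The lctype.c route can be sketched as a lookup table in which every byte from 0x80 to 0xFF counts as a letter. This is a simplified stand-in rather than the real lctype.c: the table, the bit names and the functions init_ctype_table, is_lalpha and is_lalnum are made up for the illustration, and the table is filled at run time instead of being a static initializer.

```c
#include <limits.h>

#define EOZ (-1)          /* assumed end-of-stream marker */
#define ALPHA_BIT 0x01    /* hypothetical flag: may appear in a name */
#define DIGIT_BIT 0x02    /* hypothetical flag: decimal digit */

/* One entry per byte value plus one for EOZ; entry c + 1 describes c. */
static unsigned char ctype_table[UCHAR_MAX + 2];

void init_ctype_table(void) {
  int c;
  ctype_table[0] = 0;  /* EOZ is neither a letter nor a digit */
  for (c = 0; c <= UCHAR_MAX; c++) {
    unsigned char flags = 0;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
      flags |= ALPHA_BIT;
    if (c >= 0x80)          /* the only change: non-ASCII bytes are "letters" */
      flags |= ALPHA_BIT;
    if (c >= '0' && c <= '9')
      flags |= DIGIT_BIT;
    ctype_table[c + 1] = flags;
  }
}

int is_lalpha(int c) { return ctype_table[c + 1] & ALPHA_BIT; }
int is_lalnum(int c) { return ctype_table[c + 1] & (ALPHA_BIT | DIGIT_BIT); }
```

Because the end-of-stream value has its own table slot, the lexer loop needs no extra check, which is one reason a table change may be preferable to adding `& 0x80` tests in the lexer.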


Re: Would Lua support variable names with non-ASCII characters?

Lorenzo Donati-3
In reply to this post by qtiuto
On 15/10/2018 17:55, 奥斯陆君王 wrote:

[...]

>
> It's very easy. Will Lua 5.4 support it?
>

I hope it never will!

Sorry, it is not about any cultural prejudice (I know many people,
especially Asian people, could feel discriminated by such a stance, but
it is not my intention).

It is just a matter of convenience and "safety". It is not worth opening
such a big can of worms, IMO.

When I started programming, I learned by trial and error what it means
to use "0" and "O" and "o" carelessly as characters in identifiers. The
same goes for "l" and "1".

That is, any set of characters that are likely to have similar glyphs in
some font is going to cause grief in some cases without proper
programming practices.

Allow the whole UNICODE mess into identifiers and the chances of
mistaking one symbol for another skyrocket! I'm not a UNICODE guru, but
I bet my bottom dollar that there are more than a dozen symbols that, in
some font, look like an uppercase Latin "O" (that is, a symbol looking
more or less like a circle). The same goes for other simple-looking
symbols like an uppercase "I" (a vertical "stick" of some sort).

Now, imagine an identifier like B10010100, where each individual
"character" is in fact a different "version" of a "0" or a "1". Nightmare!

These problems are somewhat small annoyances to cope with when you are
dealing with ASCII, where the "problematic" chars are well known,
because every programmer more or less knows what's in *the whole ASCII
set*.

But what the frigging heck is in UNICODE?!? There are gazillions of code
points! There are even not-yet-defined code points!!! WHO knows UNICODE
in its entirety?

How can I be sure that whoever must use my code where I inserted a
"unicodishy" identifier is able to understand uniquely what kind of
"characters" make up the identifier?

Is this worth all the hassle? What advantages would this bring to the
programming effort? How much will it cost to track down bugs generated
by the possible mistake?

I doubt there are tangible *net* advantages in *standardizing* UNICODE,
even in its remarkable UTF-8 encoding, as an alphabet for programming.

UNICODE was meant for linguistics and typesetting, not for programming.

And anyway, as LHF pointed out, you can change lctype.c if you have
special needs (which I definitely won't argue against, that's for sure).


BTW, since this is not the first time this "I'd like unicode in my
names" thing has come up, I'd like to see some of the UNICODE gurus on this
list entering a contest to create the most bedazzling set of
seemingly-identical identifiers using their "utf-8 powers". :-D

I think this will have great educational value for those thinking that
having *generally standardized* UNICODE identifiers is a good idea. ;-)


Cheers!

--Lorenzo










Re: Would Lua support variable names with non-ASCII characters?

Viacheslav Usov
On Tue, Oct 16, 2018 at 2:58 PM Lorenzo Donati <[hidden email]> wrote:

> Now, imagine an identifier like B10010100, where each individual "character" is in fact a different "version" of a "0" or a "1". Nightmare!

Especially in Lua, which happily treats any unknown identifier as a valid global variable.

oоοo𝖔

Cheers,
V.
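
A small sketch of why look-alikes are a problem: names are compared byte by byte, so two identifiers that render the same are different variables as soon as their encodings differ. The byte sequences in the usage note are assumed examples (Latin o U+006F, Cyrillic o U+043E, Greek omicron U+03BF, mathematical bold Fraktur o U+1D594), and same_identifier and hex_bytes are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Names are equal only if their bytes are equal: there is no Unicode
   normalization and no folding of confusable characters here. */
int same_identifier(const char *a, const char *b) {
  return strcmp(a, b) == 0;
}

/* Renders the bytes of s as space-separated hex, e.g. "D0 BE".
   Uses a static buffer, so the result is overwritten by the next call;
   input that does not fit is silently truncated. */
const char *hex_bytes(const char *s) {
  static char out[64];
  size_t used = 0;
  out[0] = '\0';
  for (; *s != '\0' && used + 4 <= sizeof out; s++) {
    used += (size_t)snprintf(out + used, sizeof out - used, "%s%02X",
                             used ? " " : "", (unsigned char)*s);
  }
  return out;
}
```

For instance, hex_bytes("o") gives "6F", while the Cyrillic look-alike "\xd0\xbe" gives "D0 BE" and the Greek one "\xce\xbf" gives "CE BF"; same_identifier reports all three as distinct names, so an assignment to one silently creates a new global instead of updating the other.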

Re: Would Lua support variable names with non-ASCII characters?

Lorenzo Donati-3
On 16/10/2018 15:18, Viacheslav Usov wrote:

> On Tue, Oct 16, 2018 at 2:58 PM Lorenzo Donati <[hidden email]>
> wrote:
>
>> Now, imagine an identifier like B10010100, where each individual
>> "character" is in fact a different "version" of a "0" or a "1". Nightmare!
>
> Especially in Lua, which happily treats any unknown identifier as a valid
> global variable.
>
> oоοo𝖔
>
> Cheers,
> V.
>
Yepp!!

And since there are few fonts (AFAIK) that cover the entire UNICODE set,
we will need editors capable of automatically rendering source code by
mixing and matching different glyphs from different fonts.

Word processors for source code, anyone? (*Ouch!*)

Imagine: use a ~1GB application to write a ~100-line script (~1kB
source code) in a language whose implementation is ~1MB. That's
minimalism! :-)

Moreover, the same editors will also have to be hex editors (*Urgh!*),
because we will need the ability to look at the actual encoding of
glyphs to discriminate those visually ambiguous identifiers (something
that is so easy in ASCII, e.g. by switching to a monospaced font).

Then the same editor would need the ability to map the encoding to its
standard UNICODE representation, just because otherwise we would also
need to remember all the possible UTF-8 sequences and their meanings.
(*Arghh!!*)

Then....

The more I think about it, the more it seems a possible representation
of Hell (in the biblical sense) for a programmer. Spend eternity
learning every UTF-8 sequence, its mapping to a code point, and its
possible visual representations as glyphs in an infinite number of
fonts, none of which can ever represent the whole UNICODE plane set.

In comparison, solving the halting problem is just purgatory! (*<grin>*)

There are upsides, though. Imagine how many nice and mind-boggling
pranks you could play on your fellow programmers! :-]]

Cheers! :-D

-- Lorenzo
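
For what it is worth, the mapping from an encoded sequence back to its code point is mechanical, so it is the part a tool could do for you. A minimal decoder sketch, assuming the usual well-formedness rules (shortest form only, no surrogates, nothing above U+10FFFF); utf8_decode and first_codepoint are hypothetical names, not part of any Lua API.

```c
#include <stddef.h>
#include <string.h>

/* Decodes one UTF-8 sequence from s (at most len bytes) into *cp.
   Returns the number of bytes consumed, or 0 if the input is malformed. */
size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp) {
  /* Smallest code point that legitimately needs n bytes (index = n). */
  static const unsigned long min_value[] = {0, 0, 0x80, 0x800, 0x10000};
  size_t n, i;
  unsigned long c;

  if (len == 0)
    return 0;
  if (s[0] < 0x80) {                 /* plain ASCII */
    *cp = s[0];
    return 1;
  } else if ((s[0] & 0xE0) == 0xC0) {
    n = 2; c = s[0] & 0x1F;
  } else if ((s[0] & 0xF0) == 0xE0) {
    n = 3; c = s[0] & 0x0F;
  } else if ((s[0] & 0xF8) == 0xF0) {
    n = 4; c = s[0] & 0x07;
  } else {
    return 0;                        /* stray continuation or invalid lead byte */
  }
  if (len < n)
    return 0;                        /* truncated sequence */
  for (i = 1; i < n; i++) {
    if ((s[i] & 0xC0) != 0x80)
      return 0;                      /* not a continuation byte */
    c = (c << 6) | (s[i] & 0x3F);
  }
  if (c < min_value[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;                        /* overlong, out of range, or surrogate */
  *cp = c;
  return n;
}

/* Convenience wrapper: code point of the first character of a
   NUL-terminated string, or -1 if it is empty or malformed. */
long first_codepoint(const char *s) {
  unsigned long cp;
  if (utf8_decode((const unsigned char *)s, strlen(s), &cp) == 0)
    return -1;
  return (long)cp;
}
```

With this, first_codepoint("\xd0\xbe") yields 0x43E and first_codepoint("o") yields 0x6F, so the two look-alikes can at least be told apart by number. Knowing what each number looks like in a given font is a different matter, of course.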









Re: Would Lua support variable names with non-ASCII characters?

Sean Conner
It was thus said that the Great Lorenzo Donati once stated:
> Imagine: use a ~1GB application to write a ~100 lines script (~1kB
> source code) of a language whose implementation is ~1MB. That's
> minimalism! :-)

  There are people who already do that (I'm looking at you, Electron! [1])

> Moreover, the same editors will have to be also hex-editors (*Urgh!*),
> because we will need the ability to look at the actual encoding of
> glyphs to discriminate those visually-ambiguous identifiers (something
> that is so easy in ASCII, e.g. by switching to a monospaced font).

  I think it would be easy enough to have an editor that just highlights
Unicode characters outside the range of 0x20 - 0x7E.  

  -spc (Hmmm ... I may have to look into modifying my editor to do just that
        ... )

[1] https://electronjs.org/
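
The check behind such highlighting could be as small as the sketch below. It works on raw bytes rather than decoded characters, first_suspicious_byte is a hypothetical name, and treating tab, CR and LF as acceptable is an added assumption (a strict 0x20 - 0x7E test would flag every line ending).

```c
#include <stddef.h>

/* Returns the offset of the first byte that is neither printable ASCII
   (0x20..0x7E) nor tab/CR/LF, or -1 if the whole string is plain. */
long first_suspicious_byte(const char *s) {
  size_t i;
  for (i = 0; s[i] != '\0'; i++) {
    unsigned char c = (unsigned char)s[i];
    int is_plain = (c >= 0x20 && c <= 0x7E) ||
                   c == '\t' || c == '\n' || c == '\r';
    if (!is_plain)
      return (long)i;
  }
  return -1;
}
```

For the line "local \xd0\xbe = 1\n" (a Cyrillic look-alike of o) the function returns 6, the offset where an editor would start highlighting; for the all-ASCII "local o = 1\n" it returns -1.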


Re: Would Lua support variable names with non-ASCII characters?

Albert Chan
In reply to this post by Lorenzo Donati-3

> There are up sides, though. Imagine how many nice and mind-boggling pranks you could do to your colleague programmers! :-]]
>
> Cheers! :-D
>
> -- Lorenzo

Since you might read your own code, the prank might backfire :-D

I think coding is hard enough without worrying about non-ASCII issues ...




Re: Would Lua support variable names with non-ASCII characters?

Vadim A. Misbakh-Soloviov
In reply to this post by Lorenzo Donati-3
... and in my opinion it would just be nice if Lua **didn't try** to be
smarter than me and **didn't** prevent me from shooting myself in the foot.
I.e., in my opinion, it would be nice if Lua supported Unicode in
variable names and... just shut up. All the problems are mine, and I alone am
responsible for all the possible fuckups.


// Well, actually, I already have such a Lua interpreter (LuaJIT), but it
would also be nice if the PUC-Rio one were happy with that too.





Re: Would Lua support variable names with non-ASCII characters?

Andrew Starks-2
In reply to this post by Albert Chan


On Tue, Oct 16, 2018 at 13:09 Albert Chan <[hidden email]> wrote:

>> There are up sides, though. Imagine how many nice and mind-boggling pranks you could do to your colleague programmers! :-]]
>>
>> Cheers! :-D
>>
>> -- Lorenzo
>
> Since you might read your own code, the prank might backfire :-D
>
> I think coding is hard enough without worrying about non-ASCII issues ...



I’d rather have LPEG matching for variable assignments. 

lpeg.R"ah" = 42

assert(b == g and g == "42")


Re: Would Lua support variable names with non-ASCII characters?

Tim Hill
In reply to this post by Lorenzo Donati-3


> On Oct 16, 2018, at 10:11 AM, Lorenzo Donati <[hidden email]> wrote:
>
> [...]


Unicode (noun): A character encoding system designed to make code pages look sensible.

—Tim







Re: Would Lua support variable names with non-ASCII characters?

Sergey Zakharchenko
Hello,

I'll just throw in my take on how Lua may look 'different' in a text
editor with Unicode font support (what you type and what's stored in
the file is still the same plain ASCII, just the presentation is
different). Not that I'm using it though, I just created it for fun
once.

https://pasteboard.co/HIP1375.png

https://gist.github.com/szakharchenko/a479fb90af72ef0243710278f1a7eac2

You could create some sort of convention, like '__Uxxx_ in a symbol
name should be displayed as the corresponding Unicode char', and have
the editor do the transformation for you. The elisp to do that is left
as an exercise.

Best regards,

--
DoubleF
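
A rough sketch of the display-side transformation such a convention implies, written in C rather than elisp. It assumes that the xxx in '__Uxxx_' is a hexadecimal code point, which the post does not actually specify; display_name and utf8_encode are hypothetical names. The file itself stays plain ASCII, and only the string shown to the reader changes.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Writes cp (assumed to be a valid code point) as UTF-8; returns the byte count. */
static size_t utf8_encode(unsigned long cp, char *out) {
  if (cp < 0x80) {
    out[0] = (char)cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (char)(0xC0 | (cp >> 6));
    out[1] = (char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
  }
  out[0] = (char)(0xF0 | (cp >> 18));
  out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
  out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
  out[3] = (char)(0x80 | (cp & 0x3F));
  return 4;
}

/* Returns the display form of an ASCII identifier, replacing each
   __U<hex>_ escape with the UTF-8 encoding of that code point.
   Malformed escapes are copied through unchanged. Uses a static
   buffer, so long names are truncated and the result is overwritten
   by the next call. */
const char *display_name(const char *ident) {
  static char out[256];
  size_t n = 0;
  while (*ident != '\0' && n + 4 < sizeof out) {
    if (strncmp(ident, "__U", 3) == 0 && isxdigit((unsigned char)ident[3])) {
      char *end;
      unsigned long cp = strtoul(ident + 3, &end, 16);
      if (*end == '_' && cp != 0 && cp <= 0x10FFFF &&
          !(cp >= 0xD800 && cp <= 0xDFFF)) {
        n += utf8_encode(cp, out + n);
        ident = end + 1;   /* skip past the closing underscore */
        continue;
      }
    }
    out[n++] = *ident++;
  }
  out[n] = '\0';
  return out;
}
```

Under these assumptions, display_name("__U3B1_beta") produces the UTF-8 bytes for a Greek alpha followed by "beta", while names without a well-formed escape come back unchanged. The look-alike problem discussed earlier in the thread then exists only on screen, since the lexer never sees anything but ASCII.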


Re: Would Lua support variable names with non-ASCII characters?

Dirk Laurie-2
On Wed, 17 Oct 2018 at 07:31, Sergey Zakharchenko
<[hidden email]> wrote:

>
> Hello,
>
> I'll just throw in my take on how Lua may look 'different' in a text
> editor with Unicode font support (what you type and what's stored in
> the file is still the same plain ASCII, just the presentation is
> different). Not that I'm using it though, I just created it for fun
> once.
>
> https://pasteboard.co/HIP1375.png

We've been here before.

http://lua-users.org/lists/lua-l/2016-11/threads.html#00213


Re: Would Lua support variable names with non-ASCII characters?

Dirk Laurie-2
In reply to this post by Luiz Henrique de Figueiredo
On Mon, 15 Oct 2018 at 18:30, Luiz Henrique de Figueiredo
<[hidden email]> wrote:
>
> > And for UTF-8, all we need to do is change the following in llex.c (lines 558-571)
>
> Or change lctype.c. See http://lua-users.org/lists/lua-l/2009-10/msg00104.html
>
> > It's very easy. Will Lua 5.4 support it?
>
> Probably not. See the thread above.

In a later post,
http://lua-users.org/lists/lua-l/2011-05/msg00543.html, Luiz spelt it
out:

> Note that you can also provide your own lctype.c without patching
> the one in the Lua core. The linker will use yours instead.

We also had some fun last year in this thread:

http://lua-users.org/lists/lua-l/2017-04/msg00395.html


Re: Would Lua support variable names with non-ASCII characters?

Tim Hill
In reply to this post by Viacheslav Usov


> On Oct 16, 2018, at 6:18 AM, Viacheslav Usov <[hidden email]> wrote:
>
> On Tue, Oct 16, 2018 at 2:58 PM Lorenzo Donati <[hidden email]> wrote:
>
>> Now, imagine an identifier like B10010100, where each individual "character" is in fact a different "version" of a "0" or a "1". Nightmare!
>
> Especially in Lua, which happily treats any unknown identifier as a valid global variable.
>
> oоοo𝖔
>
> Cheers,
> V.

+1 .. I realize I’m a native English speaker and so biased, but it seems to me the benefits of Unicode identifiers are far outweighed by the problems created.

—Tim


Re: Would Lua support variable names with non-ASCII characters?

Thomas Jericke
On 22.10.2018 at 19:30, Tim Hill wrote:

>> On Oct 16, 2018, at 6:18 AM, Viacheslav Usov <[hidden email]> wrote:
>>
>> On Tue, Oct 16, 2018 at 2:58 PM Lorenzo Donati <[hidden email]> wrote:
>>
>>> Now, imagine an identifier like B10010100, where each individual "character" is in fact a different "version" of a "0" or a "1". Nightmare!
>>
>> Especially in Lua, which happily treats any unknown identifier as a valid global variable.
>>
>> oоοo𝖔
>>
>> Cheers,
>> V.
>
> +1 .. I realize I’m a native English speaker and so biased, but it seems to me the benefits of Unicode identifiers are far outweighed by the problems created.

—Tim

What problem can be so bad as to outweigh the benefit of using 🤮 (U+1F92E) as an identifier?

--

Thomas


Re: Would Lua support variable names with non-ASCII characters?

Enrico Colombini
On 23-Oct-18 09:10, Thomas Jericke wrote:
> What problem can be so bad as to outweigh the benefit of using 🤮
> (U+1F92E) as an identifier?

Would it represent a function returning all arguments?

--
   Enrico


Re: Would Lua support variable names with non-ASCII characters?

Ivan Krylov
In reply to this post by Thomas Jericke
On Tue, 23 Oct 2018 09:10:43 +0200
Thomas Jericke <[hidden email]> wrote:

> What problem can be so bad as to outweigh the benefit of using 🤮
> (U+1F92E) as an identifier?

Ah, the famous -vomit-frame-pointer optimization?

--
Best regards,
Ivan


Re: Would Lua support variable names with non-ASCII characters?

Lorenzo Donati-3
In reply to this post by Thomas Jericke
On 23/10/2018 09:10, Thomas Jericke wrote:

> On 22.10.2018 at 19:30, Tim Hill wrote:
>> [...]
> What problem can be so bad as to outweigh the benefit of using 🤮
> (U+1F92E) as an identifier?
>

It could be useful in Lua error messages, though. ;-)

That would clearly be a nice substitute for the "assertion error"
message. :-D


