"Ignore me" symbol

Re: "Ignore me" symbol

Sam Pagenkopf
Apologies for rehashing, must have skimmed the thread too hard. I suppose the overhead would kick in with functions that often use large inline literals. If you can keep them all in the outermost local scope, it's annoying but the performance impact virtually disappears.

I just did some testing, and I'm surprised that even LuaJIT doesn't realize such literals are constant at compile time.
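A minimal sketch of the hoisting idea, assuming the `num` helper from the quoted message below. `TIMEOUT_US` and `expired` are made-up names used only for illustration. The conversion runs once when the chunk is loaded, so the function that gets called often only reads an upvalue:

```lua
-- Helper from the quoted workaround: strip '_' and convert.
local function num(n, base)
  local digits = n:gsub("_", "")   -- only the first gsub result is kept
  return tonumber(digits, base or 10)
end

-- Hoisted to the outermost local scope: converted once at load time.
local TIMEOUT_US = num "100_000"   -- hypothetical constant

-- Hypothetical hot function: no string processing per call.
local function expired(elapsed_us)
  return elapsed_us > TIMEOUT_US
end

print(TIMEOUT_US, expired(100001))
```

The cost is that the constant now lives away from the place it is used, which is the "annoying" part mentioned above.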

On Tue, Dec 18, 2018 at 12:40 AM Coda Highland <[hidden email]> wrote:
On Tue, Dec 18, 2018 at 12:34 AM Sam Pagenkopf <[hidden email]> wrote:
>
> I think inline backtick comments are unneeded, --[[inline comment]] is good enough.
>
> Never saw a problem with _G.THING for explicit globals, someone let me know if I'm wrong.
>
> For long literals, there's a workaround:
> function num (n, base) return tonumber(n:gsub('_', ''), base or 10) end
> assert(num '100_000' == 100000)
> And don't forget the 4e10 syntax for literals with lots of zeroes.
> assert(1e5 == 100000)
>
> And if you really hate names like setmetatable and collectgarbage, just rename them locally.

The workaround has already been explicitly shot down in this thread:
it's a runtime transformation which means you take a performance hit
every time you use one. It HAS to be done at the lexer level to be
practical for anything that needs to worry about scalability. This
plus binary literals are things that have been in demand for quite a
while and they're things I think are worth doing -- though they're
things that can be done with token filters, too.

Inline long comments with --[[ ]] work fine for comments. The proposal
was actually trying to ask for an annotation syntax and just did a
poor job getting the point across. That's why other posters have
talked about syntaxes like @annotation(param, param) as something that
could be parsed cleanly but be a no-op in the interpreter so that
other tools can use the markup to do other tasks. Inline comments do
WORK for this, but they're a little bit on the verbose side and people
would like something with a bit better integration. That said, while I
understand the desire and intent, I don't particularly have a strong
opinion about it myself.

/s/ Adam


Re: "Ignore me" symbol

Viacheslav Usov
In reply to this post by Sean Conner
On Mon, Dec 17, 2018 at 10:40 PM Sean Conner <[hidden email]> wrote:

>  Also, can someone describe the symbol I just copied?

If you look at the raw message payload, you will see that the character was encoded as =E2=80=89. In "quoted-printable" that means three bytes with the given hex values. The message headers indicate that the encoding was UTF-8, so you can use some UTF-8 decoder to get the Unicode codepoint for those three bytes, which gives you U+2009 THIN SPACE [1], which per [2] is described (among other spaces) as:

The main difference among other space characters is their width. U+2000..U+2006 are
standard quad widths used in typography. U+2007 figure space has a fixed width, known
as tabular width, which is the same width as digits used in tables. U+2008 punctuation
space is a space defined to be the same width as a period. U+2009 thin space and U+200A
hair space are successively smaller-width spaces used for narrow word gaps and for justification
of type. The fixed-width space characters (U+2000..U+200A) are derived from
conventional (hot lead) typography. Algorithmic kerning and justification in computerized
typography do not use these characters. However, where they are used (for example, in
typesetting mathematical formulae), their width is generally font-specified, and they
typically do not expand during justification. The exception is U+2009 thin space, which
sometimes gets adjusted.

(end)

It finds another mention in [3]:

Some or all of the following characters may be tailored to be in MidNum, depending on the environment, to allow for languages that use spaces as thousands separators, such as €1 234,56.
U+0020 SPACE
U+00A0 NO-BREAK SPACE 
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+202F NARROW NO-BREAK SPACE

(end)

Which makes it relevant for the discussion.
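The decoding steps described at the start of this message (quoted-printable bytes, then UTF-8, then a code point) can be sketched in Lua. This assumes Lua 5.3 or later for the `utf8` library, and it handles only `=XX` escapes, not the other parts of quoted-printable such as soft line breaks:

```lua
-- Requires Lua 5.3+ (utf8 library).
local qp = "=E2=80=89"

-- Turn each =XX escape into the byte with that hex value.
local raw = qp:gsub("=(%x%x)", function(hex)
  return string.char(tonumber(hex, 16))
end)

-- Decode the first UTF-8 sequence in the byte string.
local cp = utf8.codepoint(raw)
local label = string.format("U+%04X", cp)

print(#raw, label)
```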

Cheers,
V.


[2] The Unicode® Standard Version 11.0 – Core Specification, page 264. https://www.unicode.org/versions/Unicode11.0.0/UnicodeStandard-11.0.pdf.

[3] Unicode® Standard Annex #29 UNICODE TEXT SEGMENTATION https://www.unicode.org/reports/tr29/tr29-33.html#Word_Boundary_Rules


Re: "Ignore me" symbol

Roberto Ierusalimschy
In reply to this post by Coda Highland
> The workaround has already been explicitly shot down in this thread:
> it's a runtime transformation which means you take a performance hit
> every time you use one. It HAS to be done at the lexer level to be
> practical for anything that needs to worry about scalability. This
> plus binary literals are things that have been in demand for quite a
> while and they're things I think are worth doing -- though they're
> things that can be done with token filters, too.

Up to a point. In Java, the "thousand separator" can only be used in
source code; methods to convert strings to numbers do not accept them.
I am not sure this would be a good design for a scripting language like
Lua. On the other hand, to accept thousand separators in floating
strings like 'tostring("123_527.24")' is not as easy as to simply
ignore the separators in the lexer.

Binary literals also pose some design questions:

- Of course, 'tonumber', 'io.read("n")', and similar functionality
should support that format.

- Shouldn't Lua have a correspondent format to print binary numbers?
(To implement a simple '%b' is quite easy; something that handles
'%-#080.3b' is not that easy.)

- Should Lua also support floating binary literals?
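For reference, the "simple '%b'" case can be approximated in plain Lua. This is only a sketch assuming Lua 5.3+ integers; `tobinary` is a hypothetical helper, not an existing library function, and it supports nothing beyond an optional minimum width with zero padding:

```lua
-- Hypothetical helper: non-negative integer -> string of binary digits.
local function tobinary(n, width)
  assert(math.type(n) == "integer" and n >= 0, "non-negative integer expected")
  local digits = {}
  repeat
    digits[#digits + 1] = n % 2   -- least significant bit first
    n = n // 2
  until n == 0
  local s = table.concat(digits):reverse()
  if width and #s < width then
    s = string.rep("0", width - #s) .. s   -- zero-pad on the left
  end
  return s
end

print(tobinary(0xBC1), tobinary(5, 8))
```

Flags, precision, and negative values are deliberately left out; those are the parts described above as not that easy.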

-- Roberto


Re: "Ignore me" symbol

Sean Conner
It was thus said that the Great Roberto Ierusalimschy once stated:

> > The workaround has already been explicitly shot down in this thread:
> > it's a runtime transformation which means you take a performance hit
> > every time you use one. It HAS to be done at the lexer level to be
> > practical for anything that needs to worry about scalability. This
> > plus binary literals are things that have been in demand for quite a
> > while and they're things I think are worth doing -- though they're
> > things that can be done with token filters, too.
>
> Up to a point. In Java, the "thousand separator" can only be used in
> source code; methods to convert strings to numbers do not accept them.
> I am not sure this would be a good design for a scripting language like
> Lua. On the other hand, to accept thousand separators in floating
> strings like 'tostring("123_527.24")' is not as easy as to simply
> ignore the separators in the lexer.

  I was curious about this, and yes, I can see why---you call one of the
Standard C functions (strtof(), strtod(), strtold()) to do the actual
conversion.  To support grouping separators [1], the code would have to
first walk through the entire number, stripping out the grouping separators,
then call the Standard C function.
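The same two-step approach (strip the grouping separators, then hand the result to the existing converter) can be sketched at the Lua level. `tonumber_grouped` is a hypothetical name, and the rule that an underscore must sit between two decimal digits is an assumption made for this example, not something decided in the thread:

```lua
-- Hypothetical helper: accept '_' as a grouping separator, then defer
-- to the standard tonumber (which relies on the C library underneath).
local function tonumber_grouped(s)
  -- First pass: every underscore needs a decimal digit on both sides.
  for pos in s:gmatch("()_") do
    local before = s:sub(pos - 1, pos - 1)
    local after = s:sub(pos + 1, pos + 1)
    if not (before:match("%d") and after:match("%d")) then
      return nil
    end
  end
  -- Second pass: strip the separators and convert as usual.
  local stripped = s:gsub("_", "")
  return tonumber(stripped)
end

print(tonumber_grouped("123_527.24"), tonumber_grouped("_100"))
```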

  -spc (Or you could convince the C Standards body to support this ... )

[1] To avoid mixing this concept up with the decimal separator.


Re: "Ignore me" symbol

szbnwer@gmail.com
hi all! :)

about this thousands separator, wouldnt it be simply the best to put
some coloring magic on numbers in your favorite editor? :D

bests! :)


Re: "Ignore me" symbol

dyngeccetor8
On 12/19/18 10:29 AM, [hidden email] wrote:
> about this thousands separator, wouldnt it be simply the best to put
> some coloring magic on numbers in your favorite editor? :D

Same can be said for indents. BTW space is already "ignore me" symbol.

-- Martin


Re: "Ignore me" symbol

Luiz Henrique de Figueiredo
> BTW space is already "ignore me" symbol.

No, it is not. A space between two tokens serves to separate them, not
to merge their text, as proposed in '1000_0101'. The space in 'local
x' is essential, otherwise this would be 'localx'.
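A quick way to check this, assuming Lua 5.2 or later where `load` accepts a string: a space inside a numeral produces two tokens and a syntax error, while removing the space from 'local x' silently changes the meaning.

```lua
-- Two numeric tokens in a row: the chunk does not compile.
local merged, err = load("return 1000 0101")

-- Without the space this compiles, but as an assignment to the
-- global 'localx' rather than a local declaration (never run here).
local chunk = load("localx = 1")

print(merged, err)
print(chunk ~= nil)
```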


Re: "Ignore me" symbol

Roberto Ierusalimschy
In reply to this post by Sean Conner
>   I was curious about this, and yes, I can see why---you call one of the
> Standard C functions (strtof(), strtod(), strtold()) to do the actual
> conversion.  To support grouping separators [1], the code would have to
> first walk through the entire number, stripping out the grouping separators,
> then call the Standard C function.

Exactly. It is hard to implement your own 'strtod', due to precision
loss.

-- Roberto


Re: "Ignore me" symbol

Tim Hill
In reply to this post by Hugo Musso Gualandi


> On Dec 16, 2018, at 5:10 PM, Hugo Musso Gualandi <[hidden email]> wrote:
>
> I agree with Dibyendu that annotations (along the lines of what is
> present in Java or perhaps something like Ocaml's ppx annotations)
> might be a better way to support language extensibility than a magic
> backtick character.
>
> A concern I would have with this is that this novel meaning for
> backticks is unlike any other language and could be confusing and
> unfamiliar. Additionally, it would also make it harder to grep for
> things as there would now be multiple ways to write the same
> identifiers.
>
>

Having worked on large Java codebases that make extensive use of annotations I can say it was NOT a nice experience. You end up with so many annotations and so little Java that the whole looks more like assembly language (and a bad one at that). Yuck.

—Tim



Re: "Ignore me" symbol

Lorenzo Donati-3
In reply to this post by Egor Skriptunoff-2
On 17/12/2018 21:48, Egor Skriptunoff wrote:

> On Mon, Dec 17, 2018 at 1:43 PM Lorenzo Donati wrote:
>
>> Numeric optional separator: I crave this, together with a standard
>> syntax for binary literals[1] . C++ addressed both needs with the syntax:
>>
>> 0b1011'1100'0001
>>
>> I'd prefer the underscore as a separator, though:
>>
>> 0b1011_1100_0001
>>
>
>
> In Lua syntax a literal number is never followed by a literal string, so
> C++ syntax (1'000'000) is suitable for Lua too.
> IMO, 1'000'000 looks nicer than 1_000_000.
> But probably my opinion is biased because 1'000'000 is one of the standard
> ways to write numbers in Russian texts (space is also used as thousands
> separator).
> BTW, fast googling showed that "upper dot/upper comma" are used as digit
> groups separators (at least in handwriting) in some other countries:
> Belgium, Italy, Romania, Switzerland, Mexico, Liechtenstein.
> In English-speaking countries a comma is used as separator, that's why many
> people prefer something bottom-ish (such as underscore).
>

Although I'm Italian and we traditionally use a small /upper/ (sometimes
also lower) dot to separate digits in big numbers, my bias against the single
quote as a separator is a visual one:

1. it is smallish and may be difficult to see in some fonts (especially
when it is proportional - yes I use proportional fonts when programming,
switching to monospace rendering when really needed).

2. My "internal parser" interprets the single quote as something
"string/char" related, so it interferes with my "mental code
processing", slowing me down.

3. The underscore resembles a space, which is what I use as a separator
when writing numbers in text (the traditional Italian way collides with
other countries' usage, especially English-speaking ones, so a way to
avoid ambiguities is using spaces, which is AFAIK unambiguous almost
anywhere and coherent with scientific writing).

Point 2 is the worst pain point for me. Of course YMMV.






Re: "Ignore me" symbol

Egor Skriptunoff-2
On Thu, Dec 20, 2018 at 10:34 PM Lorenzo Donati wrote:
> 2. My "internal parser" interprets the single quote as something
> "string/char" related, so it interferes with my "mental code
> processing", slowing me down.
>
> Point 2 is the worst pain point for me.

Yes, me too.
That's why I suggested backtick instead of single quote.

A crazy idea: let more than one symbol be a "thousands separator", why not?
For example: space, underscore, backtick, and probably some others.
But not mixed within the same number.
local x = 1`000'000_000 000  -- this is syntax error
local y = 1'000 + 1`000 + 1_000 + 1 000 -- this is OK
Such a decision should make Lua suitable for all and stop all the debates.
Or it wouldn't stop? :-)
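The "one kind of separator per number" rule can be sketched as a check on the numeral's text. This is only an illustration: `single_separator_kind` and the separator set are assumptions, and it says nothing about how a real lexer would tell a separating space from an ordinary one.

```lua
-- Hypothetical separator set: underscore, backtick, single quote, space.
local SEPARATORS = "[_`' ]"

-- Returns true when the numeral uses at most one kind of separator.
local function single_separator_kind(numeral)
  local seen
  for sep in numeral:gmatch(SEPARATORS) do
    if seen and sep ~= seen then
      return false   -- a second kind of separator appeared
    end
    seen = sep
  end
  return true
end

print(single_separator_kind("1`000'000_000 000"), single_separator_kind("1 000"))
```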

Re: "Ignore me" symbol

Lorenzo Donati-3
In reply to this post by Roberto Ierusalimschy
On 18/12/2018 17:28, Roberto Ierusalimschy wrote:

>> The workaround has already been explicitly shot down in this thread:
>> it's a runtime transformation which means you take a performance hit
>> every time you use one. It HAS to be done at the lexer level to be
>> practical for anything that needs to worry about scalability. This
>> plus binary literals are things that have been in demand for quite a
>> while and they're things I think are worth doing -- though they're
>> things that can be done with token filters, too.
>
> Up to a point. In Java, the "thousand separator" can only be used in
> source code; methods to convert strings to numbers do not accept them.
> I am not sure this would be a good design for a scripting language like
> Lua. On the other hand, to accept thousand separators in floating
> strings like 'tostring("123_527.24")' is not as easy as to simply
> ignore the separators in the lexer.
>
> Binary literals also pose some design questions:
>
> - Of course, 'tonumber', 'io.read("n")', and similar functionality
> should support that format.
>
> - Shouldn't Lua have a correspondent format to print binary numbers?
> (To implement a simple '%b' is quite easy; something that handles
> '%-#080.3b' is not that easy.)
>
> - Should Lua also support floating binary literals?
>
> -- Roberto
>
>

Interesting points.

I wouldn't care about FP /binary/ literals. Probably FP /hex/ literals
are enough and Lua wouldn't benefit too much from the former.

Integer binary literals are a different beast, IMO. Maybe I'm biased
because I'm doing a lot of work with microcontrollers lately, and writing
down bit-patterns in code using hex is not as clear as using binary,
especially when dealing with lots of MCU registers.

OK, an expert gets used to it, but as a (high school) teacher I usually
write example code that should help students understand the inner
workings of the thing, and there is a limit to what a newbie can grok
about such low level stuff, and binary notation would help a lot!

In this respect a %b format should also be defined, even if not handling
every possible option available for (e.g) %x (anyway, the implementation
could be incremental: start with basic options, then expand the features
in new Lua releases).

For now I use C or C++ for firmware code, but I'm planning to try
eLua in the future (in the long run), or even Lua proper if I manage to
do something with more powerful MCUs.

Anyway, I do use Lua for toy programs showing what bits operations do in
practice, but for this I use my own routines to handle binary
representation (no need to be efficient here!:-).

BTW, I wonder whether PUC-Lua could be easily adapted to MCUs in a
freestanding environment. I mean, if the MCU has enough RAM, could Lua
be ported to such an environment without the need of an OS? I know that
eLua was heavily modified IIRC to cope with small MCUs' constraints (too
little RAM). Would having much more RAM allow using PUC-Lua without so
much hassle? Might the Harvard architecture of many MCUs also be a
problem for PUC-Lua's C code?

Cheers!

Lorenzo.










Re: "Ignore me" symbol

Jim
On 12/20/18, Lorenzo Donati <[hidden email]> wrote:
>> Up to a point. In Java, the "thousand separator" can only be used in
>> source code; methods to convert strings to numbers do not accept them.
>> I am not sure this would be a good design for a scripting language like Lua.

why not ? java's solution seems quite simple.

>> Binary literals also pose some design questions:
>> - Of course, 'tonumber', 'io.read("n")', and similar functionality
>> should support that format.

can't this be solved by code similar to that used by the lexer for the
same purpose? just ignore the '_' separator in any strings passed to
those functions, as is also done there.

the characters read in have to be checked one by one in any case, since
they could contain something illegal, e.g.:
  case '_'  : continue // ignore
  case '0', '1' : // ok, handle char
  default : error ( "got illegal binary literal" ) // illegal content was detected
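The character-by-character check outlined above could look roughly like this in Lua. `parse_binary` is a hypothetical helper; the optional `0b` prefix and the rule that at least one digit is required are assumptions added for this sketch:

```lua
-- Hypothetical helper: parse a binary numeral, ignoring '_' separators.
local function parse_binary(s)
  local body = s:gsub("^0[bB]", "")   -- drop an optional 0b prefix
  local value, ndigits = 0, 0
  for i = 1, #body do
    local c = body:sub(i, i)
    if c == "0" or c == "1" then
      value = value * 2 + (c == "1" and 1 or 0)   -- ok, handle char
      ndigits = ndigits + 1
    elseif c ~= "_" then                          -- '_' is simply ignored
      return nil, "got illegal binary literal"
    end
  end
  if ndigits == 0 then
    return nil, "no binary digits"
  end
  return value
end

print(parse_binary("0b1011_1100_0001"), parse_binary("1000_0101"))
```

When no separators are involved, the standard `tonumber(s, 2)` already covers plain binary strings.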

>> - Shouldn't Lua have a correspondent format to print binary numbers?
>> (To implement a simple '%b' is quite easy; something that handles
>> '%-#080.3b' is not that easy.)

maybe, but not too important (for my uses).

>> - Should Lua also support floating binary literals?

i personally don't need them, if this turns out to be too much effort
just ignore them.

> I wouldn't care about FP /binary/ literals. Probably FP /hex/ literals
> are enough and Lua wouldn't benefit too much from the former.

indeed.

> Integer binary literals are a different beast, IMO.

right.

> Maybe I'm biased because I'm doing lot of works with microcontrollers lately,

an important use case for binary literals, much as unix mode integers are
for octal integer literals.

> and writing down bit-patterns in code using hex is not as clear as using binary,
> especially when dealing with lots of MCU registers.

of course one could do so without hex literals at all, just by using
default decimal integer literals ... :-/

> binary notation would help a lot!

indeed.

> Anyway, I do use Lua for toy programs showing what bits operations do in
> practice, but for this I use my own routines to handle binary representation

that shouldn't be necessary as this is a fairly common everyday task,
especially when working with microcontrollers. i am quite sure this poses
no problem at all for MicroPython and the various JavaScript
implementations used in the hardware field.
