"Ignore me" symbol

Egor Skriptunoff-2
Hi!

What do you think about the following suggestion?

Let's introduce a special "ignore me" symbol into Lua syntax.
(The description below assumes this symbol is the backtick.)

A single occurrence of the "ignore me" symbol means that the symbol
must be ignored (except inside literal strings)
without splitting the current lexeme:

   local million = 1`000`000
   local INT64_MIN = 0x8000`0000`0000`0000
   collect`garbage()
   assert(#("1`000`000") == 9)
   assert(to`number("1`000`000"))

A sequence of two (or more) "ignore me" symbols means that
part of the line (a block) must be ignored.
The block is terminated by either end-of-line
or a run of the same number ("the depth") of backticks.
Syntactically it should be treated as a lexeme separator.

   local x``integer`` = 42
   local arr``array[65`536] of double``, sz```integer``` = {}, 0
   ``This is single-line comment
   ```This is single-line comment
   ````This is single-line comment
   return`42         -- identifier "return42"
   return``xxxx``42  -- not "return42", but "return 42"
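
The rules above can be sketched as a source-to-source preprocessor. This is only an illustration under stated assumptions, not part of Lua or of the proposal itself: the names strip_backticks and skip_block are hypothetical, a closing run longer than the block's depth is assumed to leave its surplus backticks for the next block, and runs shorter than the depth are assumed to belong to the ignored text. Only short quoted strings are protected; long strings and ordinary "--" comments are not handled.

```lua
-- Hypothetical helper: skip an extension block of the given depth that
-- starts at index i; returns the index of the first character after it.
local function skip_block(src, i, depth)
  local n = #src
  while i <= n do
    local c = src:sub(i, i)
    if c == "\n" then
      return i                      -- end-of-line closes the block
    elseif c == "`" then
      local run = #src:match("^`+", i)
      if run >= depth then
        return i + depth            -- surplus backticks stay for the caller
      end
      i = i + run                   -- assumption: shorter runs are ignored text
    else
      i = i + 1
    end
  end
  return i
end

-- Hypothetical preprocessor: removes single backticks and replaces
-- every block with one space (a lexeme separator).
local function strip_backticks(src)
  local out, i, n = {}, 1, #src
  while i <= n do
    local c = src:sub(i, i)
    if c == '"' or c == "'" then
      -- Copy a short string literal verbatim, honouring backslash escapes.
      local j = i + 1
      while j <= n do
        local d = src:sub(j, j)
        if d == "\\" then
          j = j + 1                 -- skip the escaped character
        elseif d == c or d == "\n" then
          break
        end
        j = j + 1
      end
      out[#out + 1] = src:sub(i, j)
      i = j + 1
    elseif c == "`" then
      local depth = #src:match("^`+", i)
      i = i + depth
      if depth >= 2 then
        i = skip_block(src, i, depth)
        out[#out + 1] = " "
      end
      -- depth == 1: the backtick is simply dropped
    else
      out[#out + 1] = c
      i = i + 1
    end
  end
  return table.concat(out)
end

assert(strip_backticks("local million = 1`000`000") == "local million = 1000000")
assert(strip_backticks("return``xxxx``42") == "return 42")
```

Replacing each block with a space rather than with nothing is what keeps "return``xxxx``42" from collapsing into the identifier "return42".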


Q: Why might this suggestion be useful?
A: It solves a bunch of problems described as
"I want to extend Lua syntax without breaking compatibility":

1)
Everyone would be glad to have an optional digit-group separator
in numeric literals.
2)
Some Lua users would appreciate the ability to make long Lua identifiers
more readable by splitting them with an ignorable separator.
3)
Lua extensions such as Ravi could be made compatible with Lua.
Return type annotations in Ravi may be located after closing parenthesis:
function (x``integer``)``integer,integer`` return x-1,x+1 end
4)
Global-by-default haters could use their own Lua dialect, which requires
every global variable to be preceded by a single backtick.
Global-by-default lovers could use (and modify) the code written
by global-by-default haters without a problem.
Your "religion" (hater/lover) could be specified in "luaconf.h".
5)
Different Lua extensions could coexist by each processing
only blocks of a specific depth.


Q: Do you actually suggest new additional syntax for comments in Lua?
A: No.
Comments are for comments (text written in human language).
Extension blocks ``....`` are for extended syntax.
Structured data in comments looks awkward (although it is widely used).
A short comment in long brackets, --[[integer]], does not look nice.


More notes:
a) Blocks can be concatenated, preserving their depths:
``depth 2`````depth 3`````depth 2``
should be parsed as
``depth 2``+```depth 3```+``depth 2``
b) Nesting of blocks should be avoided.
c) Should blocks of depth 3+ be considered multi-line?
d) This suggestion doesn't break any existing Lua code.

-- Egor

Re: "Ignore me" symbol

Dibyendu Majumdar
On Sun, 16 Dec 2018 at 19:51, Egor Skriptunoff
<[hidden email]> wrote:

> Let's introduce special "ignore me" symbol in Lua syntax.
> (The description below assumes this symbol is the backtick)
>
> Single occurrence of "ignore me" symbol means that this symbol
> must be ignored (excepting inside literal strings)
> without splitting the current lexem:
>
> A sequence of two (or more) "ignore me" symbols means
> part of line (block) must be ignored.
> The block is terminated by either end-of-line
> or the same amount ("the depth") of backticks.
> Syntactically it should be treated as lexem separator.
>

I personally would prefer something like the Java annotation syntax:

@name( property1='value1', ... )

where the '( property1='value1', ... )' part is optional.

Of course some folks might prefer the () to be {} so that this looks
like a Lua table.

So one could have single word annotations, or provide extra attributes.

As with any annotation system, if the parser doesn't recognize an
annotation, it should be skipped.

The syntax for closeable local declarations can just use this
mechanism instead of the current proposed syntax.
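
The "skip what you don't recognize" rule can be sketched as a filter that removes annotations before the source reaches a stock Lua parser. This is a rough illustration only: strip_annotations is a hypothetical name, the annotation form is the @name( ... ) shape described above, and the pattern-based approach is an assumption that would also mangle an "@" inside a string literal or a parenthesized expression that happens to follow a bare annotation.

```lua
-- Hypothetical filter: replace every "@name" or "@name( ... )" with a space.
local function strip_annotations(src)
  -- Annotations with a balanced argument list first ...
  src = src:gsub("@[%a_][%w_]*%s*%b()", " ")
  -- ... then bare single-word annotations.
  src = src:gsub("@[%a_][%w_]*", " ")
  return src
end

-- The stripped text is plain Lua again.
local plain = strip_annotations("local x @const = 42 return x")
assert((loadstring or load)(plain)() == 42)
```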

Re: "Ignore me" symbol

Peter Hickman-3
While a separator for strings of digits that are numbers would be useful, as 1_212_312_312 is more readable as a number than 1212312312, I have two other objections:

1) The ` character is pretty much invisible and almost identical to ' depending on your font
2) Interleaving```mary```text```had```you```a```should```little```not```lamb```read```it's```makes```fleece```reading```was```the```white```text```as```much```snow```more```and```difficult

It is almost as if you wanted to introduce a syntax change to make creating bugs easier :)

Re: "Ignore me" symbol

Egor Skriptunoff-2
On Mon, Dec 17, 2018 at 12:59 AM Peter Hickman wrote:
> 1) The ` character is pretty much invisible and almost identical to ' depending on your font

Both are used in Linux shell, and nobody complains about that.
But yes, this might be a problem if you're writing Lua programs in MS Word using proportional font :-)

 
> 2) Interleaving```mary```text```had```you```a```should```little```not```lamb```read```it's```makes```fleece```reading```was```the```white```text```as```much```snow```more```and```difficult
>
> It is almost as if you wanted to introduce a syntax change to make creating bugs easier :)

This problem is not specific to my suggestion.
It's up to the programmer to make source text easily readable or obfuscated.
Good coding style would probably imply writing each term on separate line:
Interleaving```mary
text```had
you```a
should```little
Moreover, these sequences of words (language statements) are from different languages (the left one from Lua, the right one from the extension).

Re: "Ignore me" symbol

Stephen Irons-3


On Mon, Dec 17, 2018 at 11:57 AM, Egor Skriptunoff <[hidden email]> wrote:
> On Mon, Dec 17, 2018 at 12:59 AM Peter Hickman wrote:
>> 1) The ` character is pretty much invisible and almost identical to ' depending on your font
>
> Both are used in Linux shell, and nobody complains about that.


I complain about using "`", and I am not 'nobody', so that makes at least two people who prefer to use $() in sh and bash scripts.


Stephen Irons
Re: "Ignore me" symbol

Hugo Musso Gualandi
In reply to this post by Egor Skriptunoff-2
I agree with Dibyendu that annotations (along the lines of what is
present in Java or perhaps something like Ocaml's ppx annotations)
might be a better way to support language extensibility than a magic
backtick character.

A concern I would have with this is that this novel meaning for
backticks is unlike any other language and could be confusing and
unfamiliar. Additionally, it would also make it harder to grep for
things as there would now be multiple ways to write the same
identifiers.


Re: "Ignore me" symbol

nobody
In reply to this post by Egor Skriptunoff-2
On 16/12/2018 23.57, Egor Skriptunoff wrote:
> On Mon, Dec 17, 2018 at 12:59 AM Peter Hickman wrote:
>> 1) The ` character is pretty much invisible and almost identical
>> to ' depending on your font
>
> Both are used in Linux shell, and nobody complains about that.

Correct.

-- nobody

Re: "Ignore me" symbol

Coda Highland
In reply to this post by Egor Skriptunoff-2
On Sun, Dec 16, 2018 at 1:50 PM Egor Skriptunoff
<[hidden email]> wrote:
>
> Hi!
>
> What do you think about the following suggestion?
>
> Let's introduce special "ignore me" symbol in Lua syntax.
> (The description below assumes this symbol is the backtick)

I disagree with introducing special meaning to consecutive such
symbols. If they're supposed to be "ignore me" symbols then they
should be, y'know, ignored. Which means the second one should be
ignored the same as the first one.

As other posters have mentioned, it also makes a mess when you start
combining the various kinds, and while it's true that you just
"shouldn't do that" it also seems like something designed to improve
readability shouldn't be a tool to make it worse. We have line and
block comments for that, despite your assertion that you don't intend
it to be used that way.

If we want an annotation syntax, then we should ask for an annotation
syntax, and ideally it should resemble something that people already
recognize as an annotation syntax.

/s/ Adam

Re: "Ignore me" symbol

Fontana Nicola
In reply to this post by nobody
On Mon, 17/12/2018 at 02.16 +0100, nobody wrote:

> On 16/12/2018 23.57, Egor Skriptunoff wrote:
> > On Mon, Dec 17, 2018 at 12:59 AM Peter Hickman wrote:
> > > 1) The ` character is pretty much invisible and almost identical
> > > to ' depending on your font
> >
> > Both are used in Linux shell, and nobody complains about that.
>
> Correct.
>
> -- nobody

Refreshing ambiguous answer.

Ciao.
--
Nicola



Re: "Ignore me" symbol

Lorenzo Donati-3
In reply to this post by Coda Highland
On 17/12/2018 06:35, Coda Highland wrote:

> On Sun, Dec 16, 2018 at 1:50 PM Egor Skriptunoff
> <[hidden email]> wrote:
>>
>> Hi!
>>
>> What do you think about the following suggestion?
>>
>> Let's introduce special "ignore me" symbol in Lua syntax.
>> (The description below assumes this symbol is the backtick)
>
> I disagree with introducing special meaning to consecutive such
> symbols. If they're supposed to be "ignore me" symbols then they
> should be, y'know, ignored. Which means the second one should be
> ignored the same as the first one.
>
> As other posters have mentioned, it also makes a mess when you start
> combining the various kinds, and while it's true that you just
> "shouldn't do that" it also seems like something designed to improve
> readability shouldn't be a tool to make it worse. We have line and
> block comments for that, despite your assertion that you don't intend
> it to be used that way.
>
> If we want an annotation syntax, then we should ask for an annotation
> syntax, and ideally it should resemble something that people already
> recognize as an annotation syntax.
>
> /s/ Adam
>
>

I largely agree with you.

There are two orthogonal problems here that are addressed which should
be kept separated, IMO.

Annotations: although I don't feel the urge to have that feature in Lua,
probably it would be handy, especially if Lua provided a standard means
to process them without additional tools.

Numeric optional separator: I crave this, together with a standard
syntax for binary literals[1] . C++ addressed both needs with the syntax:

0b1011'1100'0001

I'd prefer the underscore as a separator, though:

0b1011_1100_0001


Cheers!

-- Lorenzo

[1] Yes, I know I could define something like this:

B'1100 1010'

but it wouldn't be standard, it will require runtime execution and it's
extremely memory-wasteful (think of initialization of MCU code, for
example).
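
The run-time helper mentioned in footnote [1] might look roughly like the sketch below. Only the name B and the call form B'1100 1010' come from the footnote; the accepted separators (spaces, underscores, apostrophes) and the error message are assumptions made for illustration. It also shows the drawback described there: the string is parsed every time the call executes, instead of once in the lexer.

```lua
-- Hypothetical run-time stand-in for binary literals.
local function B(s)
  -- Drop the digit-group separators (assumed: whitespace, "_" and "'").
  local digits = s:gsub("[%s_']", "")
  assert(digits:match("^[01]+$"), "invalid binary literal")
  return tonumber(digits, 2)
end

assert(B'1100 1010' == 0xCA)
assert(B"1011_1100_0001" == 0xBC1)
```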








Re: "Ignore me" symbol

Egor Skriptunoff-2
In reply to this post by Coda Highland
On Mon, Dec 17, 2018 at 8:36 AM Coda Highland <[hidden email]> wrote:
> I disagree with introducing special meaning to consecutive such
> symbols. If they're supposed to be "ignore me" symbols then they
> should be, y'know, ignored. Which means the second one should be
> ignored the same as the first one.

I don't see a point in your objection.
Following your logic, if "minus" symbol means "unary negation", then two consecutive minuses must mean "unary negation and one more unary negation"?
Right?  :-)
Is there a big difference between "minus" and "backtick"?  :-)
Do you want to say that some composite lexemes ( --  <<  >>  ::  .. ) are confusing because their meaning doesn't match the meaning of the symbols they consist of?

Re: "Ignore me" symbol

Egor Skriptunoff-2
In reply to this post by Fontana Nicola
On Mon, Dec 17, 2018 at 9:36 AM Fontana Nicola wrote:
> On Mon, 17/12/2018 at 02.16 +0100, nobody wrote:
>> On 16/12/2018 23.57, Egor Skriptunoff wrote:
>>> On Mon, Dec 17, 2018 at 12:59 AM Peter Hickman wrote:
>>>> 1) The ` character is pretty much invisible and almost identical
>>>> to ' depending on your font
>>>
>>> Both are used in Linux shell, and nobody complains about that.
>>
>> Correct.
>>
>> -- nobody
>
> Refreshing ambiguous answer.

It seems that we need annotations (to specify identifiers' types) not only in Ravi, but in this mailing list too :-)

Re: "Ignore me" symbol

Egor Skriptunoff-2
In reply to this post by Lorenzo Donati-3
On Mon, Dec 17, 2018 at 1:43 PM Lorenzo Donati wrote:
> Numeric optional separator: I crave this, together with a standard
> syntax for binary literals[1]. C++ addressed both needs with the syntax:
>
> 0b1011'1100'0001
>
> I'd prefer the underscore as a separator, though:
>
> 0b1011_1100_0001


In Lua syntax a numeric literal is never followed by a string literal, so the C++ syntax (1'000'000) is suitable for Lua too.
IMO, 1'000'000 looks nicer than 1_000_000.
But my opinion is probably biased, because 1'000'000 is one of the standard ways to write numbers in Russian texts (a space is also used as a thousands separator).
BTW, a quick search showed that an "upper dot/upper comma" is used as a digit-group separator (at least in handwriting) in some other countries: Belgium, Italy, Romania, Switzerland, Mexico, Liechtenstein.
In English-speaking countries a comma is used as the separator, which is why many people prefer something bottom-ish (such as an underscore).
 
Re: "Ignore me" symbol

Coda Highland
In reply to this post by Egor Skriptunoff-2
On Mon, Dec 17, 2018 at 2:43 PM Egor Skriptunoff
<[hidden email]> wrote:

>
> On Mon, Dec 17, 2018 at 8:36 AM Coda Highland <[hidden email]> wrote:
>>
>> I disagree with introducing special meaning to consecutive such
>> symbols. If they're supposed to be "ignore me" symbols then they
>> should be, y'know, ignored. Which means the second one should be
>> ignored the same as the first one.
>
>
> I don't see a point in your objection.
> Following your logic, if "minus" symbol means "unary negation", then two consecutive minuses must mean "unary negation and one more unary negation"?
> Right?  :-)
> Is there big difference between "minus" and "backtick"?  :-)
> Do you want to say that some composite lexems ( --  <<  >>  ::  .. ) are confusing because their meaning don't match the meaning of symbols they are consisted of?
>

No, I don't make that argument. In fact I find the comparison somewhat
disingenuous.

My complaint is ONLY relevant because the explicit meaning of this
particular lexeme is proposed to be "ignore me". The intended purpose
is for that lexeme to be filtered out of the byte stream before the
code is even tokenized. But by enabling its use in a compound lexeme,
suddenly the lexer has to determine if the character is an "ignore me"
non-token or if it's part of a "don't ignore me, just don't execute
me" annotation token. And not only does the lexer have to do this, so
does any human reading the code. "How many of these in a row are there
here?" when the contract of a single one is supposed to be "it doesn't
matter if I'm here or not, I don't do anything."

That said, consecutive minuses actually do have this issue as well.
The other examples you give don't, because they're all binary
operators, but negation and predecrement are both unary prefix
operators. There's an ambiguity introduced in the grammar and the only
way to resolve that ambiguity is by fiat. The C language specification
resolves it by instructing the lexer to take the longest string of
characters that forms a valid token, even if a different parse would
have resulted in a legal parse (for example, "2--3" is "2 -- 3" and
not "2 - -3" even though the former is illegal and the latter
evaluates to 5) but the only reason we actually put up with it in
modern programming is because of historical precedent. That's not an
excuse to make the problem worse; rather, we should take the
opportunity to learn the lesson.
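
The longest-match rule described above can be illustrated with a toy lexer. This is a sketch only: the four-entry operator table and the name tokenize are made up for the example, and it is written in Lua for consistency with the thread even though the rule being shown is C's.

```lua
-- Hypothetical operator table; longer tokens are listed first so that
-- the first hit is also the longest one ("maximal munch").
local OPERATORS = { "--", "++", "-", "+" }

local function tokenize(src)
  local tokens, i = {}, 1
  while i <= #src do
    local c = src:sub(i, i)
    if c:match("%s") then
      i = i + 1                            -- whitespace only separates tokens
    elseif c:match("%d") then
      local num = src:match("^%d+", i)     -- a run of digits is one token
      tokens[#tokens + 1] = num
      i = i + #num
    else
      local matched
      for _, op in ipairs(OPERATORS) do
        if src:sub(i, i + #op - 1) == op then
          matched = op
          break
        end
      end
      assert(matched, "unexpected character: " .. c)
      tokens[#tokens + 1] = matched
      i = i + #matched
    end
  end
  return tokens
end

-- "2--3" lexes as 2, --, 3; the spaces in "2 - -3" force two minus tokens.
assert(table.concat(tokenize("2--3"), " ") == "2 -- 3")
assert(table.concat(tokenize("2 - -3"), " ") == "2 - - 3")
```

The lexer never looks at whether the resulting token sequence parses, which is exactly why the unspaced form ends up as the illegal "2 -- 3".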

/s/ Adam

Re: "Ignore me" symbol

Dirk Laurie-2
In reply to this post by Egor Skriptunoff-2
On Mon, 17 Dec 2018 at 22:48, Egor Skriptunoff
<[hidden email]> wrote:

>
> On Mon, Dec 17, 2018 at 1:43 PM Lorenzo Donati wrote:
>>
>> Numeric optional separator: I crave this, together with a standard
>> syntax for binary literals[1] . C++ addressed both needs with the syntax:
>>
>> 0b1011'1100'0001
>>
>> I'd prefer the underscore as a separator, though:
>>
>> 0b1011_1100_0001
>
>
>
> In Lua syntax a literal number is never followed by a literal string, so C++ syntax (1'000'000) is suitable for Lua too.
> IMO, 1'000'000 looks nicer than 1_000_000.
> But probably my opinion is biased because 1'000'000 is one of the standard ways to write numbers in Russian texts (space is also used as thousands separator).
> BTW, fast googling showed that "upper dot/upper comma" are used as digit groups separators (at least in handwriting) in some other countries: Belgium, Italy, Romania, Switzerland, Mexico, Liechtenstein.
> In English-speaking countries a comma is used as separator, that's why many people prefer something bottom-ish (such as underscore).

Must it be an ASCII character? Why not Unicode/UTF8, as in 1 000 000?

Re: "Ignore me" symbol

Sean Conner
It was thus said that the Great Dirk Laurie once stated:

> On Mon, 17 Dec 2018 at 22:48, Egor Skriptunoff wrote:
> >
> > BTW, fast googling showed that "upper dot/upper comma" are used as digit
> > groups separators (at least in handwriting) in some other countries:
> > Belgium, Italy, Romania, Switzerland, Mexico, Liechtenstein.
> >
> > In English-speaking countries a comma is used as separator, that's why
> > many people prefer something bottom-ish (such as underscore).
>
> Must it be an ASCII character? Why not Unicode/UTF8, as in 1 000 000?

  As a monolingual Yank [1] with a US keyboard, I don't know how to type  
but must resort to copy-n-paste (like I did just there).

  -spc (Also, can someone describe the symbol I just copied?  My current
        font doesn't have a glyph for that character ... )

[1] aka. citizen of the United States of America

Re: "Ignore me" symbol

Andrew Gierth
>>>>> "Sean" == Sean Conner <[hidden email]> writes:

 Sean> As a monolingual Yank [1] with a US keyboard, I don't know how to
 Sean> type   but must resort to copy-n-paste (like I did just there).

 Sean> -spc (Also, can someone describe the symbol I just copied? My
 Sean> current font doesn't have a glyph for that character ... )

U+2009, THIN SPACE

(though on my mail client, it comes out as a slightly _wider_ space than
normal)

--
Andrew.

Re: "Ignore me" symbol

Jim
In reply to this post by Dirk Laurie-2
On 12/17/18, Dirk Laurie <[hidden email]> wrote:
> Must it be an ASCII character? Why not Unicode/UTF8, as in 1 000 000?

what a great idea.
using unicode non-ASCII chars in place of ASCII in the core language syntax
in a situation where the latter are fully sufficient and nothing is
gained by doing so
seems in fact pretty silly.

instead using '_' as separator in integer literals (that is ignoring
any occurring
underscores in such literals) is a sound proposal also found in other
languages (java).
this is especially useful for binary literals (which Lua sadly still lacks).

Re: "Ignore me" symbol

Sam Pagenkopf
I think inline backtick comments are unneeded, --[[inline comment]] is good enough.

Never saw a problem with _G.THING for explicit globals, someone let me know if I'm wrong.

For long literals, there's a workaround:
function num (n, base) return tonumber(n:gsub('_', ''), base or 10) end
assert(num '100_000' == 100000)
And don't forget the 4e10 syntax for literals with lots of zeroes.
assert(1e5 == 100000)

And if you really hate names like setmetatable and collectgarbage, just rename them locally.
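
A minimal sketch of the local renaming suggested above; the alias names set_mt and gc are arbitrary choices for the example, not an established convention.

```lua
-- Hypothetical short aliases for the long standard names.
local set_mt = setmetatable
local gc = collectgarbage

-- The aliases behave exactly like the originals.
local obj = set_mt({}, { __index = { greet = function() return "hi" end } })
assert(obj.greet() == "hi")
gc("collect")
```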

On Tue, Dec 18, 2018 at 12:08 AM Jim <[hidden email]> wrote:
On 12/17/18, Dirk Laurie <[hidden email]> wrote:
> Must it be an ASCII character? Why not Unicode/UTF8, as in 1 000 000?

what a great idea.
using unicode non-ASCII chars in place of ASCII in the core language syntax
in a situation where the latter are fully sufficient and nothing is
gained by doing so
seems in fact pretty silly.

instead using '_' as separator in integer literals (that is ignoring
any occurring
underscores in such literals) is a sound proposal also found in other
languages (java).
this is especially useful for binary literals (which Lua sadly still lacks).

Re: "Ignore me" symbol

Coda Highland
On Tue, Dec 18, 2018 at 12:34 AM Sam Pagenkopf <[hidden email]> wrote:

>
> I think inline backtick comments are unneeded, --[[inline comment]] is good enough.
>
> Never saw a problem with _G.THING for explicit globals, someone let me know if I'm wrong.
>
> For long literals, there's a workaround:
> function num (n, base) return tonumber(n:gsub('_', ''), base or 10) end
> assert(num '100_000' == 100000)
> And don't forget the 4e10 syntax for literals with lots of zeroes.
> assert(1e5 == 100000)
>
> And if you really hate names like setmetatable and collectgarbage, just rename them locally.

The workaround has already been explicitly shot down in this thread:
it's a runtime transformation which means you take a performance hit
every time you use one. It HAS to be done at the lexer level to be
practical for anything that needs to worry about scalability. This
plus binary literals are things that have been in demand for quite a
while and they're things I think are worth doing -- though they're
things that can be done with token filters, too.

Inline long comments with --[[ ]] work fine for comments. The proposal
was actually trying to ask for an annotation syntax and just did a
poor job getting the point across. That's why other posters have
talked about syntaxes like @annotation(param, param) as something that
could be parsed cleanly but be a no-op in the interpreter so that
other tools can use the markup to do other tasks. Inline comments do
WORK for this, but they're a little bit on the verbose side and people
would like something with a bit better integration. That said, while I
understand the desire and intent, I don't particularly have a strong
opinion about it myself.

/s/ Adam
