> The complete syntax of Lua in the reference manual uses
> /LiteralString/, whose description is said to be given in 3.1.
> 3.1 says: "Any byte in a literal string not explicitly affected by the
> previous rules represents itself." The newline byte (decimal value 10
> in ASCII and UTF-8) is explicitly mentioned in three places in the
> preceding text:
> (1) A backslash followed by a real newline results in a newline in the
> string.
> (2) The escape sequence '\z' skips the following span of white-space
> characters, including line breaks
> (3) Any kind of end-of-line sequence (carriage return, newline,
> carriage return followed by newline, or newline followed by carriage
> return) is converted to a simple newline.
> (3) is given only in the context of long strings. Therefore, in a
> literal string delimited by single or double quotes, the newline byte
> has a special meaning only when it follows the backslash or is within
> a span of white-space characters following \z; and it represents
> itself otherwise. The same is true for the carriage return byte
> (decimal value 13 in ASCII and UTF-8). So, per "the official
> definition of the Lua language" newlines and carriage returns should
> just work without being escaped in either kind of string, albeit with
> subtle platform-dependent differences in non-long strings.
> But we know that the official implementation produces error
> "unfinished string" when a non-long string has a newline. So at least
> one of the two is wrong and ought to be fixed.
> Personally, I see no reason why the implementation cannot treat
> newline and carriage return bytes as described in the manual. As far
> as I can see, llex.c already has a special case just to emit that
> error message:
>     case '\n':
>     case '\r':
>       lexerror(ls, "unfinished string", TK_STRING);
> That special case code can be trivially changed to match the
> official description. We can also define the behaviour of newline
> and carriage returns to be the same in both kinds of string literals,
> thus eliminating the platform-dependent differences mentioned above;
> then it is even more trivial to change that special case (this is
> because (1) given above is not followed exactly by the implementation,
> which handles not just "a real newline" but any of the four possible
> \n and \r combinations, apparently to eliminate those same
> platform-dependent differences).
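
To make the proposal above concrete, here is a minimal standalone sketch in C of a reader for quote-delimited strings that normalizes any of the four end-of-line sequences to a single newline instead of raising "unfinished string". It is not code from llex.c: the names skip_eol and read_short_string are hypothetical, and only a few escapes are handled, so treat it as an illustration of the idea under those assumptions rather than a patch.

```c
#include <stddef.h>

/* Hypothetical helper, not part of llex.c: if *p points at an end-of-line
   sequence (\n, \r, \r\n or \n\r), skip it and return 1; otherwise 0. */
static int skip_eol(const char **p) {
  char c = **p;
  if (c != '\n' && c != '\r') return 0;
  (*p)++;
  /* a different EOL byte right after belongs to the same line break */
  if ((**p == '\n' || **p == '\r') && **p != c) (*p)++;
  return 1;
}

/* Reads the body of a quoted string, starting just after the opening
   delimiter. Stores the NUL-terminated result in out and returns its
   length, or -1 if the string is unfinished, an escape is not supported
   by this sketch, or out is too small. */
static long read_short_string(const char *src, char delim,
                              char *out, size_t outsz) {
  size_t n = 0;
  const char *p = src;
  while (*p != '\0' && *p != delim) {
    char c;
    if (skip_eol(&p)) {
      c = '\n';                        /* raw line break: normalized */
    } else if (*p == '\\') {
      p++;
      if (skip_eol(&p)) c = '\n';      /* backslash + line break */
      else if (*p == 'n') { c = '\n'; p++; }
      else if (*p == '\\' || *p == delim) { c = *p; p++; }
      else return -1;                  /* other escapes omitted here */
    } else {
      c = *p++;
    }
    if (n + 1 >= outsz) return -1;     /* keep room for the final NUL */
    out[n++] = c;
  }
  if (*p != delim) return -1;          /* end of input: unfinished string */
  if (n >= outsz) return -1;
  out[n] = '\0';
  return (long)n;
}
```

With this sketch, an input consisting of a, CR, LF, b and a closing double quote is stored as a, LF, b, which is the same result the escaped form (backslash before the line break) gives. Both paths go through the same helper, so raw and escaped line breaks cannot diverge between platforms.
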
> I do not find the "typo" arguments given earlier convincing. The
> treat-unknown-as-global rule is far more dangerous when it comes to
> typos, yet we somehow live with that.
> But fundamentally, we should eliminate the discrepancy in /some way/.
> Saying one thing and doing something completely different is bad.
Yes. And I've wanted to use (numeric) escapes (e.g. \0 or something) in
multiline strings before. And I decided to use something like this:
(but am too lazy to find the actual code now...)