Bug: long strings with REALLY long delimiters…

nobody
…like [===…===[a]===…===] cause all sorts of weird behavior (incl.
crashes) in 5.3.x / 5.4 (git HEAD).  Try

local eqs = ("="):rep(0x3ffffffe)
local code = "return ["..eqs.."[a]"..eqs.."]"
print(#assert(load(code))())

It might segfault, it might print a huge number instead of 1 (or, minus
the '#', print 'a]===…===]<<garbage>>' instead of a single 'a').

And with ASAN (-fsanitize=address) you get told more gory details, like
that it's trying to do a 0x1_0000_0001-long read (4GiB plus 1 Byte… that
read size is stable over the next couple extra ='s).

I haven't looked into this deeply; from a cursory glance, it might be
(partly?) caused by the separator length being tracked in an `int`,
which affects (in 5.4/git)

llex.c:251: skip_sep (return, locvar on next line)
llex.c:264: read_long_string (funarg)
llex.c:447: llex (locvar)

A quick test (int --> ptrdiff_t in those 4 places) _seems_ to fix it,
but there might be more…

-- Marco

Re: Bug: long strings with REALLY long delimiters…

Daurnimator
On Wed, 12 Dec 2018 at 13:55, nobody <[hidden email]> wrote:

> …like [===…===[a]===…===] cause all sorts of weird behavior (incl.
> crashes) in 5.3.x / 5.4 (git HEAD).

I have had this sitting in my drafts for over a year; I think this is
the same bug?:

The overflow is in https://www.lua.org/source/5.3/llex.c.html#skip_sep
Generate test file:

local f = assert(io.open("bug.lua", "w"))
local step = 1<<20
assert(f:write("--["))
local eq = string.rep("=", step)
for i=1, 2<<31, step do
    assert(f:write(eq))
end
assert(f:write("["))
f:close()

Re: Bug: long strings with REALLY long delimiters…

nobody

On 13/12/2018 00.08, Daurnimator wrote:
> I have had this sitting in my drafts for over a year, I think this is
> the same bug?:
>
> The overflow is in https://www.lua.org/source/5.3/llex.c.html#skip_sep

_Maybe_, although long comments didn't cause problems in my tests (as in,
it properly skipped the comment and ASAN didn't complain).  And I think
it's the computation in read_long_string just below, where it goes

seminfo->ts = luaX_newstring(ls, luaZ_buffer(ls->buff) + (2 + sep),
                                  luaZ_bufflen(ls->buff) - 2*(2 + sep));

because skip_sep only increments the count, and 0x3ffffffe still fits in
an int (though your count doesn't).

Re: Bug: long strings with REALLY long delimiters…

Roberto Ierusalimschy
> _Maybe_, although long comments didn't cause problems in my tests (as in
> it properly skipped the comment and ASAN didn't complain.)  And I think
> it's the computation in read_long_string just below where it goes
>
> seminfo->ts = luaX_newstring(ls, luaZ_buffer(ls->buff) + (2 + sep),
>                                  luaZ_bufflen(ls->buff) - 2*(2 + sep));
>
> because it's only ++'ing in skip_sep and 0x3ffffffe still fits (tho
> yours doesn't).

Right. More exactly, the overflow is in the computation 2*(2 + sep).
With 'sep' being 0x3ffffffe, this expression results in 0x80000000,
which wraps to a negative number and then is added (instead of being
subtracted) to the buffer length.
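The arithmetic can be replayed in plain Lua 5.3+ (64-bit integers) as a rough sketch. `to_int32` is a hypothetical helper that mimics the two's-complement wraparound compilers typically produce in practice (in C this signed overflow is formally undefined), and the buffer length assumes the buffer holds both brackets plus the single character `a`, as in Marco's example.

```lua
-- Replays the length computation from read_long_string for Marco's input.
-- Assumes Lua 5.3+ (64-bit integers and bitwise operators).

-- Hypothetical helper: reduce x to a 32-bit two's-complement value,
-- which is what C 'int' arithmetic typically wraps to in practice.
local function to_int32(x)
  x = x & 0xffffffff
  if x >= 0x80000000 then x = x - 0x100000000 end
  return x
end

local sep = 0x3ffffffe                 -- number of '=' on each side
local bufflen = 2 * (2 + sep) + 1      -- "[" eqs "[" .. "a" .. "]" eqs "]"

local trimmed = to_int32(2 * (2 + sep))  -- meant to be 0x80000000, wraps negative
local len = bufflen - trimmed            -- the negative value ends up being added

print(string.format("2*(2+sep) as int: %d", trimmed))
print(string.format("length passed on: 0x%x (intended: %d)",
                    len, bufflen - 2 * (2 + sep)))
```

Under these assumptions the length comes out as 0x100000001 instead of 1, which lines up with the 0x1_0000_0001-byte read ASAN reported at the start of the thread.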

Strangely, neither gcc nor clang, with option '-ftrapv', detected
this overflow.

Instead of using a larger type to count 'sep', it seems easier to just
limit the maximum number of '=' in a long bracket. I don't think people
will mind a limit of 1000.

-- Roberto

Re: Bug: long strings with REALLY long delimiters…

Egor Skriptunoff-2
On Thu, Dec 13, 2018 at 7:15 PM Roberto Ierusalimschy wrote:
> it seems easier to just
> limit the maximum number of '=' in a long bracket. I don't think people
> will mind a limit of 1000.

IMO, it's not a good idea.
If this limit is N, then the minimal size of a non-quotable string is about 0.5*N^2.
The existence of non-quotable strings may allow maliciously crafted input to crash some Lua programs.
N should be at least 10^6 to make sure non-quotable strings are unrealistically huge.

Re: Bug: long strings with REALLY long delimiters…

Coda Highland
On Thu, Dec 13, 2018 at 3:02 PM Egor Skriptunoff
<[hidden email]> wrote:

> Existence of non-quotable strings may crash some Lua programs by maliciously crafted input.

A string from anywhere but a literal in the source code isn't affected
by this issue. If maliciously crafted input can be a problem in your
application, then that means you're running user-supplied scripts. And
if you're running user-supplied scripts, then it wouldn't crash here
-- the load() call would just return an error. And if you're not
dealing with load errors when you're dealing with user-supplied
scripts, that's your own fault, not Lua's.
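A minimal sketch of what handling load errors for user-supplied code can look like, assuming Lua 5.2+ (`load` with chunk name, mode and environment arguments). `run_untrusted` is a hypothetical helper; restricting the mode to text and passing an empty environment are choices made for this sketch. It only helps once the lexer bug itself is fixed, since on the affected versions the bad length is computed while `load` is still lexing.

```lua
-- Hypothetical helper: compile and run a user-supplied chunk without
-- letting syntax errors or runtime errors escape as exceptions.
local function run_untrusted(code)
  -- "t" rejects precompiled binary chunks; {} gives the chunk no globals.
  local chunk, err = load(code, "=user", "t", {})
  if not chunk then
    return nil, "compile error: " .. err
  end
  local ok, result = pcall(chunk)
  if not ok then
    return nil, "runtime error: " .. tostring(result)
  end
  return result
end

print(run_untrusted("return 1 + 1"))           -- compiles and runs
print(run_untrusted("return [==[unfinished"))  -- rejected by the lexer
```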

/s/ Adam

Re: Bug: long strings with REALLY long delimiters …

David Favro
In reply to this post by Egor Skriptunoff-2


On December 13, 2018 9:01:52 PM UTC, Egor Skriptunoff <[hidden email]> wrote:
>On Thu, Dec 13, 2018 at 7:15 PM Roberto Ierusalimschy wrote:
>> it seems easier to just
>> limit the maximum number of '=' in a long bracket. I don't think people
>> will mind a limit of 1000.
>
>IMO, it's not a good idea.
>If this limit is N, then minimal size of non-quotable string is about 0.5*N^2

What's a "non-quotable string"?

Am I missing something or can't any string be represented as a literal with e.g. double-quote (") as delimiter and appropriate escaping of special characters?  If so, I don't see your definition of "non-quotable", could you elaborate?

Re: Bug: long strings with REALLY long delimiters …

Coda Highland
On Thu, Dec 13, 2018 at 3:32 PM David Favro <[hidden email]> wrote:

> What's a "non-quotable string"?
>
> Am I missing something or can't any string be represented as a literal with e.g. double-quote (") as delimiter and appropriate escaping of special characters?  If so, I don't see your definition of "non-quotable", could you elaborate?

You're missing that Lua has another form of string literal (sometimes
called "raw" strings) that treats its contents as verbatim instead of
requiring special characters to be escaped. In this context,
"non-quotable" means "a string that cannot be expressed using a raw
string literal".

/s/ Adam

Re: Bug: long strings with REALLY long delimiters …

David Favro


On December 13, 2018 9:43:33 PM UTC, Coda Highland <[hidden email]> wrote:

>You're missing that Lua has another form of string literal (sometimes
>called "raw" strings) that treats its contents as verbatim instead of
>requiring special characters to be escaped. In this context,
>"non-quotable" means "a string that cannot be expressed using a raw
>string literal".

Yes, I kind of assumed that; perhaps I was being a little rhetorical. But I don't think that "non-quotable" is remotely an accurate description of such a string, nor what people would normally assume the phrase means, even in the context of a mailing-list thread on long strings.  And, while I've no idea what Egor meant about vulnerabilities surrounding them, I am imagining some kind of issue with a serialization library that tries to represent strings in the VM as Lua string literals for external storage being fed unquotable strings.  In my experience, such libraries don't use long strings, and I don't think that string.format()'s %q does either.  My point being that *any* string can be represented as a "quoted" Lua string literal, so I still ask for clarification: what does "non-quotable" mean in this context, and why would such a string pose a vulnerability?
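The claim that any byte string has a quoted form can be checked directly with the standard `string.format("%q", s)` and `load` (Lua 5.2 or later). A small sketch; `roundtrips_with_q` is a hypothetical name, and the sample string is arbitrary: it contains closing brackets, a NUL byte, a carriage return (which a long bracket would normalize to `\n`), a double quote and a backslash.

```lua
-- Checks that string.format("%q", s) produces a literal that load()
-- turns back into exactly the same bytes. Assumes Lua 5.2 or later.
local function roundtrips_with_q(s)
  local literal = string.format("%q", s)
  local back = assert(load("return " .. literal))()
  return back == s, literal
end

-- Arbitrary sample: closing brackets, NUL, CR LF, a quote and a backslash.
local sample = "]]]=]]==]" .. "\0\r\n\"\\"
local ok, literal = roundtrips_with_q(sample)
print(ok)
print(literal)
```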

Re: Bug: long strings with REALLY long delimiters …

Rena
In reply to this post by David Favro
On Thu, Dec 13, 2018, 16:32 David Favro <[hidden email]> wrote:

> What's a "non-quotable string"?

A string that starts with `[====[`, assuming the limit of `=` in a delimiter were 4.

Re: Bug: long strings with REALLY long delimiters …

Tim Hill
In reply to this post by David Favro


> On Dec 13, 2018, at 2:09 PM, David Favro <[hidden email]> wrote:
>
> In my experience, such libraries don't use long-strings and I don't think that string.format()'s %q does either.  My point being that *any* string can be represented as a "quoted" Lua string literal, so I still ask for clarification what does "non-quotable" mean in this context, and why would such a string pose a vulnerability?
>

The official Lua term for these is “long format literal strings” (Lua Ref Manual 5.3). And +1 that with appropriate escaping both long and non-long (short?) literals can represent any sequence of bytes.

—Tim


Re: Bug: long strings with REALLY long delimiters …

David Favro
In reply to this post by Rena


On December 13, 2018 10:10:40 PM UTC, Rena <[hidden email]> wrote:

>A string that starts with `[====[`, assuming the limit of `=` in a
>delimiter were 4.

Oh my goodness, I must have flunked English class, my point seems to be completely lost.  I know of the existence of long string literals and immediately saw that imposing a limit on the number of '=' would mean that certain (nonsensical, as Roberto pointed out) strings would have to be expressed as a different form of literal.

Given that, as I pointed out in my first message, *any* string (including your example, even if n were 0) can be represented as a quoted Lua literal, why would we call it a "non-quotable string" (which, perhaps I'm daft, but to me means there exists no Lua string literal which represents this string)?

The reason I tried to make that point is that Egor said (without specifying why or how) that vulnerabilities would be possible if a "non-quotable string" exists.  It seems to me that if this is at all conceivable, it most likely would be so if someone mistakenly thought that the inability to be expressed as a long string literal meant inability to be expressed as a string literal whatsoever.  I just wanted Egor to acknowledge that these "non-quotable strings" can be represented as quoted string literals and explain, with that in mind, how such a vulnerability would work.

Apparently I expressed myself poorly, so since my point clearly is lost, let's abandon it, please!  Perhaps all of us besides Egor could stop speculating what he meant, and he can just tell us, why would existence of a string unable to be expressed as a long string literal (I think we all know what I mean by that term) pose a potential vulnerability?

Re: Bug: long strings with REALLY long delimiters …

Gabriel Bertilson
In reply to this post by Rena
[====[[====[]====] works; perhaps you meant ]====]? But actually
]====] can be represented too.

The requirement for representing a string as a long string literal is
that you choose a pair of delimiters such that the closing delimiter
only occurs at the end of the combination of string and closing
delimiter. So if 4 is the maximum number of equals signs in a
delimiter, then if the string is ]====], you can use delimiters with 1
to 3 equals signs: [=[]====]]=], [==[]====]]==], [===[]====]]===]. And
you have to represent the string ] with 1 to 4 equals signs, because
with zero equals signs, [[]]], the closing delimiter appears too
early, and a stray ] is left for the parser to choke on.

So a string that cannot be represented as a long string literal, given
that N is the limit, is one containing every closing delimiter with 0
to N equals signs. If the limit is 4, an unrepresentable string would
be ]]]=]]==]]===]]====]. Built by simply concatenating the delimiters
like this, such a string has length triangular_number(N) + 2*(N+1).
If the limit is 1000, that string would be 502502 bytes long if I
calculated it right. (See the formula below.) It would be very unusual
for such a string to occur in a program.

-- Length of "]]" .. "]=]" .. "]==]" .. ... up to `limit` equals signs
function unrepr_str_size(limit)
  local triangle = 0
  for i = 1, limit do
    triangle = triangle + i
  end
  return triangle + 2 * (limit + 1)
end
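The criterion above can be checked mechanically. A sketch with a hypothetical `usable_levels` helper: level k works when `]` followed by k `=` and `]` does not occur in the string with one `]` appended, where the appended `]` stands in for the first character of the closing bracket (this catches strings that end in `]` plus k `=`). Newline handling is ignored. Running it also suggests that delimiters can share brackets: the 15-byte `overlap4` fails every level up to 4, so the 20-byte concatenated form is not the shortest possible.

```lua
-- Sketch: which long-bracket levels (0..limit '=' signs) can hold s verbatim?
-- Level k works iff "]" .. ("="):rep(k) .. "]" never appears in s .. "]".
-- The extra "]" covers strings ending in "]" plus k "=", where the closing
-- bracket itself would complete an early match. Newlines are ignored here.
local function usable_levels(s, limit)
  local levels = {}
  for k = 0, limit do
    local close = "]" .. ("="):rep(k) .. "]"
    if not (s .. "]"):find(close, 1, true) then
      levels[#levels + 1] = k
    end
  end
  return levels
end

local concat4 = "]]]=]]==]]===]]====]"  -- the example for limit 4
local overlap4 = "]]=]==]===]===="      -- shares a ']' between neighbours

print(#usable_levels("]====]", 4))  -- levels 1..3 remain usable
print(#usable_levels(concat4, 4), #usable_levels(overlap4, 4))
print(#concat4, #overlap4)
```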

— Gabriel

On Thu, Dec 13, 2018 at 4:11 PM Rena <[hidden email]> wrote:

> A string that starts with `[====[`, assuming the limit of `=` in a delimiter were 4.

Re: Bug: long strings with REALLY long delimiters…

nobody
In reply to this post by Roberto Ierusalimschy
On 13/12/2018 17.14, Roberto Ierusalimschy wrote:
> Instead of using a larger type to count 'sep', it seems easier to just
> limit the maximum number of '=' in a long bracket. I don't think people
> will mind a limit of 1000.

[semi-ignoring the other part of the thread b/c it's confusing]

I think something like 0x2000_0000 (or 0x2000 in case 16-bit ints are a
thing) is better, because generated code is something people do.

A simple way to include any piece of textual data in a generated Lua
script is to put it in a long string.  (You don't have "%q" outside of
Lua.)  The simplest way to do that is to count the largest sequence of
='s in it and add one more in the separator.
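The "longest run of `=` plus one" approach might look like this in Lua. `long_quote_maxrun` is a hypothetical name; the sketch assumes the text neither starts with a newline nor contains `\r`, since long brackets drop a leading newline and normalize line endings.

```lua
-- Sketch: wrap text in a long bracket with one more '=' than the longest
-- run of '=' in the text. Assumes no leading newline and no '\r'.
local function long_quote_maxrun(s)
  local longest = 0
  for run in s:gmatch("=+") do
    if #run > longest then longest = #run end
  end
  -- A closing bracket at this level needs longest + 1 consecutive '=',
  -- which never occur in s, so it cannot match early.
  local eqs = ("="):rep(longest + 1)
  return "[" .. eqs .. "[" .. s .. "]" .. eqs .. "]"
end

local text = "a]]b]==]c"
local literal = long_quote_maxrun(text)
print(literal)
assert(load("return " .. literal)() == text)
```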

While it's already fairly unlikely to hit a limit of 1000, in case it
_does_ get hit, the code suddenly breaks, and you'll need a more complex
solution.  Therefore, using the largest safe, "round" (easy to spot as
magic) number reduces the potential for future work at ≈zero extra cost.
(And with 0x2000_0000, I think anything that hits _that_ limit can
safely be disregarded as malicious.)

Re: Bug: long strings with REALLY long delimiters …

Philippe Verdy
In reply to this post by Gabriel Bertilson
The facility for putting an "arbitrary" number of = signs in long string delimiters is NOT made for escaping arbitrary code from random sources.
It is only a facility for commenting source programs.
Don't abuse it! If you need to escape data properly without huge cost, it's enough to use a small number of = signs (no more than 5 should be needed) and then escape only the occurrences of the closing delimiter (a ] followed by that many = signs and another ]) by breaking the literal inside the run of = signs. This is a rare case in actual data.

To encode  "=====",
you'd output [====[=====]====]

To encode  "====]",
you'd output [====[====]]====]

To encode  "]====]" (this is one of the worst-case scenarios),
you'd output [====[]===]====][====[=]]====]

I.e. close the current delimiter and reopen it immediately. This should create a single string, just like "ABC====""=ABC".

String literals (using either short or long delimiters) repeated without any operator between them should be parsed as a single literal value (and you could just as well mix quotation styles between the parts; this would not change the meaning at all). No change is needed in the Lua lexer; the only change is in the parser: accept "A" "B" as the same literal value as "AB" (the spaces between the two parts could just as well be newlines), without needing an intermediate .. operator, which would be evaluated at runtime.

The benefit of the "long quotation" marks is that the closing delimiter occurs much less often, so the worst-case encoded length is better with delimiters that are 6 characters long, as here, than with 1-character delimiters. But you'll realize immediately that the worst case always exists, and its total encoded length is then a bit longer than with single-character escaping (the increase is only in the leading and trailing delimiters, not in the middle part, whose size is doubled by adding the escapes).

But the worst case will most likely just occur less often (it occurs only when encoding source sequences containing ONLY an exact repetition of the trailing delimiter).
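The close-and-reopen idea can be sketched in current Lua as an illustration under assumptions, not as the proposed parser change: Lua does not join adjacent literals, so the pieces below are joined with the ordinary `..` operator at run time. `split_long_quote` is a hypothetical name. The split has to fall inside the run of `=` signs (here, before its last `=`), because a piece ending in `]` followed by all of the `=` signs would be closed early by its own closing bracket.

```lua
-- Sketch of close-and-reopen with a fixed number of '=' (level >= 1).
-- Pieces are joined with '..' because Lua has no adjacent-literal syntax.
-- Newline handling at the start of the first piece is ignored.
local function split_long_quote(s, level)
  assert(level >= 1, "need at least one '=' to split inside")
  local eqs = ("="):rep(level)
  local close = "]" .. eqs .. "]"
  local pieces, rest = {}, s
  while true do
    -- Look for an early closer, including one completed by the final "]".
    local p = (rest .. "]"):find(close, 1, true)
    if not p then break end
    -- Cut before the last '=' of the run, so this piece ends in "]" plus
    -- level - 1 '=' and the next piece starts with the remaining '='.
    pieces[#pieces + 1] = rest:sub(1, p + level - 1)
    rest = rest:sub(p + level)
  end
  pieces[#pieces + 1] = rest
  for i, piece in ipairs(pieces) do
    pieces[i] = "[" .. eqs .. "[" .. piece .. close
  end
  return table.concat(pieces, " .. ")
end

print(split_long_quote("]====]", 4))
```

Strings with very many closing delimiters produce a long right-associative `..` chain, which may run into the parser's nesting limit.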



On Fri, Dec 14, 2018 at 01:06, Gabriel Bertilson <[hidden email]> wrote:
> [====[[====[]====] works; perhaps you meant ]====]? But actually
> ]====] can be represented too.

Re: Bug: long strings with REALLY long delimiters…

dyngeccetor8
In reply to this post by nobody
On 12/14/18 3:12 AM, nobody wrote:
> I think something like 0x2000_0000 (or 0x2000 in case 16-bit ints are a thing)
> is better, because generated code is something people do.

I'm for 2GiB limit too.

> A simple way to include any piece of textual data in a generated Lua script is
> to put it in a long string.  (You don't have "%q" outside of Lua.)  The simplest
> way to do that is to count the largest sequence of ='s in it and add one more in
> the separator.

Finding the longest "=" sequence is arguably the simplest approach, but not optimal. I prefer
finding the shortest long closing quote: min(N): ("]" "=" ^ N "]") not in string.

This algorithm has no worst cases.

Sample implementation at
https://github.com/martin-eden/workshop/blob/master/formats/lua/quote_string/intact.lua#L16

-- Martin

Re: Bug: long strings with REALLY long delimiters…

Roberto Ierusalimschy
> Finding longest "=" sequence is arguably simplest and not optimal. I prefer
> finding shortest long closing quote: min(N): ("]" "=" ^ N "]") not in string.
>
> This algorithm have no worst cases.
>
> Sample implementation at
> https://github.com/martin-eden/workshop/blob/master/formats/lua/quote_string/intact.lua#L16

It doesn't seem to work for "]]a]=". (Result is "[=[]]a]=]=]"...)

-- Roberto

Re: Bug: long strings with REALLY long delimiters…

dyngeccetor8
On 12/14/18 7:32 PM, Roberto Ierusalimschy wrote:

> It doesn't seem to work for "]]a]=". (Result is "[=[]]a]=]=]"...)

Thank you for pointing out this case!

I fixed the algorithm by appending "]" to the source string before determining
the number of "=" required.

So now "]]a]=" is transformed into "]]a]=]", and ("[==[", "]==]") are used as the proper
quotes.

Unfortunately, the link now points to the current version. The original (buggy) version is at

https://github.com/martin-eden/workshop/blob/39322444d84e7d93c6f73e563a4e78989a85f195/formats/lua/quote_string/intact.lua
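An independent sketch of the shortest-closer search, with and without the appended `]`; it is not the code in the linked repository, and `shortest_level` / `long_quote_shortest` are hypothetical names. With `fixed` set to false it reproduces the level that breaks on "]]a]=".

```lua
-- Sketch: use the shortest closing bracket "]" .. ("="):rep(k) .. "]" that
-- does not occur. With fixed = true the search runs on s .. "]", so a
-- string ending in "]" plus k '=' cannot be closed early by the bracket.
local function shortest_level(s, fixed)
  local haystack = fixed and (s .. "]") or s
  local k = 0
  while haystack:find("]" .. ("="):rep(k) .. "]", 1, true) do
    k = k + 1
  end
  return k
end

local function long_quote_shortest(s)
  local eqs = ("="):rep(shortest_level(s, true))
  return "[" .. eqs .. "[" .. s .. "]" .. eqs .. "]"
end

local s = "]]a]="
print(shortest_level(s, false), shortest_level(s, true))  -- naive vs fixed
print(long_quote_shortest(s))
```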

-- Martin

Re: Bug: long strings with REALLY long delimiters…

Andrew Gierth
>>>>> "dyngeccetor8" == dyngeccetor8  <[hidden email]> writes:

 >>> https://github.com/martin-eden/workshop/blob/master/formats/lua/quote_string/intact.lua#L16

 dyngeccetor8> Sadly, link now points to current version.

I don't think this is quite right for the case where the string starts
with \r not followed by \n.

According to the docs, long strings don't preserve the exact bytes of
newline sequences, but they recognize any of \r, \n, \r\n, \n\r as being
newlines. So if the string starts with \r alone, your code does not
insert a \n, and the \r will be eaten when reading the string back in.
If on the other hand you added a \n, then the \n\r would still be
treated as a single newline, and it would still be eaten.

I think what you need to do is: if the first character of the string is
either \r or \n, then duplicate it.
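The suggested rule can be sketched as follows (`long_quote` is a hypothetical name, and the level is picked with a shortest-closer search against `s .. "]"`). Doubling a leading `\r` or `\n` keeps the first line from being eaten, but, as noted above, the exact newline bytes are not preserved: a leading `\r` comes back as `\n`.

```lua
-- Sketch: long brackets skip one newline right after the opening bracket,
-- so a leading '\r' or '\n' is doubled to survive. Long brackets still
-- turn '\r', '\r\n' and '\n\r' into '\n', so bytes are not preserved.
local function long_quote(s)
  local k = 0
  while (s .. "]"):find("]" .. ("="):rep(k) .. "]", 1, true) do
    k = k + 1
  end
  local eqs = ("="):rep(k)
  local first = s:sub(1, 1)
  if first == "\r" or first == "\n" then
    s = first .. s  -- the lexer drops one copy, the other stays
  end
  return "[" .. eqs .. "[" .. s .. "]" .. eqs .. "]"
end

print(load("return " .. long_quote("\nabc"))() == "\nabc")
```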

--
Andrew.

Re: Bug: long strings with REALLY long delimiters…

Sam Pagenkopf
For security issues, I'd blame the person who generates code from malicious foreign input and then runs it or passes it around. There are less elaborate ways to be exploited.

Let me bikeshed. I can't picture a normal file that needs more than, say, 100. Someone who wants 10,000 should think of new solutions, and I would advocate a shorter limit. Then again, I've only ever written Lua for humans.
