embedding zeroes

Rebel Neurofog
Hi, people of the list!

I'd like to ask whether this quotation improvement is guaranteed or
not to work in Lua:
output:write ((string.gsub (string.format ("bootstrap_image = %q",
binary), "\\000", "\0")))
-- Zero bytes are very common in binary formats, so it's worth
-- reducing their encoded length.

The manual says `Strings in Lua may contain any 8-bit value, including
embedded zeros, which can be specified as '\0'.`
But can it be specified as a raw zero byte itself?
It works fine on my host, but is it guaranteed to work elsewhere?

Re: embedding zeroes

Patrick Rapin
> I'd like to ask whether this quotation improvement is guaranteed or
> not to work in Lua:
> output:write ((string.gsub (string.format ("bootstrap_image = %q",
> binary), "\\000", "\0")))

To my knowledge, yes.
But I don't see the point of using the %q format specifier together
with that substitution.
If you really want the shortest possible result, you can just use a
long string, as in:

output:write (string.format ("bootstrap_image = [=[%s]=]", binary))

Just verify first that neither [=[ nor ]=] appears in binary (or
determine dynamically how many = signs to use).
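A minimal sketch of the dynamic variant, assuming Lua 5.1 or later; `long_bracket` is a hypothetical helper name, not part of Lua. It picks the smallest level whose closing bracket cannot appear early, and it refuses carriage returns because the lexer converts end-of-line sequences inside long strings.

```lua
-- Hypothetical helper: wrap `data` in a long bracket literal [=...=[ ... ]=...=].
-- Assumes `data` has no carriage returns: the lexer turns every end-of-line
-- sequence inside a long string into a single "\n".
local function long_bracket(data)
  assert(not data:find("\r", 1, true), "CR bytes would not survive a round trip")
  -- Start at level 1: Lua 5.1 rejects a nested "[[" inside a level-0 literal.
  local level = 1
  -- Appending "]" also catches data ending in "]" plus equals signs, which
  -- would otherwise combine with the real closing bracket.
  local probe = data .. "]"
  while probe:find("]" .. string.rep("=", level) .. "]", 1, true) do
    level = level + 1
  end
  local eq = string.rep("=", level)
  -- The lexer drops a newline that immediately follows the opening bracket,
  -- so emitting one keeps a leading newline in `data` intact.
  return "[" .. eq .. "[\n" .. data .. "]" .. eq .. "]"
end

local loadstr = loadstring or load  -- loadstring in 5.1, load in 5.2+
local f = assert(loadstr("return " .. long_bracket("a ]=] b\0c")))
print(f() == "a ]=] b\0c")  --> true
```

Raw zero bytes are fine here; the CR restriction is the real limitation of this approach.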

Re: embedding zeroes

Rebel Neurofog
On Mon, Dec 19, 2011 at 7:43 PM, Patrick Rapin <[hidden email]> wrote:

>> I'd like to ask whether this quotation improvement is guaranteed or
>> not to work in Lua:
>> output:write ((string.gsub (string.format ("bootstrap_image = %q",
>> binary), "\\000", "\0")))
>
> To my knowledge, yes.
> But I don't get the point to use the %q format modifier along with
> that substitution.
> If you really want the shortest possible result, you can just use a
> long string like in:
>
> output:write (string.format ("bootstrap_image = [=[%s]=]", binary))
>
> After having verified that neither [=[ nor ]=] appear in binary (or
> test dynamically the number of = signs to use).
>

It doesn't work, at least because of newline mangling.

Re: embedding zeroes

Roberto Ierusalimschy
In reply to this post by Rebel Neurofog
> I'd like to ask whether this quotation improvement is guaranteed or
> not to work in Lua:
> output:write ((string.gsub (string.format ("bootstrap_image = %q",
> binary), "\\000", "\0")))
> -- Zero bytes are very common for binary formats so it's great to
> reduce it's encoded length.
>
> The manual says `Strings in Lua may contain any 8-bit value, including
> embedded zeros, which can be specified as '\0'.`
> But can it be specified as zero itself?
> It works fine on my host, but is it guaranteed to work elsewhere?

You can use '\0', but there is a problem if the next element is a digit.
"\0007" is '\0' followed by '7', but "\07" is a single byte with value 7. You
can avoid this problem with a frontier pattern:

  output:write ((string.gsub (string.format ("bootstrap_image = %q",
  binary), "\\000%f[%D]", "\0")))


(Anyway note that, as Rapin pointed out, there may be more efficient
ways to store your data.)

-- Roberto

Re: embedding zeroes

Rebel Neurofog
> You can use '\0', but there is a problem if the next element is a digit.
> "\0007" is '\0' followed by '7', but "\07" is something different. You
> can avoid this problem with a frontier pattern:
>
>  output:write ((string.gsub (string.format ("bootstrap_image = %q",
>  binary), "\\000%f[%D]", "\0")))

You of course meant this (two backslashes before the final zero character):
output:write ((string.gsub (string.format ("bootstrap_image = %q",
binary), "\\000%f[%D]", "\\0")))
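One caveat the thread does not cover: if `binary` itself contains a backslash followed by the characters 000, %q writes `\\000`, the pattern matches starting at the second backslash, and the reloaded string loses two zeros. The sketch below (assuming Lua 5.1 or later; `shorten_zeros` is a hypothetical helper) keeps the frontier pattern but counts the run of backslashes in front of 000 and rewrites only when that run is odd, that is, when the last backslash really starts an escape. On Lua 5.2 and later %q already emits the short `\0` form wherever it is safe, so there it should leave the output unchanged.

```lua
-- Hypothetical helper: shorten "\000" escapes in %q output to "\0" where safe.
local function shorten_zeros(quoted)
  -- Capture the whole run of backslashes before "000"; %f[%D] requires that
  -- the next character is not a digit, so "\0" cannot absorb it.
  return (quoted:gsub("(\\+)000%f[%D]", function(slashes)
    -- An odd run ends in a real escape. An even run is escaped backslashes
    -- followed by literal "000" from the data; returning nil keeps the match.
    if #slashes % 2 == 1 then
      return slashes .. "0"
    end
  end))
end

local binary = "a\0b\0\0" .. "7\\000x"  -- zeros, a zero before '7', literal \000
local quoted = string.format("%q", binary)
local loadstr = loadstring or load  -- loadstring in 5.1, load in 5.2+
assert(loadstr("return " .. shorten_zeros(quoted))() == binary)
```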

Thanks for the tip, anyway))
But I meant what I wrote: putting a raw zero byte, as in string.char (0),
not a backslash followed by a zero character.
Again, it works fine for me, but will it work everywhere? I couldn't
find an answer in the manual or on the list.

> (Anyway note that, as Rapin pointed out, there may be more efficient
> ways to store your data.)

How exactly?
The manual says:
`Literals in this bracketed form can run for several lines,
do not interpret any escape sequences, and ignore long brackets of any
other level.`
That means I can't encode newlines unchanged ("\r\n" turns into "\n").

Re: embedding zeroes

Roberto Ierusalimschy
> > You can use '\0', but there is a problem if the next element is a digit.
> > "\0007" is '\0' followed by '7', but "\07" is something different. You
> > can avoid this problem with a frontier pattern:
> >
> >  output:write ((string.gsub (string.format ("bootstrap_image = %q",
> >  binary), "\\000%f[%D]", "\0")))
>
> You meant of course this (2 backslashes before last zero character):
> output:write ((string.gsub (string.format ("bootstrap_image = %q",
> binary), "\\000%f[%D]", "\\0")))
>
> Thanks for the tip, anyway))
> But I meant what I wrote: putting a raw zero byte, as in string.char (0),
> not a backslash followed by a zero character.

Sorry for my misunderstanding.


> Again, it works fine for me but will it work everywhere? I couldn't
> find an answer in manual or the list.

From the Lua side it should work everywhere, but I am not sure whether
all file systems will be happy with embedded zeros inside text files.


> > (Anyway note that, as Rapin pointed out, there may be more efficient
> > ways to store your data.)
>
> How exactly?

For instance, you may represent other "control" bytes as themselves, as
you intended to do with \0. As far as Lua is concerned, any byte inside
a double-quoted string other than the escape character '\', the double
quote '"', and newlines (both '\r' and '\n') is OK. As with '\0', you
just have to worry about the file system.

-- Roberto
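The Lua side of this rule can be checked on a given interpreter with a short sketch (assuming Lua 5.1 or later; `raw_bytes_survive` is a hypothetical name). It builds a double-quoted literal holding every byte except `\`, `"`, CR and LF unescaped, loads it from a string, and compares the result. It only exercises the lexer and says nothing about what a file system does with such bytes.

```lua
-- Hypothetical check: do all bytes except \, ", CR and LF survive unescaped
-- inside a double-quoted literal loaded from a string?
local function raw_bytes_survive()
  local bytes = {}
  for b = 0, 255 do
    -- 92 = '\', 34 = '"', 13 = CR, 10 = LF: these still need escaping.
    if b ~= 92 and b ~= 34 and b ~= 13 and b ~= 10 then
      bytes[#bytes + 1] = string.char(b)
    end
  end
  local payload = table.concat(bytes)
  local loadstr = loadstring or load  -- loadstring in 5.1, load in 5.2+
  local chunk = assert(loadstr('return "' .. payload .. '"'))
  return chunk() == payload, #payload
end

local ok, count = raw_bytes_survive()
assert(ok and count == 252)
```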

Re: embedding zeroes

Rebel Neurofog
>> Again, it works fine for me but will it work everywhere? I couldn't
>> find an answer in manual or the list.
>
> From the Lua part it should work everywhere, but I am not sure whether
> all file systems will be happy with embedded zeros inside text files.

That's good. Thanks!

>> > (Anyway note that, as Rapin pointed out, there may be more efficient
>> > ways to store your data.)
>>
>> How exactly?
>
> For instance, you may represent other "control" bytes as themselves, as
> you intended to do with \0. As long as Lua is concerned, any byte inside
> a double-quoted string that is different from escape '\', double quote
> '"', and new lines (both '\r' and '\n') is OK.  As with '\0', you just
> have to worry about the file system.
>
> -- Roberto
>

%q works the same way, except that it also escapes control characters
and the zero byte.

Anyway, for completeness, I would add this to the manual (covering the
string parsing behavior as well).

Thanks, again!

Re: embedding zeroes

Patrick Rapin
In reply to this post by Rebel Neurofog
>> output:write (string.format ("bootstrap_image = [=[%s]=]", binary))

> It doesn't work at least due to newline mangling.

So I suppose your system is Windows.
On Windows, indeed, \r\n is turned into \n only because luaL_loadfile
opens Lua files in text mode.
This is a problem if the script contains binary data, as with [[ ]].
But in that case, I am a bit surprised that you do not also hit a
problem with character 26 (^Z).
Lua 5.1.4 does *not* escape that character in "%q" output, and it is
treated as an EOF marker when the file is read back in text mode.
BTW Lua 5.2.0 has a better escaping scheme: all control characters are
turned into \nnn form, with leading zeros inserted only when
necessary.
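Not something proposed in the thread, but when you control the loader, one way to sidestep text-mode translation (including ^Z) is to read the generated file in binary mode yourself and pass the bytes to `loadstring`/`load`. A minimal sketch, assuming Lua 5.1 or later; `load_binary_safe` is a hypothetical helper, and the temporary file comes from `os.tmpname`.

```lua
-- Hypothetical helper: load a Lua source file byte-for-byte, bypassing the
-- C library's text-mode translation that loadfile may be subject to.
local function load_binary_safe(path)
  local fh = assert(io.open(path, "rb"))
  local src = fh:read("*a")
  fh:close()
  local loadstr = loadstring or load  -- loadstring in 5.1, load in 5.2+
  -- "@" .. path makes error messages refer to the file name.
  return loadstr(src, "@" .. path)
end

-- Write a chunk whose string literal holds a raw NUL and a raw ^Z (byte 26).
local path = os.tmpname()
local out = assert(io.open(path, "wb"))
out:write('return "a\0b\26c"')
out:close()

local chunk = assert(load_binary_safe(path))
assert(chunk() == "a\0b\26c")
os.remove(path)
```

This only changes how the file is read; the lexer's own end-of-line conversion inside long strings still applies.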

Re: embedding zeroes

Rebel Neurofog
On Tue, Dec 20, 2011 at 2:37 AM, Patrick Rapin <[hidden email]> wrote:
>>> output:write (string.format ("bootstrap_image = [=[%s]=]", binary))
>
>> It doesn't work at least due to newline mangling.
>
> So I suppose your system is Windows.

No, it isn't. What's more, in my case the generated code is never
written to a file before it is executed.
The Lua 5.2 manual describes this explicitly:
`Any kind of end-of-line sequence (carriage return, newline, carriage
return followed by newline, or newline followed by carriage return) is
converted to a simple newline.`
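The conversion is easy to observe with a chunk loaded from a string at run time, so no file or text mode is involved. A small sketch, assuming Lua 5.1 or later:

```lua
-- Each of the four end-of-line sequences inside a long string literal
-- comes back as a single "\n".
local loadstr = loadstring or load  -- loadstring in 5.1, load in 5.2+
local source = "return [[a\r\nb\n\rc\rd\ne]]"
local value = assert(loadstr(source))()
print(value == "a\nb\nc\nd\ne")  --> true
```

Two identical newline bytes in a row, such as "\n\n", still count as two separate line breaks.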

> BTW Lua 5.2.0 has a better escaping scheme: all control characters are
> turned to \nnn form, with the leading zeros only inserted when
> necessary.

Yeah, I've noticed that. So I decided to avoid %q and escape only ", \,
CR and NL by hand.
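A minimal sketch of that hand-escaping approach, assuming Lua 5.1 or later (`quote_binary` is a hypothetical name): escape only the four bytes Roberto listed and copy every other byte verbatim, so zeros and other control bytes cost one byte each. Whether a text file containing such bytes survives the file system is a separate question.

```lua
-- Only these four bytes are escaped; everything else, including "\0", is copied raw.
local escapes = {
  ["\\"] = "\\\\",
  ['"'] = '\\"',
  ["\r"] = "\\r",
  ["\n"] = "\\n",
}

local function quote_binary(s)
  -- With a table as replacement, gsub uses each value literally (no % handling).
  return '"' .. (s:gsub('[\\"\r\n]', escapes)) .. '"'
end

-- Usage: emit an assignment like the one in the original post.
local binary = "\0\0\1\r\n\"\\\26\255"
local line = "bootstrap_image = " .. quote_binary(binary)
```

Because zero bytes are written raw rather than as decimal escapes, the digit ambiguity discussed earlier does not arise.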

Re: embedding zeroes

Patrick Rapin
> Lua 5.2 manual explicitly describes it:
> `Any kind of end-of-line sequence (carriage return, newline, carriage
> return followed by newline, or newline followed by carriage return) is
> converted to a simple newline.`

Sorry, yes, you are right. I didn't realize that.
Maybe because the Lua 5.1 manual does not explain this conversion.

But now I have a question. Since Lua converts any type of newline
into "\n" (in the inclinenumber function), why does luaL_loadfilex open
the script file in text mode?
This adds complexity to the function, since it has to reopen the file
in binary mode when the file is precompiled.
And there is (probably) a performance penalty, since the newline
conversion is performed twice.

Re: embedding zeroes

Luiz Henrique de Figueiredo
> why does luaL_loadfilex opens the script file in text mode?

Because scripts are supposed to be text. If your OS allows arbitrary data
in text files, that's fine.

If you're suggesting that script files should always be opened in binary
mode, this cannot be done in ANSI C unless you know what to expect from
text files in your OS. ANSI C seems to cater for strange filesystems that
have fixed-size records, so lines or blocks may have unknown padding
and other control characters such as ^Z...