Feature proposal: \x## notation in strings


Ico Doornekamp
Hi all,

It's pretty well possible that this has been brought up in the past, but
I'd still like to bring this up:

I'm using Lua a lot for embedded software, which tends to involve quite
a bit of working with low-level protocols, bits and binary data. One of
the things I'd personally like to see added in Lua to make my life a bit
easier is the C-like notation for 'binary' data in strings using
hexadecimal escape codes like \xNN.

There's probably a good reason why this is not already part of lua, can
anybody shed some light on this ?

Thanks,

Ico

-- 
:wq
^X^Cy^K^X^C^C^C^C


Re: Feature proposal: \x## notation in strings

Alexander Gladysh
> I'm using Lua a lot for embedded software, which tends to involve quite
> a bit of working with low-level protocols, bits and binary data. One of
> the things I'd personally like to see added in Lua to make my life a bit
> easier is the C-like notation for 'binary' data in strings using
> hexadecimal escape codes like \xNN.

Probably this is a silly question, but is decimal notation not
acceptable? I guess hexadecimals may be a bit more convenient in some
cases, but they should be more or less equivalent.

To quote the manual:

    A character in a string can also be specified by its numerical value
    using the escape sequence \ddd, where ddd is a sequence
    of up to three decimal digits.

Alexander.


Re: Feature proposal: \x## notation in strings

Peter Cawley
When hex is the accepted norm, it is a lot easier to read hex than
decimal. For example, in the correct context, I would recognise \x90
and \xF4 as the x86 NOP and HLT instructions respectively. \144 and
\244 would be much less obvious in that context. Yes, they are
equivalent, but it is easier to write the code using the hex
constants, and easier to read the code again later when the constants
are in hex.
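
The equivalence between the two notations can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named hex_to_decimal_escapes (it is not part of Lua or of any patch mentioned in this thread); it uses only standard string functions, so it should run on Lua 5.1 as well as later versions:

```lua
-- Hypothetical helper: rewrite a list of two-digit hex bytes as the
-- decimal \ddd escapes that Lua string literals accept.
local function hex_to_decimal_escapes(hex)
  local out = {}
  for pair in hex:gmatch("%x%x") do
    -- tonumber(pair, 16) parses the two hex digits; %03d pads to three
    -- digits so a following digit can never be absorbed into the escape.
    out[#out + 1] = string.format("\\%03d", tonumber(pair, 16))
  end
  return table.concat(out)
end

print(hex_to_decimal_escapes("90 F4"))  --> \144\244
```

The output matches the two decimal escapes quoted above, which is the point: the values are the same, only the readability differs.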

On Mon, Mar 9, 2009 at 10:06 PM, Alexander Gladysh <[hidden email]> wrote:
>> I'm using Lua a lot for embedded software, which tends to involve quite
>> a bit of working with low-level protocols, bits and binary data. One of
>> the things I'd personally like to see added in Lua to make my life a bit
>> easier is the C-like notation for 'binary' data in strings using
>> hexadecimal escape codes like \xNN.
>
> Probably this is a silly question, but is decimal notation not
> acceptable? I guess hexadecimals may be a bit more convenient in some
> cases, but they should be more or less equivalent.
>
> To quote manual:
>
>    A character in a string can also be specified by its numerical value
>    using the escape sequence \ddd, where ddd is a sequence
>    of up to three decimal digits.
>
> Alexander.
>



Re: Feature proposal: \x## notation in strings

Ico Doornekamp
In reply to this post by Alexander Gladysh
* On 2009-03-09 Alexander Gladysh <[hidden email]> wrote  :

> > I'm using Lua a lot for embedded software, which tends to involve quite
> > a bit of working with low-level protocols, bits and binary data. One of
> > the things I'd personally like to see added in Lua to make my life a bit
> > easier is the C-like notation for 'binary' data in strings using
> > hexadecimal escape codes like \xNN.
> 
> Probably this is a silly question, but is decimal notation not
> acceptable? I guess hexadecimals may be a bit more convenient in some
> cases, but they should be more or less equivalent.

Decimal is perfectly acceptable indeed, although hex would be a nice
addition, and I see no obvious reason why this could not be added to Lua
with minimal effort. I could be missing something here, of course.

In the embedded field hexadecimal notation is very common, especially in
things like protocols or hardware registers where every bit can have a
separate meaning. Converting mentally between bits and hex is quite
trivial once you get the hang of it, but converting between bits and
decimal is much harder to do.
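
To illustrate why hex maps so directly onto bits, here is a small sketch, assuming a hypothetical helper named to_bits (not something from the Lua standard library). It sticks to plain arithmetic because Lua 5.1 has no bit operators:

```lua
-- Hypothetical helper: render a byte value (0..255) as eight binary digits,
-- most significant bit first, using only comparison and subtraction.
local function to_bits(byte)
  local digits = {}
  for i = 7, 0, -1 do
    local weight = 2 ^ i
    if byte >= weight then
      digits[#digits + 1] = "1"
      byte = byte - weight
    else
      digits[#digits + 1] = "0"
    end
  end
  return table.concat(digits)
end

for _, value in ipairs({0x90, 0xF4}) do
  local bits = to_bits(value)
  -- Split into nibbles: each group of four bits is exactly one hex digit.
  print(string.format("0x%02X = %s %s = decimal %d",
    value, bits:sub(1, 4), bits:sub(5, 8), value))
end
--> 0x90 = 1001 0000 = decimal 144
--> 0xF4 = 1111 0100 = decimal 244
```

Each hex digit corresponds to one group of four bits, whereas the decimal value gives no such per-digit mapping.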

Ico


(Speaking of bits: I'd also like to make my vote heard for the addition of
bit-handling functions, or even better, operators to the Lua core. I'd
kill for some high-performance bit clear, set, shift, and, or, and
similar operations!)


-- 
:wq
^X^Cy^K^X^C^C^C^C


Re: Feature proposal: \x## notation in strings

Sam Roberts
On Mon, Mar 9, 2009 at 3:18 PM, Ico <[hidden email]> wrote:
> In the embedded field hexadecimal notation is very common, especially in
> things like protocols or hardware registers where every bit can have a
> separate meaning. Converting mentally between bits and hex is quite
> trivial once you get the hang of it, but converting between bits and
> decimal is much harder to do.

I'd very much like this, too.

Most protocol and data dumpers dump data as hex. It's the standard way
of expressing binary data.
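
For reference, producing that kind of dump from Lua is short. This is only a sketch, assuming a hypothetical helper named hexdump; it relies on nothing beyond string.format and string.byte:

```lua
-- Hypothetical helper: format every byte of a string as two lowercase
-- hex digits, separated by spaces.
local function hexdump(s)
  local out = {}
  for i = 1, #s do
    out[#out + 1] = string.format("%02x", s:byte(i))
  end
  return table.concat(out, " ")
end

print(hexdump("Lua.org"))  --> 4c 75 61 2e 6f 72 67
```

Going the other way, from such a dump back into a string literal, is where the missing hex escape is felt.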

Some old tools use octal; it's even the default for C character codes,
I think as a hold-over from the early 7-bit architectures Unix was
developed on? Anyhow, it's also fairly easy to convert octal to and
from bits.

Decimal notation doesn't have anything to recommend it, as far as I
can tell. Anybody who is inserting bytes using numerical values would
want to be using hex.

> (Speaking of bits: I also like to make my vote heard for the addition of
> bit-handling functions, or even better, operators to the lua core. I'd
> kill for some high-performance bit clear, set, shift, and, or, and
> similar operations!)

http://bitop.luajit.org/

Cheers,
Sam


Re: Feature proposal: \x## notation in strings

Vaughan McAlley
I'm working with MIDI data which makes no sense at all in decimal. I
was sufficiently motivated to apply and debug an old patch to the new
version of llex.c. It's not attractive but it works:

http://mcalley.net.au/lua/llex.c

Cheers,
Vaughan

2009/3/10 Sam Roberts <[hidden email]>:
> On Mon, Mar 9, 2009 at 3:18 PM, Ico <[hidden email]> wrote:
>> In the embedded field hexadecimal notation is very common, especially in
>> things like protocols or hardware registers where every bit can have a
>> separate meaning. Converting mentally between bits and hex is quite
>> trivial once you get the hang of it, but converting between bits and
>> decimal is much harder to do.
>
> I'd very much like this, too.
>
> Most protocol and data dumpers dump data as hex. It's the standard way
> of expressing binary data.
>
> Some old tools use octal; it's even the default for C character codes,
> I think as a hold-over from the early 7-bit architectures Unix was
> developed on? Anyhow, it's also fairly easy to convert octal to and
> from bits.
>
> Decimal notation doesn't have anything to recommend it, as far as I
> can tell. Anybody who is inserting bytes using numerical values would
> want to be using hex.
>
>> (Speaking of bits: I also like to make my vote heard for the addition of
>> bit-handling functions, or even better, operators to the lua core. I'd
>> kill for some high-performance bit clear, set, shift, and, or, and
>> similar operations!)
>
> http://bitop.luajit.org/
>
> Cheers,
> Sam
>


Re: Feature proposal: \x## notation in strings

Luiz Henrique de Figueiredo
In reply to this post by Ico Doornekamp
> There's probably a good reason why this is not already part of lua, can
> anybody shed some light on this ?

To avoid bloat and also because the perfect place for such a feature is
in a text-encoding library (which would also handle base64, ascii85, etc.)
Also, it is trivial to write it in Lua:

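-- Lookup table from two-digit hex strings (upper- and lower-case) to bytes.
local XT={}
for c=0,255 do
	XT[string.format("%02X",c)]=string.char(c)
	XT[string.format("%02x",c)]=string.char(c)
end

-- Decode a string of hex digit pairs; pairs not found in XT are left unchanged.
function X(s)
	return (string.gsub(s,"(..)",XT))
end

print(X"4c75612e6f7267") --> Lua.org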


Re: Feature proposal: \x## notation in strings

Benjamin Tolputt-3
Luiz Henrique de Figueiredo wrote:
> To avoid bloat and also because the perfect place for such a feature is
> in a text-encoding library (which would also handle base64, ascii85, etc.)
> Also, it is trivial to write it in Lua:
>   

It may be trivial, but hexadecimal formats are pretty much a universal
feature across all programming languages with escape sequences in
strings. I don't use Lua for binary work (primarily because I use Lua at
a higher level), but I am still quite surprised it does not support the
hexadecimal escape in some fashion (and would have raised the same issue
when I encountered it).

The other encodings you mentioned (base64, ascii85) are used in such a
small percentage of applications compared to hex escapes as to make that
comparison somewhat contrived (especially given the fact there is a
decimal number escape).


-- 
Regards,

Benjamin Tolputt
Analyst Programmer



Re: Feature proposal: \x## notation in strings

David Given
In reply to this post by Peter Cawley
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Peter Cawley wrote:
> When hex is the accepted norm, it is a lot easier to read hex than
> decimal. For example, in the correct context, I would recognise \x90
> and \xF4 as the x86 NOP and HLT instructions respectively. \144 and
> \244 would be much less obvious in that context. Yes, they are
> equivalent, but it is easier to write the code using the hex
> constants, and easier to read the code again later when the constants
> are in hex.

Not to mention that the hex notation follows the Principle Of Least
Surprise, which the decimal one doesn't; take the following two sequences:

\102\101\100\099\098\097
\x92\x91\x90\x8f\x8e\x8d

One of these contains a nasty surprise...

- --
David Given
[hidden email]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJtdzTf9E0noFvlzgRAtMJAJ9tmh0BT0gFqb6tE9AB2hVf5MtXbgCffAMh
O3l2CgVMx/N7sER//aSEkWc=
=OkbZ
-----END PGP SIGNATURE-----


Re: Feature proposal: \x## notation in strings

Ico Doornekamp
In reply to this post by Luiz Henrique de Figueiredo

* On 2009-03-10 Luiz Henrique de Figueiredo <[hidden email]> wrote  :

> > There's probably a good reason why this is not already part of lua, can
> > anybody shed some light on this ?
> 
> To avoid bloat and also because the perfect place for such a feature is
> in a text-encoding library (which would also handle base64, ascii85, etc.)

I think the hex notation differs from the others in your list in that it
is not meant as an encoding type, but just to mix binary characters into
strings. Lua already provides the decimal form, and I guess that the hex
form \xNN can be added to the Lua core in no more than 10 lines of C
somewhere in llex.c:read_string().

> Also, it is trivial to write it in Lua:
>
> local XT={}
> for c=0,255 do
> 	XT[string.format("%02X",c)]=string.char(c)
> 	XT[string.format("%02x",c)]=string.char(c)
> end
> 
> function X(s)
> 	return (string.gsub(s,"(..)",XT))
> end
> 
> print(X"4c75612e6f7267")


True, but then again, there's almost nothing that can't be trivially
written in Lua :)

Your example differs from my original proposal, which is to add the
escape sequence to regular strings so that a string can contain both
normal text and hex bytes, like "text\x03text".
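
Until the lexer supports it, the mixed form can be approximated at run time. The following is a sketch only, assuming a hypothetical helper named unescape_hex; it is a library-level workaround, not the lexer change being proposed:

```lua
-- Hypothetical helper: replace each literal backslash, "x" and two hex
-- digits in s with the byte they name; everything else is left untouched.
local function unescape_hex(s)
  return (s:gsub("\\x(%x%x)", function(digits)
    return string.char(tonumber(digits, 16))
  end))
end

-- A long-bracket string performs no escape processing, so the backslash
-- reaches unescape_hex intact regardless of the Lua version.
local packet = unescape_hex([[text\x03text]])
print(#packet)         --> 9
print(packet:byte(5))  --> 3
```

The cost is a function call at run time and an unusual quoting style, which is the inconvenience a native \xNN escape would remove.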

Ico

-- 
:wq
^X^Cy^K^X^C^C^C^C


Re: Feature proposal: \x## notation in strings

Alexander Gladysh
In reply to this post by Luiz Henrique de Figueiredo
On Tue, Mar 10, 2009 at 3:47 AM, Luiz Henrique de Figueiredo
<[hidden email]> wrote:
>> There's probably a good reason why this is not already part of lua, can
>> anybody shed some light on this ?

> To avoid bloat and also because the perfect place for such a feature is
> in a text-encoding library (which would also handle base64, ascii85, etc.)

Well, if bloat is a problem, personally I'd prefer to have
hexadecimals in core language and decimals to be left to be added from
outside.

Since Lua accepts integer hexadecimal constants now, it would only be
logical to have hexadecimals in the string literals as well. Also
useful.

Alexander.


Re: Feature proposal: \x## notation in strings

Eike Decker
In reply to this post by David Given
> David Given wrote:
>
> Not to mention that the hex notation follows the Principle Of Least
> Surprise, which the decimal one doesn't; take the following two sequences:
>
> \102\101\100\099\098\097
> \x92\x91\x90\x8f\x8e\x8d
>
> One of these contains a nasty surprise...

Hm. I don't get it:

Lua 5.1.3  Copyright (C) 1994-2008 Lua.org, PUC-Rio
> ="\102\101\100\099\098\097"
fedcba
> ="\98"
b
> ="\098"
b

It is just the way I would expect it - at least in Lua. I would have
been surprised if \0xx were octal notation in Lua...
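
The same behaviour can be written down as assertions. This is a small sketch using only string literals and string.char, so it should hold on Lua 5.1 and later:

```lua
-- A leading zero does not switch Lua's \ddd escape to octal.
assert("\099" == "c")               -- decimal 99
assert("\099" == string.char(99))

-- At most three digits are consumed, so a fourth digit is ordinary text.
assert("\0991" == "c1")

-- Fewer than three digits is fine when the next character is not a digit.
assert("\98" == "b")

print("\102\101\100\099\098\097")   --> fedcba
```

In C the same literal would be read as octal, where the digit 9 is not valid, which is presumably the surprise being alluded to.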

Though I do agree that a hexadecimal notation in strings would be more
convenient in quite a few cases than the decimal notation. Actually,
when I once built a binary string, I was surprised that Lua supports
only decimal values.
A Lua function that converts strings in a more convenient way is
possible, but I think it would be nice if the compiler would support
this natively.

Eike


Re: Feature proposal: \x## notation in strings

Enrico Colombini
Eike Decker wrote:
> A Lua function that converts strings in a more convenient way is
> possible, but I think it would be nice if the compiler would support
> this natively.

Being a minimalist, I usually prefer extra features to be kept as libraries.
But for hex constants embedded in strings I'd make an exception: I think they're just too useful (in many fields) to be left out of the core.

  Enrico


Re: Feature proposal: \x## notation in strings

Ico Doornekamp

* On 2009-03-10 Enrico Colombini <[hidden email]> wrote  :

> Eike Decker wrote:
>> A Lua function that converts strings in a more convenient way is
>> possible, but I think it would be nice if the compiler would support
>> this natively.
>
> Being a minimalist, I usually prefer extra features to be kept as libraries.
> But for hex constants embedded in strings I'd make an exception: I think  
> they're just too useful (in many fields) to be left out of the core.

So it seems that I'm not alone in feeling this would make a useful
addition to the Lua language. What would be the proper way to make this
into an official request?


-- 
:wq
^X^Cy^K^X^C^C^C^C


Re: Feature proposal: \x## notation in strings

Asko Kauppi

Are you aware of the patches at: http://lua-users.org/wiki/LuaPowerPatches ?

Particularly the "Literals (hex, UTF-8) patch"

"Allows \x00..\xFFFF (hex) and \u0000..\uFFFF (UTF-8 encoded) characters within strings."

Making patches is the preferred way of requesting features for Lua. Though it (almost) never works. :)

- asko



Ico wrote on 10.3.2009 at 11:14:

> * On 2009-03-10 Enrico Colombini <[hidden email]> wrote  :
>
>> Eike Decker wrote:
>>> A Lua function that converts strings in a more convenient way is
>>> possible, but I think it would be nice if the compiler would support
>>> this natively.
>>
>> Being a minimalist, I usually prefer extra features to be kept as libraries. But for hex constants embedded in strings I'd make an exception: I think
>> they're just too useful (in many fields) to be left out of the core.
>
> So it seems that I'm not alone in feeling this would make a useful
> addition to the Lua language. What would be the proper way to make this
> into an official request?
>
> --
> :wq
> ^X^Cy^K^X^C^C^C^C



Re: Feature proposal: \x## notation in strings

Ico Doornekamp

* On 2009-03-10 Asko Kauppi <[hidden email]> wrote  :

> Are you aware of the patches at: http://lua-users.org/wiki/LuaPowerPatches  ?
>
> "Allows \x00..\xFFFF (hex) and \u0000..\uFFFF (UTF-8 encoded) characters 
> within strings."

No, I didn't know about this one, but I think it tries to add a bit too
much in one go (4 digit-hex numbers and unicode support), possibly
allowing one to use the 'bloat' argument against it. 

> Making patches is the preferred way of requesting features to Lua. Though 
> it (almost) never works. :)

That's too bad, although I do understand the reasons of the developers for
using this development model.

Anyway, these were my $0.02, hope this input was useful.

Ico

-- 
:wq
^X^Cy^K^X^C^C^C^C


Re: Feature proposal: \x## notation in strings

Ralph Hempel
In reply to this post by Enrico Colombini
Enrico Colombini wrote:
> Eike Decker wrote:
>> A Lua function that converts strings in a more convenient way is
>> possible, but I think it would be nice if the compiler would support
>> this natively.
>
> Being a minimalist, I usually prefer extra features to be kept as libraries. But for hex constants embedded in strings I'd make an exception: I think they're just too useful (in many fields) to be left out of the core.

for i=1,10 do
  print("Me too")
end

Ralph


Re: Feature proposal: \x## notation in strings

Duncan Cross
In reply to this post by Enrico Colombini


On Tue, Mar 10, 2009 at 8:10 AM, Enrico Colombini <[hidden email]> wrote:
> Eike Decker wrote:
>> A Lua function that converts strings in a more convenient way is
>> possible, but I think it would be nice if the compiler would support
>> this natively.
>
> Being a minimalist, I usually prefer extra features to be kept as libraries.
> But for hex constants embedded in strings I'd make an exception: I think they're just too useful (in many fields) to be left out of the core.
>
>  Enrico

I agree with these sentiments, I'd like to see this too.

-Duncan


Re: Feature proposal: \x## notation in strings

Piotr Diomin
I vote for the hex format!


Re: Feature proposal: \x## notation in strings

Rob Kendrick
On Tue, 10 Mar 2009 16:21:21 +0300
Piotr Diomin <[hidden email]> wrote:

> I vote for the hex format!

Like in North Korea, what the people vote for doesn't often matter :)
I for one support our benevolent dictators, and I'm sure they'll make a
well-reasoned decision.

B.
