string.pack with bit resolution

string.pack with bit resolution

bil til
Hi,
one further question from a newbie:

I discovered string.pack in Lua 5.3.

A further bottleneck that controller applications have to live with is
"field-proof" communication, which often runs at low data rates: CAN, for
example, at max. 1 Mbaud, and KNX at only 9.6 kBaud. The advantage of KNX is
that it allows the "relatively chaotic cabling" you typically find in
houses.

If you want to transfer data over such lines, it is very important to pack
numbers very efficiently into communication buffers, and "very" here
ultimately requires "bit-efficiently".

For such bit and byte packing we could do our own formatting, but it would
be very nice if the format string could be exchanged with other devices and
other people, and would then describe the buffer contents unambiguously, at
least in the "Lua world".

Therefore, could you please consider allowing the following extensions to the
Ref manual spec "6.4.2-Format strings for pack":
a: a bit
S0: a null-terminated string
S: a string whose length is given by the number field before it (so that
the string length can also be specified by a short number, e.g. 1 byte)
S$: a string running until the end of the buffer
...and one VERY important point: the letters a, b, h, l are used for specific
bit sizes. If a number is appended directly to one of them (maybe limited to
1...32 or 1...64), it gives the number of bits, bytes, etc. I could then
specify the following:
a3 3-bit number (signed, A3 would be unsigned)
b3 24-bit number (signed, B3 would be unsigned)
h3 48-bit number (signed, H3 would be unsigned)
You could also allow a[3], b[3], h[3], but I would allow both notations (and
would also allow skipping the brackets for in, sn, !n).
!0 must also be allowed (for numbers): this means STRICTLY NO alignment, not
even at bit level (so such numbers can be concatenated in bit mode without
losing any bits). Exception: strings and chars, which by default may always
align to at least 1 byte; I am fine with this.
(Or should I rather use X0? I do not really understand the difference
between Xop and !.)

A number before such letter still is used for repeats. So you could then
write:
!0 10a 3A4 b3  (=10 bit-numbers 0/1, 3 unsigned 4bit numbers, one 24bit
signed)
You should specify, what is valid if somebody skips the spaces, e.g. writing
10a3b... in this case I would say, that the numbers are for "repeat meaning"
(so in 10a3b: 10 single bits and 3 bytes).
So if somebody wants such "a5 b3" type numbers, spaces need to be
added in the format string ... (maybe also alternatively allow colons, as
you want..).
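
To make the request concrete, here is a minimal C sketch of what a format such as "!0 10a 3A4" would require from a packer: fields appended back to back with no alignment in between. The names BitWriter, bw_init and bw_put are hypothetical, and the LSB-first bit order is an assumption; the post does not specify one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit writer: appends unsigned fields back to back,
   least significant bit first, with no alignment between fields. */
typedef struct {
    uint8_t *buf;     /* output buffer, zeroed by bw_init */
    size_t   nbytes;  /* capacity in bytes */
    size_t   bitpos;  /* number of bits written so far */
} BitWriter;

static void bw_init(BitWriter *w, uint8_t *buf, size_t nbytes) {
    memset(buf, 0, nbytes);
    w->buf = buf;
    w->nbytes = nbytes;
    w->bitpos = 0;
}

/* Append the low `width` bits of `value` (width 1..32). Returns 1 on
   success, or 0 if the field does not fit into the remaining space. */
static int bw_put(BitWriter *w, uint32_t value, unsigned width) {
    assert(width >= 1 && width <= 32);
    if (w->bitpos + width > w->nbytes * 8)
        return 0;
    for (unsigned i = 0; i < width; i++) {
        if ((value >> i) & 1u)
            w->buf[w->bitpos / 8] |= (uint8_t)(1u << (w->bitpos % 8));
        w->bitpos++;
    }
    return 1;
}
```

With such a writer, ten single bits followed by three 4-bit values occupy 22 bits, so they fit into 3 bytes.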


Further, a half-precision float (16-bit float) should also get a letter (e.g.
"r", like the good old "real" from Fortran), maybe only on systems with
"hardware support" for half precision: Cortex M3/M4 can convert to
half precision in one cycle, which can be useful for data compression. In
Lua for Linux or Windows such "r" support might be awkward to program, so I
would possibly not support it there.
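
For platforms without a hardware instruction, the conversion can be done in software. The following is a sketch only, assuming that float is IEEE-754 binary32 and that round-to-nearest-even is wanted; the function name float_to_half is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert an IEEE-754 binary32 value to binary16 bits in software.
   Rounds to nearest, ties to even; assumes `float` is binary32. */
static uint16_t float_to_half(float f) {
    uint32_t x;
    assert(sizeof f == sizeof x);
    memcpy(&x, &f, sizeof x);

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t man  = x & 0x7FFFFFu;

    if (exp == 0xFFu)                       /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (man ? 0x0200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;    /* rebias the exponent */
    if (e >= 31)                            /* too large: infinity */
        return (uint16_t)(sign | 0x7C00u);
    if (e <= 0) {                           /* subnormal half or zero */
        if (e < -10)
            return sign;                    /* underflows to signed zero */
        man |= 0x800000u;                   /* restore the implicit 1 */
        uint32_t shift = (uint32_t)(14 - e);
        uint32_t sub = man >> shift;
        uint32_t rem = man & ((1u << shift) - 1u);
        uint32_t mid = 1u << (shift - 1);
        if (rem > mid || (rem == mid && (sub & 1u)))
            sub++;
        return (uint16_t)(sign | sub);
    }
    uint32_t half = ((uint32_t)e << 10) | (man >> 13);
    uint32_t rem  = man & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;                             /* may carry into the exponent */
    return (uint16_t)(sign | half);
}
```

The result is the raw 16-bit pattern, which a packer could then write as two bytes in whatever byte order the format prescribes.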


... For those who complain that this "bloats up" the lua source code
too much: This really needs agreement between many different people,
it should be somehow specified centrally... . And anybody with ROM
problems I think in the lua C code can easily strip the format specifiers to
some smaller set (e. g. allow only a,b,h,l,r,f,d,c,s,S,X(or!))



--
Sent from: http://lua.2524044.n2.nabble.com/Lua-l-f2524044.html


Re: string.pack with bit resolution

Francisco Olarte
On Wed, Oct 16, 2019 at 11:17 AM bil til <[hidden email]> wrote:
> If you want to transfer data over such lines, it is very important to pack
> numbers very efficiently into communication buffers, and "very" here
> ultimately requires "bit-efficiently".
> For such bit and byte packing, we could do our own formatting, but it would
> be very nice, if the format string could be interchanged with other devices
> / other people, and then would describe the buffer contents unique, at least
> in "lua world".

Byte packers and bit packers are different beasts, and I've done both.
You'll have to rewrite all of pack and make it mucho more complex.

I think your use case will be better served by a different "bitpack"
module. This is easier to do in C, especially on modern ( 32 bit ) CPUs
( I've done it in Z80 assembly and it was a PITA, and in 32 bit C
which was way easier ). Portability shouldn't be too hard, you could
easily do it in ANSI C. And a dedicated module lets you add things like
precompiled format strings and gives you the freedom to add things like
Golomb coding or ASCII ( or teletype ) strings, which can be useful if you
have a fast CPU ( relative to link speed; against a 9600 link, current 32 bit
microcontrollers are fast ).

Francisco Olarte.


Re: string.pack with bit resolution

bil til
Yes, of course this should be done in C. I can do it in my own lib. But to
send messages to other people's devices, it would be very nice to have a
standard way to define this packing. If there were Lua support for this,
at least for the basic things, it would be VERY nice if they could be done
"intrinsically" in Lua.

To keep things simple, the bit/byte count values could be restricted quite
a lot. I would be happy with the following additional types:
a,a2,a4,a6,b3,h3 (+ the corresponding unsigned types with capital letters),
plus r for short float. And we could agree that all types automatically
align to at least a byte, except the bit types a...a6, which really need bit
packing when they follow one another.





Re: string.pack with bit resolution

Francisco Olarte
bil til ( is this the name? )

One thing. Sending this without any quoting to provide context makes
knowing what you are referring to a wild guess.

That being said:

On Wed, Oct 16, 2019 at 1:03 PM bil til <[hidden email]> wrote:
> Yes of course this should be done in C. I can do it in my own lib.

I'm assuming you may be replying to a previous message of mine ( this
is the lack of context I was referring to ).

> But to
> send message to devices of other people, it would be very nice to have a
> standard way to define this packing. If there would be lua support for this,
> at least for the basic things, it would be VERY nice if they can be done
> "intrinsic" in Lua.

Well, Lua is not Python, no batteries included. I think this problem
is a very specialized one and I would put it in a module anyway ( like
the standard modules which are supplied with Lua ), especially given it
can be nicely defined and delimited.

The sending to other devices is easy. You have to send them the format
strings anyway, so send them the packing library too. In fact you can make a
C-level library plus a Lua binding and send it, so they can use it either
in C or in Lua. In fact, given that Lua does not make backwards
compatibility guarantees between versions, this could greatly help you to
future-proof your devices, which can be tricky to upgrade.

> To keep things simple, the bit-/byte-count values could be restricted quite
> much. I would be happy with the following additional types:
> a,a2,a3,a4,a5,a6,a7,b3,h3 (+ the corresponding unsigneds with large letter),
> plus r for short-float. And we could agree, that all types automatically
> aling to byte, except the bit-types a...a7, they really need bit packing if
> they succeed on each other... .

Even if you preserve the byte alignment for other types, you still need
bit packing logic. In fact the bit packer is made (theoretically) more
complex by the need to fall back to byte alignment ( the point is
probably moot, since normally any packer will have ops to align to
8/16/32 bit boundaries ).

And, for me, I would prefer to have a byte packer with a simple
definition and a bit packer with another simple definition rather than
having both mixed, especially as the byte one is proven and should have
few bugs left. And let's not get into whether the BYTE string produced
by the BIT packer should have bits LSB or MSB first inside bytes, and
numbers LSB or MSB first in the bit stream. I think you are grossly
underestimating the consequences of adding bit packing to string.pack.
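
The bit order question is easy to see with a small C sketch. The helper put_field below is hypothetical; it places the same fields either LSB first or MSB first, and the resulting bytes differ.

```c
#include <assert.h>
#include <stdint.h>

/* Place the low `width` bits of `value` into `buf` starting at bit
   offset `pos`. With msb_first == 0, stream bit 0 is bit 0 of byte 0 and
   the value's low bit goes first; with msb_first != 0, stream bit 0 is
   bit 7 of byte 0 and the value's high bit goes first.
   `buf` must be zeroed and large enough; both are the caller's job. */
static void put_field(uint8_t *buf, unsigned pos, uint32_t value,
                      unsigned width, int msb_first) {
    assert(width >= 1 && width <= 32);
    for (unsigned i = 0; i < width; i++, pos++) {
        unsigned bit = msb_first ? (value >> (width - 1 - i)) & 1u
                                 : (value >> i) & 1u;
        unsigned shift = msb_first ? 7u - pos % 8u : pos % 8u;
        if (bit)
            buf[pos / 8u] |= (uint8_t)(1u << shift);
    }
}
```

Packing the 3-bit value 5 followed by the 5-bit value 3 yields the byte 0x1D in the LSB-first convention and 0xA3 in the MSB-first one, so both ends of a link have to agree on the convention.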

Francisco Olarte.


Re: string.pack with bit resolution

bil til
Hi Francisco,
sorry, the application is described in more detail in my other email from
this morning (32_BIT_LUA, 48bit-Integers); I did not want to repeat it here.
This would be for sending data between controller devices over quite low
data rate interfaces (e.g. CAN, KNX). Such protocols typically also limit the
telegram size to 8 or 16 bytes, so the messages really should not be too
long. E.g. in a house system you would like every room to send room
statistics to the server at predefined intervals. In this case it is quite
important to put as much information as possible into 8 or 16 bytes.

The sending is no problem, my software will do this. I would just be happy
if there were some "general way" to define a format string for such a "packed
data transfer". Bit packing is not so difficult. If I have some code
ready, I will show it to you. (I do not want the integration into Lua
because I cannot do it myself; I would just prefer this "format string" to
be defined in some popular environment such as Lua.)

... here is a small helper function to get/set a bit in some byte field.
(If you want to get/set several bits, you just invoke it several times;
this is not for speed, just for flexibility. If somebody wants to program
a high-speed version, this can be done in some extra user lib.)

// Get (iTo == GETSET_GetBit) / Set (iTo == 0 or 1) bit number iBitNr
// in the "bitfield" (pcField, iBytes).
// Returns the bit 0/1 (Get), or 1 if the bit was changed, else 0 (Set).
// BOOL and assertIfDebug are stand-ins for definitions from the
// author's environment, added so that the snippet compiles on its own.
#include <assert.h>
typedef int BOOL;
#define assertIfDebug(x)  assert(x)
#define GETSET_GetBit     ((BOOL)0xAAAAAAAA)
BOOL GetSetBit( char* pcField, int iBytes, int iBitNr, BOOL iTo){
  int i= iBitNr / 8;
    // i is the byte index here
  if( iBitNr < 0 || i >= iBytes){
    assertIfDebug( 0);
    return 0;
  }
  pcField += i;
    // pcField points to the correct byte now
  i= 1 << ( iBitNr & 0x07);
    // i is the bit mask for bit 0..7 now
  int iBit= !!(*pcField & i);
    // iBit (0/1) now shows the current bit state
  if( iTo== GETSET_GetBit)
    return iBit;

  iTo= !!iTo;  // just for security...
    // iTo (0/1) shows the desired bit state
  if( iBit == iTo)
    return 0;
  *pcField ^= i;
  return 1;
}
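
The "invoke it several times" idea can be sketched as follows: a multi-bit field is read by calling a single-bit getter once per bit. The helpers get_bit and get_bits are hypothetical stand-ins, not the function above, and they assume that bit 0 is the least significant bit of byte 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read bit number `bitnr` (0 = least significant bit of byte 0). */
static unsigned get_bit(const uint8_t *field, size_t nbytes, size_t bitnr) {
    assert(bitnr / 8 < nbytes);
    return (field[bitnr / 8] >> (bitnr % 8)) & 1u;
}

/* Read a `width`-bit unsigned value starting at `bitnr` by calling
   get_bit once per bit; simple rather than fast. */
static uint32_t get_bits(const uint8_t *field, size_t nbytes,
                         size_t bitnr, unsigned width) {
    assert(width >= 1 && width <= 32);
    uint32_t value = 0;
    for (unsigned i = 0; i < width; i++)
        value |= (uint32_t)get_bit(field, nbytes, bitnr + i) << i;
    return value;
}
```

A field may straddle a byte boundary; the per-bit loop handles that without any special case.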









Re: string.pack with bit resolution

Roberto Ierusalimschy
In reply to this post by bil til
> Therefore could you please consider to allow the following extensions to the
> Ref manual spec "6.4.2-Format strings for pack":

Some of these options are already there:

> S0: a null-terminated String

This is option 'z'.

> S: a string, where the number field before specifies the string length (so
> that the string length can be specified also by a short number, e. g. 1 byte
> or so)

This is option 'sn', where 'n' is the number of bytes of the preceding
length (e.g., 's2' uses an unsigned short for the length).

> b3 24-bit number (signed, B3 would be unsigned)
> h3 48-bit number (signed, H3 would be unsigned)

These are options 'i3' and 'i6'.
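
As an illustration of what a 3-byte or 6-byte integer option amounts to on the wire, here is a C sketch of storing and reading an n-byte two's-complement integer. It assumes little-endian byte order (which "<" selects in a Lua format string); put_int_le and get_int_le are hypothetical names, not Lua's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Store `v` as a `size`-byte little-endian two's-complement integer
   (size 1..8). Returns 1 on success, or 0 if the value does not fit. */
static int put_int_le(uint8_t *out, int64_t v, unsigned size) {
    assert(size >= 1 && size <= 8);
    if (size < 8) {
        int64_t lim = (int64_t)1 << (size * 8 - 1);
        if (v < -lim || v >= lim)
            return 0;
    }
    uint64_t u = (uint64_t)v;
    for (unsigned i = 0; i < size; i++)
        out[i] = (uint8_t)(u >> (8 * i));
    return 1;
}

/* Read it back, extending the sign bit to the full 64 bits. */
static int64_t get_int_le(const uint8_t *in, unsigned size) {
    assert(size >= 1 && size <= 8);
    uint64_t u = 0;
    for (unsigned i = 0; i < size; i++)
        u |= (uint64_t)in[i] << (8 * i);
    if (size < 8) {
        uint64_t sign = (uint64_t)1 << (size * 8 - 1);
        u = (u ^ sign) - sign;   /* sign extension in unsigned arithmetic */
    }
    return (int64_t)u;
}
```

With size 3 the representable range is -8388608 .. 8388607, and -2 is stored as the bytes FE FF FF.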

-- Roberto


Re: string.pack with bit resolution

bil til
Thanks for the info, this is really nice news.

I misunderstood i3 and i6 then ... on first reading I thought they meant 3
int sizes or 6 int sizes (I really thought "How stupid is this?"
...sorry...) ... but it is clear from the reference manual; I checked again,
it was my own stupidity.

Just the bit types (at least a, a2, a4, please) would really be needed VERY
much, if only for bits or bits with status: in typical "status reporting"
you have myriads of booleans, or of "state numbers" 0-3 (e.g. 0=off,
1=green, 2=yellow, 3=red, or a similar application).
(Programming that is NOT optimized for speed is totally fine ... this is
just for flexibility and clarity: in 99% of cases you can then just specify
a "format string as defined by Lua", which is perfect for most users.)

And Float16 would also be really nice. If you have float, the binary
conversion from float to half float is not TOO hard ... but e.g. Arm Cortex
M3... has a hardware instruction for this (you just need to specify it with
inline assembly; to my knowledge compilers usually do NOT support short
float, at least not ArmCC).





Re: string.pack with bit resolution

bil til
One further offer to make:

I would skip the signed bit numbers. Signed integers always have the
slightly awkward property that there is one more negative number than there
are positive ones, and this "negative surplus" is the "wicked 0x80", which
can give even experienced programmers crazy nightmares. In the case of
chars this is only a 1% defect, but in the case of a2 it is a 25% defect,
which is really hard to explain to any user.
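
The asymmetry can be checked with a small C sketch that interprets an n-bit field as a two's-complement number. The helper name sign_extend is hypothetical, and two's complement is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low `width` bits of `raw` as a two's-complement number. */
static int32_t sign_extend(uint32_t raw, unsigned width) {
    assert(width >= 1 && width <= 31);
    uint32_t mask = (1u << width) - 1u;
    uint32_t sign = 1u << (width - 1);
    raw &= mask;
    return (int32_t)(raw ^ sign) - (int32_t)sign;
}
```

For a 2-bit field the four bit patterns decode to 0, 1, -2 and -1, so the range is -2 .. 1; for 8 bits it is -128 .. 127.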

So let's concentrate on the positive world and on the unsigned bit numbers
A, A2 and A4.

Labeling bits with capital letters is of course half a nightmare again. So if
you are flexible enough for this, I would use the character "." to mark a
bit. And if I have already convinced you of the importance of bits
in such packings, could you please also allow the shortcuts : and | for
2-bit and 4-bit unsigned numbers.

So then four more lines in your format list:
.     a bit (value 0/nil/false, 1/true)
.[n]  an (unsigned) bit number with n bits (value 0/nil/false, 1/true ... 2^n-1)
:     an unsigned number with 2 bits (value 0/nil/false, 1/true, 2, 3)
|     an unsigned number with 4 bits (value 0/nil/false, 1/true, 2, 3...15)

If possible, also the following two additional float types:
r     short float (you could also use F for this ... just maybe a bit strange...)
D     long double

(Remark: long double is an ANSI C standard type, so in ANY case it needs to
be in the list somehow ... otherwise the 64-bit fans will kill you...)
[SIDE REMARK0: If you want to make life easy for microcontroller integration,
where double and long double are used in "side-library" C code, it
would be nice to have some possibility to define "SKIP_DOUBLE"
or "SKIP_LONGDOUBLE". For pack/unpack the question then arises how to
handle such a format if it appears in the format list.
Raising an error in such a case is, I think, NOT a good idea. Better to
return nil or zero for this element in unpack, and to insert 8 or 16 zero
bytes in pack.]

As separators you already allow spaces; some people will surely want
commas, and I would also propose single quotes / apostrophes '. So the last
line of the format list should then read:
" ", ",", "'": These 3 characters (space, comma, single quote) are ignored

Then you could write nice formats to pack/unpack a string into its bits like
this (e.g. 1 byte, 1 short, 1 int):
"....'...." or "||"
"....'....'....'...." or "||'||"
"||||'||||"
(Of course you should also allow "8." or "16." or "32." ... but the above
notations really look nice; even designers would like them, I hope.)

A further VERY nice application would appear if you allowed the n to be
specified in the format list. You already use the letter n for lua_Number
[SIDE REMARK1: which makes sense, but please specify how you do this. I
assume you need _tt and then the native number of bytes (so with LUA_32BITS
this would be 4 bytes for int or 4 bytes for float). You have to specify how
long the _tt is; I assume 1 byte is fine for this. This of course really MUST
be specified exactly in the description of pack / unpack. You could e.g.
use the _tt markings 0 for nil/pointer/no_number, 1 for boolean with 1 byte,
4 for int32, 5 for float32 (4 bytes each), 8 for int64, 9 for double64
(8 bytes each)].

... so n is already taken ... then maybe use #.
[SIDE REMARK2: The specifiers j and J are stupid from my point of view;
they make no sense ... if somebody wants an integer in this list, i
should be used.]
[SIDE REMARK3: In the format list you write "native size" for h and l. This
is stupid in my eyes; please change it to "2 bytes" for h and "4 bytes" for l,
or do you know some other native size for short and long??? As I see it, only
lua_Integer and lua_Number have a native size, but I would
STRONGLY recommend combining them as proposed in my
side remark 1 ... so then j and J can be killed.]

So for the extension I want to describe in the following, please add two
further lines to your format list:
i# (or #| or #.4 ...): this number element is put into the string and is
additionally used as the count specifier for the following elements (then
allowing you to write #i #b #c)
^i (or ^| or ^b or ^.4 ...): this number element is put into the string and
is additionally used as the length specifier for the following elements (then
allowing you to write i^ b^ c^)
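
What a count specifier implies for the byte layout can be sketched in C: the count is written first, and the unpacker takes it from the buffer instead of from the format string. The layout below (one count byte, then little-endian 16-bit values) and the function names are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for a count-prefixed list: one count byte,
   followed by `count` unsigned 16-bit values, little-endian.
   Returns the number of bytes written, or 0 if it does not fit. */
static size_t pack_counted_u16(uint8_t *out, size_t cap,
                               const uint16_t *vals, size_t count) {
    assert(out != NULL && (vals != NULL || count == 0));
    if (count > 255 || cap < 1 + 2 * count)
        return 0;
    out[0] = (uint8_t)count;
    for (size_t i = 0; i < count; i++) {
        out[1 + 2 * i] = (uint8_t)(vals[i] & 0xFFu);
        out[2 + 2 * i] = (uint8_t)(vals[i] >> 8);
    }
    return 1 + 2 * count;
}

/* Read the list back; the count comes from the buffer, not the caller.
   Returns the number of values stored in `vals`, or -1 on bad input. */
static int unpack_counted_u16(const uint8_t *in, size_t len,
                              uint16_t *vals, size_t maxvals) {
    assert(in != NULL && vals != NULL);
    if (len < 1)
        return -1;
    size_t count = in[0];
    if (count > maxvals || len < 1 + 2 * count)
        return -1;
    for (size_t i = 0; i < count; i++)
        vals[i] = (uint16_t)(in[1 + 2 * i] | (in[2 + 2 * i] << 8));
    return (int)count;
}
```

Because the count travels with the data, the same routine can unpack lists of different lengths, which is the flexibility the proposal is after.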
[SIDE REMARK4: your existing format s1 is then an abbreviation of
"^b c^"; s2 could then also be written "^h #c^", ...]

If you define this # parameter as I just described, you could also allow
tables as parameters for pack (and as return values for unpack):
string.pack( "n# #n", #t, t)
This should pack the index (array) part of a table into the string. If you
want to allow similar unpacking into a table in such a nice and compact
statement, it would be nice if it were possible to write:
t.setn, t.seti= string.unpack( "n# #n")

If the user knows that the table contains only booleans, the user
could e.g. also write:
string.pack ( "n# #.", #t, t)
and, if you supply the functions table.setn and table.seti:
t.setn, t.seti= string.unpack( "n# #.")
(This would require your unpack function to return a boolean by default
for the bit specifier ".", but this is no restriction ... if somebody
prefers lua_Number, the function receiving the number will convert it
more or less automatically, as I see it.)

... in such cases it is VERY helpful if the format string for packing and
unpacking is always EXACTLY the same; this makes configuring user data MUCH
easier ...

... this would be really VERY nice for a whole bunch of users, I am very
sure...



If, for the hash part of a table, you allow the further functions
table.getnk() (number of keys), table.getk(), table.getv(), and
table.setnknv( nk, ...) (the ... here stands for a list of keys followed by
a list of values), then you can use this
also for the hash part of a table:
string.pack( "n# #z #z", t.getnk(), t.getk(), t.getv())
t.setnknv( string.unpack( "n# #z #z"))
(To make this work 100%, z would also have to be usable
for lua_Number elements, automatically
invoking numberToString ... but this makes sense anyway.)
(If the user is sure that the hash part contains numbers, then
the format string "n# #z #n" is also possible and much nicer.)

... this would ALSO be really VERY nice for a whole bunch of users, I
think...

Concerning error handling: what do you do if you invoke string.pack
with specifier s1 and pass a string argument longer than 255
bytes? Do you raise an error, or do you just write the first
255 bytes of the string? Both would be OK, both have advantages and
disadvantages; just please specify this better. (To be honest, although this
is a lazy approach, I would give NO error in such a case and just limit
to 255 bytes. This is somehow the responsibility of the people who specify
the formats, and if they want to check this, they can write some more code.
Likewise, if numbers overflow, best just use the largest possible number;
e.g. for bits it is clear that only nil, false or zero will produce zero,
and any other value will produce 1.)
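
The two policies can be put side by side in a C sketch of a string with a one-byte length prefix. The function pack_str1 and the OverflowPolicy type are hypothetical; for comparison, Lua 5.3's own string.pack rejects a string that is too long for the length field with an error rather than truncating it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { ON_OVERFLOW_FAIL, ON_OVERFLOW_TRUNCATE } OverflowPolicy;

/* Write `len` bytes of `s` behind a one-byte length prefix.
   Returns the bytes written, or 0 when the string is rejected or the
   output buffer (capacity `cap`) is too small. */
static size_t pack_str1(uint8_t *out, size_t cap, const char *s, size_t len,
                        OverflowPolicy policy) {
    assert(out != NULL && s != NULL);
    if (len > 255) {
        if (policy == ON_OVERFLOW_FAIL)
            return 0;
        len = 255;              /* silently keep the first 255 bytes */
    }
    if (cap < 1 + len)
        return 0;
    out[0] = (uint8_t)len;
    memcpy(out + 1, s, len);
    return 1 + len;
}
```

Truncation keeps the stream well-formed but loses data without notice; failing makes the problem visible at the sender, where it is easiest to fix.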

[SIDE REMARK5: I am quite sure that some people would like to use this
functionality for streams too, without needing the
garbage collector for an intermediate string object. For such cases it
would be very nice if your C code allowed a
"byte-by-byte stream function" to be given as first parameter to the pack /
unpack function. So the C prototype of your base function for packing should
e.g. be something like this:
void stringpack( void* target, BOOL boTargetIsFunc, ...) (in C of course
with this stupid va stuff instead of ...)
so target can then be a lua_String object, but also a function which accepts
one byte and which is then invoked for every output byte of stringpack.
The Lua lib function str_pack is then defined like
str_pack( lua_String, ...) (using va notation again...)
Then, if some C programmer wants, he can easily apply this to other
streams too, e.g. he can easily define can.pack, tcip.pack or knx.pack
for CAN / TCP/IP / KNX interfaces. (You could also consider passing a
stream as parameter to such a function in Lua, as is often allowed in
C, but I think this would NOT be good in the case of Lua: it would be
too flexible and could easily collide in some uncontrolled manner, e.g. with
coroutines, if the Lua user is allowed to do such things himself.) (But
this really is only for people who have huge amounts of data to transfer,
and I am NOT part of this group, so I have no experience with this.
Possibly they prefer to work through buffers anyway, and then this
side remark 5 is possibly complete nonsense....)]
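
The byte-by-byte stream function idea might look roughly like this in C. This is a sketch under assumptions: the ByteSink callback type, emit_uint_le and the memory-buffer sink are hypothetical names and not part of Lua.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sink: called once per output byte; returns 0 to abort. */
typedef int (*ByteSink)(void *ctx, uint8_t byte);

/* Emit `v` as `size` little-endian bytes through the sink.
   Returns the number of bytes the sink accepted. */
static size_t emit_uint_le(ByteSink sink, void *ctx, uint64_t v, unsigned size) {
    assert(sink != NULL && size >= 1 && size <= 8);
    size_t n = 0;
    for (unsigned i = 0; i < size; i++, n++)
        if (!sink(ctx, (uint8_t)(v >> (8 * i))))
            break;
    return n;
}

/* One possible sink: append to a fixed-size memory buffer. */
typedef struct { uint8_t *data; size_t len, cap; } MemBuf;

static int membuf_sink(void *ctx, uint8_t byte) {
    MemBuf *b = ctx;
    if (b->len >= b->cap)
        return 0;
    b->data[b->len++] = byte;
    return 1;
}
```

A CAN or KNX driver would supply its own sink with the same signature, so the packing code itself never needs to know where the bytes end up.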

... sorry, this was a bit much ...
(But please do not come with the argument that these things bloat the C
code for string.pack / string.unpack very much. I do not believe this
... these are just some minor additions in the C code, which make these
functions MUCH more flexible in use.)
 




Re: string.pack with bit resolution

Sean Conner
It was thus said that the Great bil til once stated:
> One further offer to make:

  That you will make the necessary changes to string.pack()/string.unpack()
(or an entirely new module) for people to use and comment on?  If the demand
is that high and you know what is wanted, I would think you would be the
best person to implement this.

  Or is that just wishful thinking on my part?

> I would skip the signed bit numbers. Signed integers always have the
> slightly awkward property, that there is one negative number more than the
> positives,

  Lua is based upon C.  And C allows the representation of negative integers
to be of sign magnitude, 1s complement, or 2s complement.  The range of an
8-bit value for each of these are:

        sign magnitude: -127 .. 127
        1s complement:  -127 .. 127
        2s complement:  -128 .. 127

  The C standard (and I'm using the C89 standard here, which Lua adheres to
most) only gives symmetrical ranges for each integer.  Section 5.2.4.2.1
states:

        Their implementation-defined values shall be equal or greater in
        magnitude (absolute value) to those shown, with the same sign.

        ...

        -- minimum value for an object of type int
           INT_MIN -32767

        -- maximum value for an object of type int
           INT_MAX +32767

        ...

  They can, of course, be bigger.

  Granted, most systems today are 2s complement, but I have recently come
across a C compiler, *still commercially available*, for a sign magnitude
system.

> and this "negative surplus" is this "wicked 0x80", which can even
> lead to crazy nightmares for experienced programmers.

  I'm not aware of any crazy nightmares for experienced programmers.  Novice
ones, yes.  But perhaps I was fortunate enough to learn assembly first, and
the documentation for every 2s complement system I've learned assembly for
(and that's pretty much all the systems I've ever come across) states:

        NEG set overflow flag if input is $80 (or $8000 or $80000000
                depending upon size of operand).

  But in practice I've never had a real issue with this.

> In case of chars, this
> is only a 1% defect, but in case of a2, this is a 25% defect, which is
> really hard to explain to any user.

  And a 50% defect in the case of "a1".  But even there, on page 150 of my
copy of K&R C (the C Bible, and remember, Lua is based upon C):

        struct {
                unsigned int is_keyword : 1;
                unsigned int is_extern  : 1;
                unsigned int is_static  : 1;
        } flags;

        This defines a variable called flags that contains three 1-bit
        fields.  The number following the colon represents the field width
        in bits.  The fields are declared unsigned int to ensure that they
        are unsigned quantities.

> So let's concentrate on the positive world and on the unsigned bit numbers
> A, A2 nd A4.
>
> Labeling bits with large letter is of course a half nightmare again. So if
> you are flexible enough for this, I would use the sing "." to mark a bit.
> And if I have convinced you already enough concerning the importance of bits
> in such packings, you could please also allow the short cuts : and | for
> 2-bit and 4-bit unsigned.
>
> So then 3 more lines in your format list:
> .     a bit (value 0/nil/false, 1/true)
> .[n]  an (unsigned) bit number with n bits (value 0/nil/false, 1/true ...
> 2^n-1)
> :     an unsigned number with 2 bits (value 0/nil/false, 1/true, 2, 3)  
> |     an unsigned number with 2 bits (value 0/nil/false, 1/true, 2, 3)  
>
> If possible also the 2 following additonal float types:
> r     short float
> D     long double
>
> (remark: long double is an ansi C standard type - this in ANY case needs to
> be somehow in the list ... the 64bit fans otherwise will kill you...)
>
> As separators you have already allowed spaces, some people for sure want
> colons, I would propose also single quotes / hyphens '. So then the last
> line of format list should read:
> " ", ",", "'": These 3 charakters (space, colon, single quote) are ignored
>
> Then you could write nice formats to pack/unpack String into its bits like
> this (e. 1 Byte, 1short, 1 int):
> "....'...." or "||"
> "....'....'....'...." or "||'||"
> "||||'||||"
> (of course you should also allow "8." or "16." or "32." ... but the above
> notaions really look nice, even deigners would like this, I hope)

  At this point, the Erlang bit syntax is looking better and better.  An
example:

        <<IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16,
          ID:16, Flgs:3, FragOff:13,
          TTL:8, Proto:8, HdrChksum:16,
          SrcIP:32,
          DestIP:32>>

  It will even allow you to specify things like signedness, endianess:

        X:6/little-signed-integer

(more about this here: http://erlang.org/doc/programming_examples/bit_syntax.html)

  Put that into a string, and your packing/unpacking module could even
return a table with the fields prenamed and everything.
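
For comparison, the same field list can be decoded in C with a small MSB-first bit reader. This is only a sketch: read_bits_be, Ipv4Header and parse_ipv4 are hypothetical names, and the field widths are the ones from the Erlang pattern above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read `width` bits (1..32) starting at bit offset *pos, most significant
   bit first, and advance *pos. This matches network bit order. */
static uint32_t read_bits_be(const uint8_t *buf, size_t *pos, unsigned width) {
    assert(width >= 1 && width <= 32);
    uint32_t v = 0;
    for (unsigned i = 0; i < width; i++, (*pos)++)
        v = (v << 1) | ((buf[*pos / 8] >> (7 - *pos % 8)) & 1u);
    return v;
}

typedef struct {
    uint32_t version, hlen, srvc_type, tot_len, id, flags, frag_off,
             ttl, proto, hdr_chksum, src_ip, dest_ip;
} Ipv4Header;

/* Decode the 20-byte fixed part of an IPv4 header, field by field,
   using the same widths as the Erlang pattern. */
static Ipv4Header parse_ipv4(const uint8_t hdr[20]) {
    static const unsigned widths[12] = {4, 4, 8, 16, 16, 3, 13, 8, 8, 16, 32, 32};
    uint32_t f[12];
    size_t pos = 0;
    for (int i = 0; i < 12; i++)
        f[i] = read_bits_be(hdr, &pos, widths[i]);
    assert(pos == 160);
    Ipv4Header h = {f[0], f[1], f[2], f[3], f[4], f[5],
                    f[6], f[7], f[8], f[9], f[10], f[11]};
    return h;
}
```

The widths table plays the role of the format string; a Lua binding could return the decoded fields as a table with named keys, as suggested above.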

> A further VERY nice application would appear, if you would allow to specify
> the n in the format list. You use the parameter n already for lua_Number
> (SIDE REMARK1: which makes sense - just please specify how you do this - I

 ... [ snip ] ...


  Or one could use a pre-existing module to serialize data.  I wrote a very
extensive CBOR (Concise Binary Object Representation, RFC-7049)
implementation for Lua:

        https://github.com/spc476/CBOR

and I hear CBOR is very popular among the IoT crowd for its compactness of
representation.  And it's not like it's hard to use.

> assume you need _tt and then the native byte number (so in LUA_32BITS this
> would be 4 byte for int or 4 byte for float)

  Assuming the platform in question uses 4-byte ints and IEEE-754 floating
point.  Again, you are making assumptions that Lua does not.

> - you have to specify how long
> the _tt is - I assume 1 byte is fine for this, and maybe zero for float and
> 1 for integer and 2 for boolean or s - this of course really MUST be
> specified exactly in the descirption of pack / unpack. You could e. g. also
> use _tt marking 0 for boolean with 1 byte, 4 for int32, 5 for float32, 8 for
> int64, 9 for double64, maybe also FE for pointer32 and FF for pointer64").
> ... so n is given away already... then maybe use #).
> (SIDE REMARK2: The specifier j and J is stupid - this makes no sense ... if
> somebody wants an integer in this list, please i should be used)

  j and J give you at least a 64-bit int.  i and I only give you the native
integer size, which can be 16-bit or larger.

> (SIDE REMARK3: In the format list for h and l you write "native size" - this
> is stupid in my eyes, please change to "2 bytes" for h, and "4 bytes" for l,
> or do you know some other native size for short and long??? - only
> lua_Integer and lua_Number has native size, as I see it)

  I have actually used the "native size" for a project recently, to read
native binary integers written by a C program.  You might not find a use for
the "native sizes" but that doesn't mean others won't.
 
> So for the extension I want to describe in the following, please two further
> line in your format list:

  Have you thought of maybe implementing this yourself?

  ... [ snip ] ...

> (but please do not come with the argument, that these things bloat up the c
> code for string.pack / string.unpack very much - this I do not believe you
> ... these are just some minor additons in c code, which make these functions
> MUCH more flexible in use)

  Which means this should be easy to implement, right?

  -spc (Right?)


Re: string.pack with bit resolution

bil til
Hi Sean,
thank you for your answer.

Sorry for my ironic start; it was ironic, I have to admit.

Concerning the "wicked 0x80" problem: maybe you program mainly on large
systems. With large ints of 4 bytes or more you hardly ever reach such a
limit. But if you program controllers, which often use 1 byte for a number,
this really can be very dangerous, believe me.

Concerning 1-bit values, I have to defend them. In my opinion they are
clearly unsigned 0/1 (and if you consider them signed, that is no problem
either, since for boolean tests in C it does not matter whether the result
is 0/1 or 0/-1).

The problem with the sign only appears with the 2nd bit.

Thank you for the Erlang bit syntax example, but this is somewhat too much
for me (I always struggle with the strange string search patterns used for
string.match etc., I must admit ... of course they are nice for many
applications, but I am always happy if I can avoid them).

My proposal for # and ^ is only meant as request number TWO (and of lower
priority) ... if you do not like it, that is fine too, no problem. Just
please at least add the bits (and n1..7 would already be very fine).

I just think this # and ^ concept would increase the flexibility of the
pack/unpack functions tremendously, and it would NOT lead to much further
programming overhead in C, I am quite sure.

Of course I COULD in principle program this myself. But it would be a very
large advantage if such a format were uniquely defined by somebody with a
"wider fan group", like e.g. Lua ... then, to communicate such format
strings to other people, you could just specify "format string in Lua style",
which would be very nice and useful from my point of view.

It just really has to be unique, and my side remarks concerning n, j and J
ARE important... . It must be clearly defined, so that it is no problem to
pack e. g. on a Lua 64bit device and then unpack on a Lua 32bit device, and
vice versa ... otherwise this pack/unpack does not make too much sense. I
really think it is best to kill j and J, and keep n, but with 1 additional
type info byte at the start. (And for this n type I would always use signed
ints, as Lua always does anyway... if somebody wants unsigned ints, I4 or I8
is a perfect choice).

64bit int is NO problem with I8 and i8, even in the lua5.3 version of
pack/unpack.

The native size of course is nice, but it has to be included in the packed
string, otherwise the unpacker gets mixed up when it has to unpack on
another system... . Therefore I think it is much better to include this as
type info in n, and then all is fine, and j and J are not needed any more.
(n of course IS native size, but it really has to record this "native info"
in the result string somehow...).
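
A short Lua 5.3 sketch of the difference between fixed-size and native-size options. It assumes a standard build in which Lua integers are 64 bits wide; the sizes reported for "j" and "T" depend on how the interpreter was compiled, so no output is claimed for them:

```lua
local value = 0x1122334455667788   -- assumes Lua integers are 64 bits wide

-- "<i8" always produces 8 little-endian bytes, whatever the host is.
local fixed = string.pack("<i8", value)
print(#fixed, string.packsize("<i8"))   --> 8   8

-- "j" (lua_Integer) and "T" (size_t) follow the build, so their sizes can
-- differ between the packing and the unpacking machine.
print(string.packsize("j"), string.packsize("T"))

-- Round trip with the fixed-size option.
local back = string.unpack("<i8", fixed)
print(back == value)                    --> true
```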

I really think that these additions to pack and unpack would be easy to
implement.

It is just not easy to get agreement in the community ...

and without agreement in the community, it makes no sense to start
programming ...

(as a main goal of this format definition clearly must be to transfer data
in uniquely described form from one system to another, at least from Lua
32bit to Lua 64bit and vice versa - I hope you completely agree with this at
least).


(you really have to avoid problems like those of the CSV file format wrongly
specified by Microsoft in the 1980s... it has this stupid "localisation"
dependency... probably the group at MS was even especially proud of
this feature at that time... but if you live in poor old Europe, where the
notation of the decimal point changes from country to country (some even
using a comma, as in Germany and France, and then needing another list
delimiter) ... this really is a nightmare... and of course the correct
localisation is not recorded anywhere in the file ...
so you can only open it and try one or the other ... of course
usually you recognize it quite fast, but this is a tragedy anyway... ).








Re: string.pack with bit resolution

Sean Conner
It was thus said that the Great bil til once stated:
> Hi Sean,
> thank you for your answer.
>
> Sorry for my ironic start, this was ironic, I have to admit.
>
> Concerning the "wicked 0x80" problem, you maybe program mainly on large
> systems - in case of large ints 4byte or more you hardly ever reach such
> limit. But if you program controllers which often use 1byte for a number,
> then this really can be very dangerous, believe me.

  I programmed nearly exclusively in assembly language (6809, 8088, 68000)
from 1985 to about 1995, so I am aware that a 16bit limit can be reached
quite readily, but again, I don't ever recall $80 (as a signed value) ever
being an issue for me.

> My proposal for this # and ^ is only meant as request TWO (and inferior) ...
> if you do not like this, then also fine, no problem. Just please at least
> add the bits. (and n1..7 already very fine...).
>
> Just I think this # and ^ concept would increase the flexibility of this
> pack/unpack functions tremendously, and it will NOT lead to much further
> programming overhead in C, I am quite sure.
 
  And I think you are underestimating the complexities involved.  One format
I've been working with recently has the following 8-bits defined as:

        0 nn FFFFF
        1 nn i oooo

  That is, if the most significant bit is 0, then the last five bits are a
5-bit SIGNED value (-16 to 15), whereas if the most significant bit is 1,
bit 4 has a distinct meaning, and the remaining 4 bits define an operation,
with the 'nn' field (a 2-bit unsigned field) having mostly the same meaning
across both (but in the second case, 'nn' is a "don't care" for a few cases
of 'oooo'). Could I expect this to be handled with a "bit formatting
language"?

  If no such distinctions are made, then I suppose I could live with:

        x nn zzzzz

and further parse the 'zzzzz' field depending upon the 'x' field but at this
point, I don't think I'm any better off than doing:

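        local n = byte & 0x60          -- 'nn' field, left in bits 5-6
        if byte & 0x80 == 0x80 then    -- '&' binds tighter than '==' in Lua
          local i = byte & 0x10        -- flag in bit 4
          local o = byte & 0x0F        -- 4-bit operation code
          -- handle this case
        else
          local f = byte & 0x1F        -- 5-bit field, still unsigned here
          if f > 15 then
            f = (-1 & ~0x1F) | f       -- sign-extend to -16..-1
          end
          -- handle this case
        end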

  Some other questions---assume we have "q[n]" meaning "an unsigned
collection of n bits" (where n by default is 1).  What happens in the
following cases?

        x = string.pack("q",1) -- how many bytes is x? what should x be?
        x = string.pack("q17",5)
        x = string.pack("q17",-12)
        x = string.pack("q17q19q23",1,2,3)

> Of course I principally COULD program this myself. But it would be a very
> large advantage, if such a format is uniquily defined by somebody with
> "wider fan group", like e. g. lua ... then to communicate such format
> strings to other people, you just could specify "format string in lua-style"
> - would be very nice and useful in my point of view.

  You need to convince the "wider fan group" first.

> Just it really has to be unique, and my side remarks concerning n, j and J
> ARE important... . It must clearly be defined such, that no problem to pack
> e. g. on a lua 64bit device, and then unpack on lua 32bit device and vc vs
> ... otherwise this pack/unpack does not make too much sense.

  I can do that now.  The following binary format (it's the main ZIP file
header, by the way):

      end of central dir signature    4 bytes  (0x06054b50)
      number of this disk             2 bytes              
      number of the disk with the            
      start of the central directory  2 bytes
      total number of entries in the        
      central directory on this disk  2 bytes
      total number of entries in            
      the central directory           2 bytes
      size of the central directory   4 bytes
      offset of start of central            
      directory with respect to
      the starting disk number        4 bytes
      .ZIP file comment length        2 bytes
      .ZIP file comment       (variable size)

is

        "!1<c4 I2 I2 I2 I2 I4 I4 s2"

  That will work to both pack and unpack that structure.
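
  A minimal Lua 5.3 sketch of that round trip. The field values are made up
for illustration; only the format string and the signature come from the
text above (0x06054b50 stored little-endian is the byte sequence "PK\5\6"):

```lua
local fmt = "!1<c4 I2 I2 I2 I2 I4 I4 s2"

-- Illustrative values only: one disk, 3 entries, 150-byte directory at
-- offset 1024, and a 4-byte comment.
local record = string.pack(fmt, "PK\5\6", 0, 0, 3, 3, 150, 1024, "demo")
print(#record)   --> 26   (22 fixed bytes plus the 4-byte comment)

-- The same format string unpacks the record again.
local sig, disk, cd_disk, n_here, n_total, cd_size, cd_offset, comment =
  string.unpack(fmt, record)
print(sig == "PK\5\6", n_total, cd_size, cd_offset, comment)
--> true    3    150    1024    demo
```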

> I really think that these add's to pack and unpack would be easy to
> implement.

  The code is there.  It should be easy to add the functionality and release
it so people can bang on it.  I mean, even *IF* the Lua core team wanted to
do this, we won't see it until the next major release of Lua after 5.4
(which if history is any indication, won't be for a few years at least).
 
> and before no agreement in the community, it makes no sense to start
> programming ...

  That's a self-defeating way of looking at it.  If I find something
lacking and the maintainers aren't willing to make a change, I'll jump in
and do it myself---maybe they'll accept it, maybe not.  But at least I'm
solving my immediate issue.

> (as a main goal of these format definition clearly must be to transfer data
> in uniquely described form from one system to another, at least from lua
> 32bit to lua 64bit and vc vs - I hope you completely agree with this at
> least).

  Yes.

  -spc


Re: string.pack with bit resolution

bil til
Hi Sean, thanks for further info.

So you propose the letter q for the bits? Also fine for me. Helpful idea,
thank you.

Concerning your format strings "q", "q17", "q17q19q23": The byte size
according to my recommendation would be 1 byte, 3 bytes, and 8 bytes.
(Trailing bits are filled with zeroes to give whole bytes...  It would be
nice if the endianness setting were considered here: little endian should
start at the LSB and fill the trailing MSBs with zeroes, and big endian
should start at the MSB and fill the trailing LSBs with zeroes. I would allow
the endianness spec only as the first character of the format string (NOT
allow it to change within the string) - anything else bloats the C code
quite a bit, I think ... if you do allow a change of endianness within a
format string, then of course the next bit starts on a byte border).
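
The sizes quoted above (1, 3 and 8 bytes) can be checked with a small pure-Lua sketch. pack_bits is a hypothetical helper, not an existing or proposed API; it assumes the little-endian variant described here (fields packed LSB-first, the last byte padded with zero bits):

```lua
-- pack_bits is a hypothetical helper, not part of Lua's string library.
-- widths[i] is the bit count of values[i]; fields are packed LSB-first and
-- the final byte is padded with zero bits.
local function pack_bits(widths, values)
  local bytes, acc, nbits = {}, 0, 0
  for i, width in ipairs(widths) do
    local v = values[i] & ((1 << width) - 1)    -- keep only `width` bits
    for bit = 0, width - 1 do
      acc = acc | (((v >> bit) & 1) << nbits)   -- copy one bit into the byte
      nbits = nbits + 1
      if nbits == 8 then                        -- byte complete: flush it
        bytes[#bytes + 1] = string.char(acc)
        acc, nbits = 0, 0
      end
    end
  end
  if nbits > 0 then bytes[#bytes + 1] = string.char(acc) end
  return table.concat(bytes)
end

print(#pack_bits({1}, {1}))                    --> 1
print(#pack_bits({17}, {5}))                   --> 3
print(#pack_bits({17, 19, 23}, {1, 2, 3}))     --> 8   (59 bits)
```

A negative argument such as -12 is simply truncated to its low 17 bits in this sketch; whether that should be an error is one of the open questions raised above.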

The format string "!1<c4 I2 I2 I2 I2 I4 I4 s2" looks nice from my side -
with this I have no problem.

On a LUA_32BIT system you could equally use "!1<c4 I2 I2 I2 I2 J J s2" for
this. This is a VERY dangerous ambiguity if you want to exchange data with a
LUA_64bit device... . Therefore please strictly FORBID this "J" possibility.
If you program on one machine, then flexibility and syntactic sugar are
always nice. But if you want to transfer data between different machines,
then the format specification MUST be 100% water-tight and strict, otherwise
it is useless / dangerous.

Thank you for confirming that the transfer between LUA_32BIT and LUA_64bit
must be uniquely defined... if we agree on this, then I can sleep well :).

But I found one further ambiguous code in the list: T (size_t, native size),
which is also NOT clear in data exchange LUA_32 <> LUA_64 ... so from my
side the codes j, J, T should be killed from this table, or at least they
should get some (*) comment: "Do not use specifiers marked 'native' for data
exchange LUA_32 <> LUA_64". And the comment "(native size)" at h, H, l, L,
f, d should please be killed, as h, H, l, L, f, d should have STRICTLY
defined sizes of 2 / 4 / 8 bytes. And the comment "(default is native
alignment)" for the ! parameter should clearly say "default is 1byte
alignment".

The "i" alone ("native size") without a number I would also recommend to
kill. But as long as you have the i[n] possibility, I am fine with i.

The n I would extend to "native (number, boolean, string)", and as an update
to my post above, I would now recommend the following type byte specifier
(the marking 0B / 4B... in the table gives the number of follow-up bytes):

0x00          0B      Boolean False
0x04          4B      int32
0x08          8B      int64 (warning: unpack in LUA_32 will cut to int32
                      and use MIN_INT32/MAX_INT32 if necessary)
0x10          16B     int128 (warning: unpack in LUA_32/64 will cut to
                      int32/64)
0x40          0B      Boolean True
0x44          4B      float32 normalized
0x48          8B      float64=double normalized (warning: unpack in LUA_32
                      will cut to float32 and use -INF/+INF if necessary)
0x50          16B     float128=long double normalized (warning: unpack in
                      LUA_32/64 will cut to float32/64)
0x81...0xD4   nB      String with 1...100 Bytes (n=1..100)
0xE4          (n+4)B  String with n Bytes, preceded by 4byte signed length
                      info (valid range 1...2G)
0xE8          (n+8)B  String with n Bytes, preceded by 8byte signed length
                      info (valid range 1...8GG)
0xF0...0xF8   0B      Error "no_number" (0xF0+_tt info, so 0xF0=nil,
                      0xF2=luserdata, 0xF5=table, 0xF6=function,
                      0xF7=userdata, 0xF8=thread)
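
To make the type-byte idea concrete, here is a rough Lua 5.3 sketch covering only the boolean, int32, int64 and float64 rows of the table above. The names encode_n and decode_n are hypothetical, the payload is assumed to be little-endian, and the sketch assumes a build with 64-bit integers and 8-byte doubles:

```lua
-- encode_n / decode_n are hypothetical helpers illustrating the proposed
-- type byte; strings, 128-bit types and the error codes are left out.
local function encode_n(v)
  if v == false then return "\x00" end
  if v == true then return "\x40" end
  if math.type(v) == "integer" then
    if v >= -0x80000000 and v <= 0x7FFFFFFF then
      return "\x04" .. string.pack("<i4", v)     -- fits into int32
    end
    return "\x08" .. string.pack("<i8", v)       -- needs int64
  end
  if math.type(v) == "float" then
    return "\x48" .. string.pack("<d", v)        -- float64
  end
  error("unsupported type: " .. type(v))
end

-- Returns the decoded value and the position of the next item.
local function decode_n(s, pos)
  pos = pos or 1
  local tag = s:byte(pos)
  if tag == 0x00 then return false, pos + 1 end
  if tag == 0x40 then return true, pos + 1 end
  if tag == 0x04 then return string.unpack("<i4", s, pos + 1) end
  if tag == 0x08 then return string.unpack("<i8", s, pos + 1) end
  if tag == 0x48 then return string.unpack("<d", s, pos + 1) end
  error("unknown type byte at position " .. pos)
end

print(#encode_n(true), #encode_n(7), #encode_n(1 << 40), #encode_n(1.5))
--> 1   5   9   9
print(decode_n(encode_n(-5)))   --> -5   6
```

Because the size travels inside the data, the reader does not need to know whether the writer was a 32-bit or a 64-bit build.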

For the unpack function, please allow an additional parameter Error (which
can be NIL, but usually is a string "Error"; please also allow "Error%d").
If unpack meets an invalid type specifier in the "string to unpack", then it
should put this Error string in the return list, best using
string.format(strError, Bytenumber) (so that if the Error string is
"Error%d", the byte number will be returned in the return list).

With this, 'n' gets REALLY native for Lua and extremely powerful, as it will
then support not only lua_Number, but also lua_Boolean and lua_String in an
optimal way... . And this is what the users really want, I am sure, also the
"wider fan group".


I entered int128 and float128 in the list above because I assume that in
the not too distant future there will be a lua128 which supports 128bit
numbers. long double is already supported in ANSI C, and Microsoft e. g.
already speaks about "single cycle" usage of long double for "CPUs with
SSE2" (google for "Floating point support Microsoft").

But I am quite sure that the size_t length for pointer size will stay at
64bit for a VERY long time. Even a storage device with atomic 1nm resolution
holds only about 10^18 bits per m², so 8GG Bytes would need a storage area
of well over 10m² - I cannot imagine that this arrives in the nearer
future..., maybe in 2324 :), otherwise you can kill me, but you will have to
hurry, as by default I will leave our nice earth in 60 years at the latest
:).

But therefore please also insert the following 2 floating point specifiers:
"r" short float/float16 (normalized, for "greedy byte" people like me)
"D" long double/float128 (normalized, warning: unpack on LUA_32 and LUA_64
will cut to float32/64 range)

For all floats I added the spec "normalized". But as every floating point
machine has a 1 cycle normalize command, I assume that any compiler
normalizes floats automatically anyway; even the usual FPU will typically do
this after every floating point operation.

Converting normalized floats between 16-32-64-128 bits is easy
bit-twiddling, you can do this in about 5-20 C instructions... (you do NOT
need float support for this conversion, if the floats are normalized).
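
A sketch of such a conversion in Lua 5.3, widening a float16 bit pattern to float32 with integer operations only. It assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits) and handles only normalized values and zero; the function names are hypothetical, and string.pack/unpack are used at the end merely to turn the resulting bit pattern into a Lua number:

```lua
-- Hypothetical helpers; subnormals, infinities and NaN are not handled.
local function half_to_float32_bits(h)
  local sign = (h >> 15) & 0x1
  local exp  = (h >> 10) & 0x1F
  local frac = h & 0x3FF
  if (h & 0x7FFF) == 0 then return sign << 31 end   -- plus or minus zero
  -- Rebias the exponent (15 -> 127) and widen the fraction (10 -> 23 bits).
  return (sign << 31) | ((exp - 15 + 127) << 23) | (frac << 13)
end

local function half_to_number(h)
  -- Assumes the "f" option is a 4-byte IEEE 754 float on this machine.
  return (string.unpack("<f", string.pack("<I4", half_to_float32_bits(h))))
end

print(half_to_number(0x3C00))   --> 1.0
print(half_to_number(0xC000))   --> -2.0
print(half_to_number(0x3E00))   --> 1.5
```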

In the delimiter list in my post above, I proposed to use the
apostrophe/single quote ' as an additional delimiter besides space. But this
I withdraw - it is nonsense in Lua... . But perhaps comma (and perhaps also
semicolon) as additional delimiters: it usually is very helpful to
"optically structure" such number formats.. .

My additional tags "#" (to repeat equal elements in some "dynamic way"), and
"^" (to specify number size in some "dynamic way") I would like to keep as
"WISH NUMBER 3" (wish number 1 is bits, wish number 2 is uniqueness and n
with type info, and #, ^ is wish number 3. ^ you could also replace by _ or
by ° if you prefer...).

So to summarize, here is my proposed change to the pack/unpack format
specifier table:
n: a native lua element with type info start byte
   (lua_Boolean/lua_Number/lua_String)
q[n]: a signed bit field (Q[n] unsigned, n=1...16 is fine for me)
r: short float / float16 (normalized)
D: long double / float128 (normalized, warning: LUA_32/64 will unpack to
   float32/64)
# post-char: specifies repeat count (no byte in data str, only for format
   str), then follow-up specifiers can be written #q or #i or #z or #n to
   use this repeat count.
# pre-char: specifies repeat count dynamically (previous #post-char
   required)
^ pre-char: specifies bit/byte length of numbers marked [n], then follow-up
   specifiers can be written q^ or i^ ...
^ post-char: specifies bit/byte length dynamically (previous #post-char
   required)
' ' or ',' or ';' can be used as delimiters in the format (no influence on
   the result)
'!(n)' sets alignment to n (default is size of follow-up element)
For the format types h, H, l, L, f, d please REMOVE the text "(native size)".
Please add a footnote warning: "(X) Do not use 'native size' marked format
types for transfer LUA_32<>LUA_64" and add this (X) to j, J, T ... (but I
would rather recommend to kill j, J, T from the table).
For i[n], I[n]: Please add the warning: unpack for LUA_32/64 will restrict
to int32/64.

Concerning this alignment operator "!(n)" - I think this bloats the C code
of pack / unpack unnecessarily (also the testing time for possible errors /
glitches...). I think usually anyone who uses pack/unpack wants to have
alignment "!1"... . So I would also kill this "!(n)" specifier. Then
pack/unpack will strictly use byte alignment; this is no restriction from my
point of view, but exactly what you expect from pack/unpack. (In case of
successive q[n]s of course please please please bit packing is
required... ).

Endianness please clearly can NOT be "native" by default; please use "little
endian" by default, Intel clearly has defeated Motorola :). But a
PRE-DEFINED endianness really should be specified clearly.. . (As written
at the start, I would allow the endianness definition only at the start of
the string, for the complete string, also to keep the C code neat and
clear... but as you want...).
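
For reference, a few lines of Lua 5.3 showing what an explicit endianness prefix does with the existing options, and what goes wrong when writer and reader disagree:

```lua
local le = string.pack("<I2", 0x1234)   -- little endian: low byte first
local be = string.pack(">I2", 0x1234)   -- big endian: high byte first

print(le:byte(1, -1))   --> 52   18    (0x34 0x12)
print(be:byte(1, -1))   --> 18   52    (0x12 0x34)

-- Reading little-endian data as big endian silently gives another number
-- (the second result is the position after the item).
print(string.unpack(">I2", le))   --> 13330   3    (0x3412)
```

Without a prefix, string.pack uses the native byte order, so a format string meant for data exchange has to start with "<" or ">".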

And the string.unpack please needs this 2nd optional parameter errorstring
(defaults to nil).

And the last "future wish" would then be some table functions table.setn,
table.seti, table.setnknv, table.getnk, table.getk, table.getv, so that you
can write the following commands:
string.pack( "n# #n", #t, t)
t.setn, t.seti= string.unpack( "n# #n", errorstring)
string.pack( "n# #n #n", t.getnk(), t.getk(), t.getv())
t.setnknv( string.unpack( "n# #n #n"))

... and for indexed boolean tables the string size will be reduced by a factor 8
for the following format:
string.pack( "n# #q", #t, t)
t.setn, t.seti= string.unpack( "n# #q", errorstring)
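
The factor-8 saving for boolean arrays can be sketched in today's Lua 5.3 without any new format options. pack_bools and unpack_bools are hypothetical helper names; the 2-byte little-endian element count in front is an assumption made for this example:

```lua
-- Hypothetical helpers: pack an array of booleans, 8 per byte, LSB first,
-- preceded by the element count as a 2-byte little-endian integer.
local function pack_bools(t)
  local bytes = {}
  for i = 1, #t, 8 do
    local acc = 0
    for bit = 0, 7 do
      if t[i + bit] then acc = acc | (1 << bit) end
    end
    bytes[#bytes + 1] = string.char(acc)
  end
  return string.pack("<I2", #t) .. table.concat(bytes)
end

local function unpack_bools(s)
  local count, pos = string.unpack("<I2", s)
  local t = {}
  for i = 1, count do
    local byte = s:byte(pos + (i - 1) // 8)
    t[i] = ((byte >> ((i - 1) % 8)) & 1) == 1
  end
  return t
end

local flags = { true, false, true, true, false, false, false, false,
                true, true }
local packed = pack_bools(flags)
print(#packed)                      --> 4   (2 count bytes + 2 data bytes)
local back = unpack_bools(packed)
print(#back, back[1], back[2], back[10])   --> 10   true   false   true
```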


(please strictly take care, that the format string for pack and unpack can
be identical, also if you transfer data LUA32 <> LUA64).

This would be a miracle then ... .

... thank you for the patience, for all reading this excessively long post
...

PS: Thinking about this table issue ... it is very cumbersome that you HAVE
to specify the number of table elements... . Of course, if somebody wants,
it CAN be done like this with the number of elements.

But with 2 further format specifiers (assuming that the stupid "T"=size_t
has been killed before), something really VERY wonderful would be possible,
and I think not even much more C code would be required:

"t[n]" Table (n=1: only index part, n=2: only hash part) (at end of format)
"T[n]" Exploded Table (n=1: only index parts, n=2: only hash parts) (at end)
These two parameters MUST be placed at the end of the format string.

Then you could write the following:
t= { 1, 3, "hallo", name="test", version=1.0 }
str= string.pack( "t", t)
t= string.unpack( "t", str, "Error on char %d")

... to allow the complete table with index + hash, unpack needs to know
somehow where the hash part starts in the string... see below.

And T would be for an "Exploded table", where links to "subtables" should
be resolved:
T= { 1, 3, {4, 5, name="sub1"}, name="test", version=1.0, further_info= {6, 7, name="sub2"}}
strT= string.pack( "T", T)
T= string.unpack( "T", strT, "Error on char %d")

To handle this, further type info numbers would be needed.

Best in this case to use 0x20 for "floats + boolean true", so:
0x20  0B   Boolean True
0x24  4B   float32 normalized
0x28  8B   float64=double normalized (warning: unpack in LUA_32 will cut to
           float32 and use -INF/+INF if necessary)
0x30  16B  float128=long double normalized (warning: unpack in LUA_32/64
           will cut to float32/64)

And then the following further codes:
0x40  0B   start of sub table (unpack produces " { ")
0x41  0B   end of sub table (unpack produces " } ")
0x42  0B   hash part of table starts (so unpack has to produce k=v elements)

... just concerning unpack, this spec is a "lazy style" spec... I am not sure
how you can give this to "T=" somehow automatically ... this might require
some further programming on the table side... .
I would recommend to encapsulate ANY table (also the main table) with
0x40 0x41 ... this is also a good check that the string did not get
corrupted... . Only 1 byte more, and in case of tables it might be
helpful :).





Re: string.pack with bit resolution

bil til
One further offer, concerning the “n” specifier.

To appease the “native“ fans, I would propose to keep “n” and use “N” for
“platform-independent usage”:
“n” a lua_Number (native format, faster packing / unpacking) (*)
“N” a lua_Number (platform-independent format)

(*) warning: use conversion options marked “native” only locally, they can
not be used to transfer data between lua_32bit <> lua_64bit.

To allow this, I would add 3 further type numbers to the type byte list:
0x48 (4/8/16B) a lua_Number of type integer, native format (int32 / int64 /
int128)
0x49 (4/8/16B) a lua_Number of type float, native format (float32 / float64 /
float128)
0x4A (4/8B+nB) a lua_String with n Bytes, preceded by length info as native
size_t

And one further improvement to the type table: integers should please not
only support 0x04/0x08/0x10, but best 0x01…0x10 in “optimum selection”, so
for 7, 15, 23, … significant bits (MSB always the sign bit of course…). This
would have 2 advantages:
- tables with many integers would typically create much shorter strings.
- The “packed string” format for an integer table becomes unique. (Before
unpacking an integer table, you could compare the string to some reference
string, and if you have this already, then the unpacking is not needed, as
you then can use the table of the reference string…).
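
The “optimum selection” of the integer size can be sketched with the existing i[n] option. encode_int and decode_int are hypothetical names; the sketch uses the byte count itself as the type byte (0x01…0x08), stores the payload little-endian, and assumes 64-bit Lua integers, so sizes above 8 bytes are not covered:

```lua
-- Hypothetical helpers: choose the smallest signed size that holds v.
local function encode_int(v)
  for n = 1, 7 do
    local limit = 1 << (8 * n - 1)        -- 2^(8n-1): first value that no
    if v >= -limit and v < limit then     -- longer fits into n bytes
      return string.char(n) .. string.pack("<i" .. n, v)
    end
  end
  return "\x08" .. string.pack("<i8", v)  -- everything else needs 8 bytes
end

-- Returns the value and the position of the next item.
local function decode_int(s, pos)
  pos = pos or 1
  local n = s:byte(pos)                   -- the type byte is the byte count
  return string.unpack("<i" .. n, s, pos + 1)
end

print(#encode_int(100), #encode_int(128), #encode_int(-70000),
      #encode_int(1 << 40))               --> 2   3   4   7
print(decode_int(encode_int(-70000)))     --> -70000   5
```

Since each integer always gets the shortest encoding, equal tables produce equal strings, which is the uniqueness property described above.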

After these modifications, the complete type byte list would be the
following:

0x00          0B         Boolean False
0x01..0x10    1-16B      int8, int16…int63 (7, 15, …63 significant bits)
                         (warning: unpacking into LUA_32 will use
                         MIN_INT32/MAX_INT32 for 0x05…0x10, and unpacking
                         into LUA_64 will use MIN_INT64/MAX_INT64 for
                         0x09…0x10)
0x20          0B         Boolean True
0x24          4B         float32 normalized
0x28          8B         float64=double normalized (warning: unpack in
                         LUA_32 will cut to float32 and use -INF/+INF if
                         necessary)
0x30          16B        float128=long double normalized (warning: unpack
                         in LUA_32/64 will cut to float32/64)
0x40          0B         start of table (unpack produces " { ")
0x41          0B         end of table (unpack produces " } ")
0x42          0B         hash part of table starts (so unpack has to
                         produce k=v elements)
0x48          (4/8/16B)  a lua_Number of type integer, native format
                         (int32 / int64 / int128)
0x49          (4/8/16B)  a lua_Number of type float, native format
                         (float32 / float64 / float128)
0x4A          (4/8B+nB)  a lua_String with n Bytes, preceded by length info
                         as native size_t
0x81...0xD4   nB         String with 1...100 Bytes (n=1..100)
0xE4          (n+4)B     String with n Bytes, preceded by 4byte signed
                         length info (valid range 1...2G)
0xE8          (n+8)B     String with n Bytes, preceded by 8byte signed
                         length info (valid range 1...8GG)
0xF0...0xF8   0B         Error "no_number" (0xF0+_tt info, so 0xF0=nil,
                         0xF2=luserdata, 0xF5=table, 0xF6=function,
                         0xF7=userdata, 0xF8=thread)

… and the conversion option table should be changed as follows:
<: sets little endian (this is the default endianness, if no endian option
   is specified)
>: sets big endian
=: sets native endian (*)
q[n]: a signed bit field (Q[n] unsigned), n=1...16 (successive q’s are
   bit-packed, anything else is then byte-packed) (default is 1 bit) (q=q1
   for boolean false/true, Q=Q1 for integers 0/1, n=2... for integers) (**)
b: a signed byte (char) (B: unsigned) (“shortcut” for i1/I1)
h: a signed short (H: unsigned) (“shortcut” for i2/I2)
l: a signed long (L: unsigned) (“shortcut” for i4/I4)
i[n]: a signed int with n bytes (default is native size (*)) (I[n]
   unsigned), n=1..16
   (warning: unpacking into LUA_32/64 will clip ints with more than 31/63
   significant bits to MIN_INT/MAX_INT) (**)
r: a short float / float16
f: a float / float32
d: a double / float64 (warning: unpacking in LUA_32 will reduce accuracy to
   float and in case of overflow will use –INF / +INF)
D: a long double / float128 (warning: unpacking in LUA_32/64 will reduce
   accuracy to float32/float64 and in case of overflow will use –INF / +INF)
n: a lua_Number (faster packing/unpacking) (native size (*))
N: a lua_Number (size-optimized packing, platform-independent format)
c[n]: a fixed string with n Bytes (n=1...MAX_INT)
z: a zero terminated string
s[n]: a string preceded by its length coded as an unsigned integer with n
   bytes (default is native size_t size for n (*)) (**)
x: one zero byte of padding (unpack skips a byte (ANY byte))
' ' or ',' or ';': empty space (ignored) (use as delimiters in the format
   string)
t[n]: linear table (does NOT include sub-tables) (n=1: only index part,
   n=2: only hash part, 3 (default): first index, then hash)
   Format of index part: count (N), then values as N's,
   hash part: count (N), keys as N's, count (N), values as N's (**)
T[n]: nested table (does include sub-tables) (n=1: only index part for
   table + sub-tables, n=2: only hash part for table + sub-tables,
   3 (default): first index, then hash for table + sub-tables)
   (uses platform-independent lua_Number N type lists) (**)
   Format: Table-Start, then index part as N's,
   Table-keystart, then key-value pairs, Table-End
   (sub-tables create embedded Table-Start...-End sequences)
| as pre-char: linear table, only index part, with defined number format.
   Use e. g. pack( "|q", t) to pack an index table of booleans optimized
   ("|N" gives the same result as "t1").
: as pre-char: linear table, only key part, with defined number format.
   Use e. g. pack( ":q", t) to pack a hash table (boolean values)
   optimized (":N" gives the same result as "t2").
* as pre-char: linear table, index + key part, with defined number format.
   Use e. g. pack( "*q", t) to pack a table (boolean values)
   optimized ("*N" gives the same result as "t3" or "t").
# post-char: specifies repeat count, then follow-up specifiers can be
   written #q or #i or #z or #n to use this repeat count.
# pre-char: specifies repeat count dynamically (previous #post-char
   required) (**)
^ pre-char: specifies bit/byte length of numbers marked [n], then follow-up
   specifiers can be written q^ or i^ ...
^ post-char: specifies bit/byte length dynamically (previous #post-char
   required)
Number pre-char: specifies repeat count for format options (**)

(*): use “native size” options only locally on one system (do NOT use them
to transfer data e. g. from LUA_32 to LUA_64 or vice versa.)

(**) If a format option using the length specifier [n] is used with a repeat
count, then a delimiter (space/comma/semicolon) must be used BEFORE the
repeat count in the format string.

(As I already said in my posts before, I see no application that needs
alignment larger than 1 Byte … and I think this alignment support bloats the
C code quite a bit … but if there are good reasons to support alignment,
then please correct me, then it is no problem to re-install the option
‘![n]’ … I just really do NOT understand what you mean with the option ‘Xop’
in your existing format table… .)

(Maybe Xop could be used to ask pack to include a non-zero byte of padding,
e. g. X00...XFF to include a byte 0x00...0xFF in the string, so that x would
then be a shortcut for X00.)

[updated: multiple tables are allowed (in the first version I thought only
one table was possible, but multiple should be no problem either...)]
[#post-char and ^pre-char MUST also be present in the str-buffer,
otherwise unpack has no possibility to get these numbers... (in the first
version I thought they could be "left out" in the packed string, but this
is nonsense)]
[updated: footnote (**) in type option list - space before repeat
count...]
[updated: Table options t, T, |, :, *]
[updated: q=q1 for boolean, Q=Q1 for integer 0-1]




Re: string.pack with bit resolution

Sean Conner
It was thus said that the Great bil til once stated:
> One further offer, concerning the “n” specifier.

  [ ... snip of about 100 lines of specification ... ]

  At this point, it would be wise to actually look for an explicit module to
do the packing that you want.  One such example would be CBOR, as it's a
documented standard (RFC-7049) and there are a lot of implementations for
many languages.  It defines, I think, a nice balance between dense packing
and efficient encoding/decoding.
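
  To give an impression of how densely CBOR packs small numbers, here is a
hand-written Lua 5.3 sketch of its unsigned-integer encoding (major type 0
in RFC 7049). The function name cbor_uint is made up for this example and
is not the API of any particular CBOR module:

```lua
-- cbor_uint is a hypothetical helper that encodes a non-negative Lua
-- integer as a CBOR unsigned integer (major type 0, big-endian payload).
local function cbor_uint(n)
  assert(math.type(n) == "integer" and n >= 0, "non-negative integer expected")
  if n < 24 then return string.char(n) end                -- value in the tag
  if n < 0x100 then return string.pack(">B B", 0x18, n) end        -- 1 byte
  if n < 0x10000 then return string.pack(">B I2", 0x19, n) end     -- 2 bytes
  if n < 0x100000000 then return string.pack(">B I4", 0x1A, n) end -- 4 bytes
  return string.pack(">B I8", 0x1B, n)                             -- 8 bytes
end

print(#cbor_uint(10), #cbor_uint(1000), #cbor_uint(1000000))   --> 1   3   5
```

  Values below 24 cost a single byte, and the initial byte always tells the
decoder how many bytes follow, independent of the machine that wrote them.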

  Seriously, please!  Look into that [1] and stop trying to modify
string.pack()/string.unpack().  Even *IF* (and that's a big "if") the Lua
team decide to implement your ideas, they won't be in the Lua 5.4 release,
so you are looking at several *YEARS* at the minimum.

> (As I told already in my blogs before, I see no application why to support
> alignment larger than 1 Byte … and I think this alignment support bloats the
> C code quite a bit …

  And your 100+ line wish list won't bloat the C code?

  But more seriously, not every machine in existence is x86; there ARE
architectures out there with alignment restrictions STILL in use today (I use
them at work).  And they can run Lua (which I also do at work).  So
alignment is STILL important.

> but if there are good reasons to support alignment,
> then please correct me, then no problem to re-install the option ![n]’ …
> just I really do NOT understand what you mean with the option ‘Xop’ in your
> existing format table… .)

  It's for alignment purposes.  An example, if I want to pack the following
C structure [2]:

        struct foo
        {
          char c;
          int  i;
        };

  with the following packing string:

                "=! c1 XI I" -- [3]

  The "XI" part says advance to the next alignment spot and not return said
padding.  
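
  The effect can be seen with fixed sizes. This Lua 5.3 sketch uses "i4" and
"!4" instead of the native sizes in the example above, so that the numbers
are the same on every machine:

```lua
-- Without "!" the maximum alignment is 1, so nothing is padded.
print(string.packsize("<c1 i4"))          --> 5

-- With "!4" the i4 is moved to the next multiple of 4.
print(string.packsize("<!4 c1 i4"))       --> 8

-- "Xi4" inserts the padding an i4 would need, without consuming a value.
print(string.packsize("<!4 c1 Xi4 i4"))   --> 8

local s = string.pack("<!4 c1 i4", "a", 1)
print(#s, s == "a\0\0\0\1\0\0\0")         --> 8   true
print(string.unpack("<!4 c1 i4", s))      --> a   1   9
```

  The three zero bytes after "a" mirror the padding a C compiler typically
inserts between the char and the int of such a struct.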

  -spc

[1] Here's one for Lua:

                https://github.com/spc476/CBOR

        And it supports half floats, which seem to be important to you.
        Heck, here's a whole list of implementations for various languages:
       
                http://cbor.io/impls.html
       
[2] The rules for C structures is that the order is kept, but padding
        bytes can be added.

[3] I know!  The size of the integer is unspecified---it's an example.
        Deal with it!


Re: string.pack with bit resolution

bil til
Hi Sean,
thank you for your answer.

But above, you already confirmed that some format string to transfer data
uniquely between different Lua systems is necessary (e. g. LUA_32 <>
LUA_64; the same uniqueness would also be needed for LUA_64 (little-endian)
<> LUA_64 (big-endian)).

Alignment OF COURSE is an issue in number formats. But I do not know an
application where alignment "in a packed number string" would be an issue.
Or do you want to specify the alignment only for unpack, to specify the
alignment of the resulting numbers? ... but unpack, when creating numbers,
will of course ALWAYS create a lua_Number, and this then of course has the
native alignment of the machine doing the unpacking?





Re: string.pack with bit resolution

Philippe Verdy
In reply to this post by bil til
- no 128-bit integers (including IPv6 addresses or UUIDs)?
- no 80-bit long doubles (native on the x86 FPU)?
- not all IEEE number types are supported?
- no native vector formats (AVX/SSE/GPUs...), including for OpenCL and similar APIs?
- no support for variable-length lengths (using unary encoding as in several audio-video codecs) for integers of arbitrary length?
- no support for differential encodings (also used in several audio-video codecs), including for length prefixes? (Note that the mantissa part can always strip the most significant bit, which is always 1 except when encoding zero with a specific length value; that bit is replaced by a sign bit. Integers are then encoded much like variable-length floating point numbers.)
- And why do you still need space separators between EVERY item? That space adds considerably to the size of the generated stream. We can certainly imagine a format that does not require ANY separator, using distinct prefixes only; given that you already expect 8-bit clean bytes without restriction on their values, you are already building a binary format.
- no support for subtable backward references (which allow non-tree structures, including recursion and common subbranches)?

Note that this last point is different from classic dictionary-based Lempel-Ziv compression, whose decompression creates a copy and does not imply shared data (and is then usable for compressing long "strings"). For data compression there is only a rather limited repetition prefix, and I think that classic compression (like zlib) would perform better.

Or maybe you assume that the result of your encoding will pass through a downstream compressor, but this adds another layer that is used for everything and requires its own internal buffering (so memory and CPU usage increase, as does the pressure on the internal CPU caches, with fewer cache hits; if the two passes are chained and not parallelized, this requires external buffering, which increases the memory or I/O footprint and can be a limitation for use in small embedded systems like IoT: a streamable format working in one pass would be preferable).

For subtable backward references we have to choose whether we want to mark specific items that will be referenced downstream (to avoid maintaining a large backward lookup buffer), and a way to remove old backward references that are no longer used downstream (this can be implemented by overwriting one of the previous markers; decoding them requires a dynamic dictionary). If we don't mark, the dictionary would be as large as the full size of the stream; but in most cases, cyclic structures are local to some subtables and no longer used later, so a special mark that creates a stack on which we push and pop lookup-dictionary contexts could instantly clean up the past marks that are no longer referenced downstream. This allows representing any kind of graph, not just trees.

On Sun, 20 Oct 2019 at 09:24, bil til <[hidden email]> wrote:
One further offer, concerning the “n” specifier.

To appease the “native“ fans, I would propose to keep “n” and use “N” for
“platform-independent usage”:
“n” a lua_Number (native format, faster packing / unpacking) (*)
“N” a lua_Number (platform-independent format)

(*) Warning: use conversion options marked “native” only locally; they
cannot be used to transfer data between lua_32bit <> lua_64bit.

To allow this, I would add 3 further type numbers to the type byte list:
0x48 (4/8/16B) a lua_Number of type integer, native format (int32 / int64 / int128)
0x49 (4/8/16B) a lua_Number of type float, native format (float32 / float64 / float128)
0x4A (4/8B+nB) a lua_String with n Bytes, preceded by length info as native size_t
And one further improvement to the type table: integers should not only
support 0x04/0x08/0x10, but ideally 0x01…0x10 in “optimum selection”, i.e. the
smallest byte count that fits 7, 15, 23, … significant bits (MSB always the
sign bit of course…). This would have 2 advantages:
-       Tables with many integers would typically create much shorter strings.
-       The “packed string” format for an integer table becomes unique. (Before
unpacking an integer table, you could compare the string with some reference
string, and if they match, the unpacking is not needed, as you can then use
the table of the reference string…).
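
A minimal sketch of this “optimum selection” with what Lua 5.3 already offers, assuming 64-bit Lua integers (so only the sizes 1..8 are reachable). The helper names `min_int_bytes`, `pack_int` and `unpack_int` are made up for illustration; the leading byte plays the role of the proposed type codes 0x01..0x08.

```lua
-- Smallest n (1..8) such that v fits a signed n-byte integer,
-- i.e. v has at most 8*n - 1 significant bits plus the sign bit.
local function min_int_bytes(v)
  for n = 1, 7 do
    local limit = 1 << (8 * n - 1)            -- 2^(8n-1)
    if v >= -limit and v < limit then return n end
  end
  return 8
end

-- One size byte followed by the little-endian integer itself.
local function pack_int(v)
  local n = min_int_bytes(v)
  return string.pack("<B", n) .. string.pack("<i" .. n, v)
end

-- Returns the value and the position after it.
local function unpack_int(s, pos)
  local n, after = string.unpack("<B", s, pos)
  return string.unpack("<i" .. n, s, after)
end

print(#pack_int(100), #pack_int(1000), #pack_int(-70000))  -- sizes 2, 3 and 4
```

Because the size is always the smallest one that fits, equal integers always give equal byte strings, which is the uniqueness property mentioned above.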

After these modifications, the complete type byte would be the following:
0x00  0B Boolean False
0x01..0x10 1-16B int8, int16…int63 (7, 15, …63 significant bits) (warning:
unpacking into LUA_32 will use MIN_INT32/MAX_INT32 for 0x05…0x10, and
unpacking into LUA_64 will use MIN_INT64/MAX_INT64 for 0x09…0x10)
0x20  0B Boolean True
0x24  4B float32 normalized
0x28  8B float64=double normalized (warning: unpack in LUA_32 will cut to
float32 and use -INF/+INF if necessary)
0x30  16B float128=long double normalized (warning: unpack in LUA_32/64 will
cut to float32/64)
0x40 0B start of table (unpack produces " { ")
0x41 0B end of table (unpack produces " } ")
0x42 0B hash part of table starts (so unpack has to produce k=v elements)
0x48 (4/8/16B) a lua_Number of type integer, native format (int32 / int64 / int128)
0x49 (4/8/16B) a lua_Number of type float, native format (float32 / float64 / float128)
0x4A (4/8B+nB) a lua_String with n Bytes, preceded by length info as native size_t
0x81...0xD4: nB String with 1...100 Bytes (n=1..100)
0xE4  (n+4)B String with n Bytes, preceded by 4byte signed length info
(valid range 1...2G)
0xE8  (n+8)B String with n Bytes, preceded by 8byte signed length info
(valid range 1...8GG)
0xF0...0xF8 0B Error "no_number" (0xF0+_tt info, so
0xF0=nil,0xF2=luserdata,0xF5=table,0xF6=function,0xF7=userdata,0xf8=thread)
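
To make the table a bit more concrete, here is a rough sketch of a reader for a few of these type bytes, written with the string.unpack that Lua 5.3 already has. The function name `decode_value` is made up, little endian is assumed as the default, and only booleans, integers up to 8 bytes, float32/float64 and the short strings 0x81...0xD4 are handled; tables, native formats and long strings are left out.

```lua
-- Decode one value starting at pos.
-- Returns the value and the position of the next type byte.
local function decode_value(s, pos)
  local tb
  tb, pos = string.unpack("<B", s, pos or 1)
  if tb == 0x00 then
    return false, pos
  elseif tb == 0x20 then
    return true, pos
  elseif tb >= 0x01 and tb <= 0x08 then
    return string.unpack("<i" .. tb, s, pos)           -- tb is the byte count
  elseif tb == 0x24 then
    return string.unpack("<f", s, pos)
  elseif tb == 0x28 then
    return string.unpack("<d", s, pos)
  elseif tb >= 0x81 and tb <= 0xD4 then
    return string.unpack("c" .. (tb - 0x80), s, pos)   -- length is in the code
  end
  error(string.format("type byte 0x%02X not handled in this sketch", tb))
end

-- false, true, the 2-byte integer 300, the string "abc"
local stream = "\x00" .. "\x20" .. "\x02\x2C\x01" .. "\x83abc"
local pos = 1
while pos <= #stream do
  local value
  value, pos = decode_value(stream, pos)
  print(value)
end
```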

… and the conversion option table should be changed as follows:
<: sets little endian (this is the default endianness, if no endian option is
specified)
>: sets big endian
=: sets native endian
q[n]: a signed bit field with n bits (Q[n] unsigned), n=1...16 (successive
q’s are bit-packed, anything else is then byte-packed)
b: a signed byte (char) (B: unsigned) (“shortcut” for i1/I1)
h: a signed short (H: unsigned) (“shortcut” for i2/I2)
l: a signed long (L: unsigned) (“shortcut” for i4/I4)
i[n]: a signed int with n bytes (default is native size (*)) (I[n]
unsigned), n=1..16
(warning: unpacking into LUA_32/64 will clamp ints with more than 31/63
significant bits to MIN_INT/MAX_INT)
r: a short float / float16
f: a float / float32
d: a double / float64 (warning: unpacking in LUA_32 will reduce accuracy to
float and in case of overflow will use -INF / +INF)
D: a long double / float128 (warning: unpacking in LUA_32/64 will reduce
accuracy to float32/float64 and in case of overflow will use -INF / +INF)
n: a lua_Number (faster packing/unpacking) (native size (*))
N: a lua_Number (size-optimized packing, platform-independent format)
cn: a fixed-size string with n Bytes
z: a zero terminated string
s[n]: a string preceded by its length coded as an unsigned integer with n
bytes (default is native size_t size for n (*))
t[n]: table (do NOT include sub-tables) (n=1: only index part, n=2: only
hash part, default: first index, then hash) (may be used ONLY as the last
option in a format string)
T[n]: table (do include sub-tables) (n=1: only index part for
table+sub-tables, n=2: only hash part for table+sub-tables, default: first
index, then hash for table+sub-tables) (may be used ONLY as the last option
in a format string)
x: one zero byte of padding
' ' or ',' or ';' empty space (ignored) (use as delimiters in the format
string)
# post-char: specifies repeat count (no byte in data str, only for format
str),  then follow-up specifiers can be written #q or #i or #z or #n to use
this repeat count.
# pre-char: specifies repeat count dynamically (previous #post-char
required)
^ pre-char: specifies bit/byte length of numbers marked [n], then follow-up
specifiers can be written q^ or i^ ...
^ post-char: specifies bit/byte length dynamically (previous #post-char
required)

(*): use “native size” options only locally on one system (do NOT use them to
transfer data e. g. from LUA_32 to LUA_64 or vice versa)
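
The q[n] / Q[n] options do not exist in Lua 5.3, but the intended bit packing can be sketched with the bitwise operators. Everything here is an assumption for illustration: the names `pack_bits` / `unpack_bits`, the unsigned-only handling (like Q[n]), and the choice to fill each byte starting at its least significant bit, which the option table above does not fix.

```lua
-- fields: array of { value, nbits } pairs, all unsigned, nbits = 1..16.
local function pack_bits(fields)
  local acc, nacc, bytes = 0, 0, {}
  for _, f in ipairs(fields) do
    local value, nbits = f[1], f[2]
    assert(nbits >= 1 and nbits <= 16 and value >= 0 and value < (1 << nbits))
    acc = acc | (value << nacc)
    nacc = nacc + nbits
    while nacc >= 8 do                         -- flush complete bytes
      bytes[#bytes + 1] = string.char(acc & 0xFF)
      acc, nacc = acc >> 8, nacc - 8
    end
  end
  if nacc > 0 then                             -- last byte is zero padded
    bytes[#bytes + 1] = string.char(acc & 0xFF)
  end
  return table.concat(bytes)
end

-- widths: array of bit counts in the same order as they were packed.
local function unpack_bits(s, widths)
  local acc, nacc, pos, out = 0, 0, 1, {}
  for i, nbits in ipairs(widths) do
    while nacc < nbits do                      -- refill from the byte string
      acc = acc | (s:byte(pos) << nacc)
      nacc, pos = nacc + 8, pos + 1
    end
    out[i] = acc & ((1 << nbits) - 1)
    acc, nacc = acc >> nbits, nacc - nbits
  end
  return out
end

-- 3 + 1 + 9 + 3 bits = 16 bits, so four numbers travel in 2 bytes.
local packed = pack_bits({ {5, 3}, {1, 1}, {300, 9}, {2, 3} })
print(#packed)   -- 2
```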


(As I already said in my earlier posts, I see no application that needs
alignment larger than 1 Byte … and I think this alignment support bloats the
C code quite a bit … but if there are good reasons to support alignment,
then please correct me, it would be no problem to reinstate the option ‘![n]’ …
I just really do NOT understand what you mean by the option ‘Xop’ in your
existing format table… .)




--
Sent from: http://lua.2524044.n2.nabble.com/Lua-l-f2524044.html

Re: string.pack with bit resolution

bil til
Hi Philippe,
thank you for your answer.

But maybe there are some misunderstandings:
- Long double and int128 are both included... (int128 is i16, an integer with
16 bytes... for N see type codes 0x01...0x10). Just please understand that
you would get REAL support of 128-bit integers (with "REAL" I mean also for
unpack, also in lua_Number...) only in some future LUA_128BIT (currently only
LUA_64BIT and LUA_32BIT are available, as I see it, but it should not take
any magic to extend to LUA_128BIT). What do you mean by "all IEEE number
types" - do you want longer numbers than int128?
- Variable-length string lengths are also included (see the N type byte ...
codes 0x81...0xE8).
- Vectors in Lua are tables (lua_table) - these are also included, see codes
0x40..0x42 (options 't' and 'T').
- Space separators: these spaces are ONLY in the format string, to make the
format more readable, but of course NOT in the packed string. Please check
the option table for ' ', ',', ';', there it clearly says "(ignored)" ...
these separators of course do NOT end up in the packed string.
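
This is already how the space behaves in the existing Lua 5.3 string.pack, as a quick check shows (the ',' and ';' separators are only part of the proposal, so they are not used here):

```lua
-- Lua 5.3: a space in the format string is skipped and produces no output byte.
local spaced  = string.pack("<i2 i2 z", 1, 2, "ab")
local compact = string.pack("<i2i2z", 1, 2, "ab")
print(spaced == compact, #spaced)   -- prints true and 7 (2 + 2 + 3 bytes)
```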

For "different encodings" - would you please give some "typical" (= widely
used, I hope) example? What do you mean by "subtable backward reference"?
Can you give an example?

... just please understand that I clearly do NOT have the ambition to
create ZIP-style compression with string.pack/unpack, this would really be a
bit too much. Besides, ZIP compression is EXTREMELY inefficient for smaller
strings, it gets good only after about 100...1000 bytes... . For small
strings you are better off with some "small string compression" scheme, but
this needs to be done in really specialized C code of course - it typically
relies heavily on tables and needs some assumptions about your string (e. g.
the string is "English ASCII text" or so) to be efficient... . You can find
nice approaches for this by googling for "compressor for short strings"...

The last 2 paragraphs of your answer (about "encoding downstream") I do not
really understand ... I assume that this is somehow linked to your zip
question ... but I really do NOT have the ambition to use string
compression here in any way... . If you want to combine this with string
compression, that would be no problem, but it would need a separate command
first, e. g. strcompr = compressfunction(str), and then you pass strcompr to
your string.pack command, this is also NO problem... .



Re: string.pack with bit resolution

Philippe Verdy


On Sun, 20 Oct 2019 at 15:13, bil til <[hidden email]> wrote:
Hi Philippe,
thank you for your answer.

But maybe there are some misunderstandings:
- Long double and int128 is all included... (int128 is i16, integer with 16
bytes...). Just please understand, that you would get REAL support of 128bit
integers (with "REAL" I mean also for unpack, also in lua_Number...) only in
some future LUA_128BIT (currently only LUA_64BIT and LUA_32BIT available, as
I see it, but it should not be any magic to extend to LUA_128BIT). What do
you mean with "all IEEE number types" - do you want longer numbers than
int128.

IEEE has many types, including vectorized numbers. They were added notably for GPUs and signal processors. It also defines numeric semantics like interval bounds (with no infinities or NaNs), fixed-point numbers (basically integers with an implicit, non-encoded scale), packed BCD...

- variable-length length also included (see the N type byte ... codes
0x81...0xE8

Consider what was done in recent audio/video/photo codecs, like WebP, and other open work done by Google as part of HTTP/2, or other open formats developed by cloud storage providers and social networks (which helped them save a lot of storage, including re-encoders that preserve the accuracy of JPEG photos). What they have in common is that they no longer restrict themselves to two's-complement notation: unary representation is used for variable-length data (e.g. for Huffman decompression lookups), as well as differential encodings. And they are not restricted to byte alignment: the unit of measurement is the single bit. This is no longer a problem even on small systems, given that bit handling benefits from local caches. The most restrictive condition is not the processor speed but the external storage: in memory (for embedded devices like IoT, or in shared environments like wikis with tons of users, where the scripts have to run under very strict limits that are necessary to keep the servers responsive for all their users), on disk (for databases and cloud servers), or in transmission (for mobile networks whose data volume is restricted even if they are now much faster, and also for web servers hosting websites). Research in efficient data compression and representation has never been more active, even with large servers being used by many users.
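
As a small illustration of differential plus variable-length encoding, here is a sketch in Lua 5.3. It is a byte-aligned stand-in, not the bit-level unary/Huffman coding those codecs use: each value is stored as the difference to the previous one, the difference is zigzag-mapped to an unsigned number, and that number is written as a base-128 varint (7 data bits per byte, high bit set when more bytes follow). The function names are made up for illustration.

```lua
local function encode_deltas(values)
  local out, prev = {}, 0
  for _, v in ipairs(values) do
    local d = v - prev
    prev = v
    -- zigzag: 0, -1, 1, -2, 2 ... map to 0, 1, 2, 3, 4 ...
    local u = d >= 0 and (d << 1) or (((~d) << 1) | 1)
    repeat
      local byte = u & 0x7F
      u = u >> 7
      if u ~= 0 then byte = byte | 0x80 end    -- continuation flag
      out[#out + 1] = string.char(byte)
    until u == 0
  end
  return table.concat(out)
end

local function decode_deltas(s)
  local values, prev, pos = {}, 0, 1
  while pos <= #s do
    local u, shift = 0, 0
    repeat
      local byte = s:byte(pos)
      pos = pos + 1
      u = u | ((byte & 0x7F) << shift)
      shift = shift + 7
    until byte < 0x80
    local d = (u & 1) == 0 and (u >> 1) or ~(u >> 1)   -- undo zigzag
    prev = prev + d
    values[#values + 1] = prev
  end
  return values
end

-- Five slowly changing readings need 6 bytes instead of 5 * 8.
print(#encode_deltas({ 1000, 1002, 1001, 1001, 990 }))   -- 6
```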

There's also the goal of saving energy, for mobile users as well as for servers in clouds (and not just commercial clouds). The final goal is also to allow easy redeployment and scalability by allowing processes to be relocated (for now this is largely based on static decisions, but dynamic redeployment on demand is on the rise, and this includes transparent changes of native architecture, and reliability through failover to backup systems with minimal downtime: the systems must be able to restart in a very short time and must be able to reconfigure themselves with reasonable defaults and then use training to converge to a stable and reliable solution that meets the demand). For that goal, being efficiently agnostic about architectures is important: the solution must be portable with minimal effort, or automatically with minimal code changes, and we need it to work even across very heterogeneous systems of all sorts of scales.

Re: string.pack with bit resolution

bil til
Hi Philippe,
thank you for your answer.

So you mean IEEE float numbers, you want to have even more of them? Which
ones exactly?

Just please understand that it does not make too much sense to support IEEE
numbers which are NOT supported by the lua_Number type ... and I think it
is wise of Lua to concentrate on the "ANSI C" types. Just for
reasonable future compatibility, I would recommend adding "long double /
float128", as this is already included in ANSI C (though of course not yet
available on LUA_32BITS / LUA_64BITS), and as a special wish for greedy "small
string optimizers", e. g. in "short packing control applications", I have
this additional wish for "r"="float16" (but you could skip this, I would not
have a nervous breakdown then ...).
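
For comparison, this is how the two float options that Lua 5.3 already has behave, assuming the usual platform where a C float is 32-bit and a double is 64-bit IEEE. The proposed "r" (float16) and a guaranteed 128-bit "D" do not exist in string.pack, so they are not shown.

```lua
local f32 = string.pack("<f", 0.1)
local f64 = string.pack("<d", 0.1)
print(#f32, #f64)                        -- 4 and 8 under the assumption above
print(string.unpack("<d", f64) == 0.1)   -- true: float64 round-trips exactly
print(string.unpack("<f", f32) == 0.1)   -- false: 0.1 was rounded to float32
print(string.unpack("<f", string.pack("<f", 1.5)) == 1.5)  -- true: 1.5 is exact
```

So the shorter format halves the size but only round-trips values that are exactly representable in it, and the same kind of trade-off would apply to a 2-byte "r".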

With codecs I assume you mean something like JPEG or even MP3 compression /
decompression ... but in my opinion this is really FAR beyond any
intrinsic support in Lua ... if you want to do this you really need to
define a special "compressstring" function in your private C code ... and you
can do this if you want. JPEG or MP3 compression treats the bit set of all
pixels of a picture as a gigantic vector of 1M or 10M bits, depending on
your image resolution, and then does some nice linear algebra with this
vector. But this you for sure can NOT do in some small C code ... this would
be something REALLY big, needing for sure several 10-100 kBytes of
code..., and it needs to be EXTREMELY speed-optimized if it should run
reasonably fast; in any case I think this is usually also done in a fairly
platform-DEPENDENT way (not independent, as Lua is...).






Re: string.pack with bit resolution

Philippe Verdy
Just look at this interesting list of floating-point formats (intended for data interchange):
This is something you should keep in mind.
Also consider the additional formats used in FPUs for x86 and ARM (in discrete instructions)

And vector formats in SSE/AVX and APU/GPU (for OpenCL), as they are now widespread

It would be interesting to have Lua ported to work within an OpenCL engine and then be able to use GPUs and other vector extensions (which are very useful for massive data processing, which for now is still done in legacy languages like FORTRAN, or in proprietary mathematical languages and libraries).

Such work has been done to port apps written in C, C++, Java, Perl, Python, .Net/CLI... and allow their parallelization, but still not for Lua. Note that I do not propose native parallelization in Lua here; that would be another proposal. ANSI C is just a very limited goal for Lua and it does not ease its safe integration into existing application servers or SQL engines (which could host Lua scriptlets, instead of using only Java or some old specific GL language like PL/SQL), except in specific containers running on separate isolated servers (but with poor/slow interaction, as this also requires them to communicate over some networking API, including REST APIs with response delays, which is not acceptable for embedded devices and causes complex security concerns)



