ANN: Serpent: Lua serializer and pretty printer

Paul K
Yes, yet another serializer and pretty printer. I carefully studied
existing implementations and described my rationale and examples here:
http://notebook.kulchenko.com/programming/serpent-lua-serializer-pretty-printer.

My requirements for a serializer: (1) pure Lua (need to execute it as
part of a debugger on various platforms including mobile), (2) does
both pretty printing and robust serialization, (3) handles shared and
self-references, (4) serializes keys of various types, including
tables as keys, and (5) is short and doesn't have too many
dependencies to be included with another module.

To summarize, I want a serialized result that is as readable as
possible and is still a valid fragment that I can load with
loadstring. The implementation is fairly short and sufficiently fast
for my needs, but I'm sure it can be improved on both of those counts.
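To make the "valid fragment I can load with loadstring" goal concrete, here is a minimal sketch (not Serpent's code) of a serializer whose output loads back with loadstring. It assumes acyclic tables whose keys and values are strings, numbers, or booleans. Shared references, cycles, functions, and special numbers (inf/NaN) are deliberately left out.

```lua
-- Minimal sketch, not Serpent's implementation: serialize acyclic data made
-- of tables, strings, numbers, and booleans into a Lua table constructor.
local function ser(v)
  local tv = type(v)
  if tv == "string" then
    return string.format("%q", v)          -- %q output is valid Lua source
  elseif tv == "number" or tv == "boolean" then
    return tostring(v)                     -- note: tostring keeps ~14 digits
  elseif tv == "table" then
    local parts = {}
    for k, val in pairs(v) do
      parts[#parts + 1] = "[" .. ser(k) .. "]=" .. ser(val)
    end
    return "{" .. table.concat(parts, ",") .. "}"
  end
  error("cannot serialize a value of type " .. tv)
end

local loadstr = loadstring or load        -- Lua 5.1 vs. Lua 5.2+

local original = {name = "serpent", version = 0.1, tags = {"lua", "pretty"}}
local copy = loadstr("return " .. ser(original))()
print(copy.name, copy.tags[2])            -- prints serpent and pretty (tab-separated)
```

Writing every key in the bracketed `[key]=` form keeps the sketch short. A pretty printer would prefer `name=` for identifier-like keys.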

"Serpent" because it handles self-references and reminds me of a
serpent eating its own tail (http://en.wikipedia.org/wiki/Ouroboros).
Available on github.
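The self-reference handling that gives Serpent its name can be illustrated with a short sketch. It assumes an approach in the spirit of the "section with all shared/circular references" mentioned later in this thread, but the exact output format here is my own. A table met a second time is written as nil, and an assignment after the constructor restores the link. Keys are limited to strings and numbers.

```lua
-- Sketch: serialize tables that may contain shared or self-references.
-- A table met for the second time is emitted as nil, and a fix-up
-- statement assigns the real reference once the constructor exists.
-- Keys are assumed to be strings or numbers.
local function lit(v)
  if type(v) == "string" then return string.format("%q", v) end
  return tostring(v)
end

local function serialize(root)
  local seen, fixups = {}, {}
  local function walk(v, path)
    if type(v) ~= "table" then return lit(v) end
    if seen[v] then
      -- Already emitted elsewhere: record "path = firstpath" for later.
      fixups[#fixups + 1] = path .. " = " .. seen[v]
      return "nil"
    end
    seen[v] = path
    local parts = {}
    for k, val in pairs(v) do
      local step = "[" .. lit(k) .. "]"
      parts[#parts + 1] = step .. " = " .. walk(val, path .. step)
    end
    return "{" .. table.concat(parts, ", ") .. "}"
  end
  local stmts = {"local _ = " .. walk(root, "_")}
  for i = 1, #fixups do stmts[#stmts + 1] = fixups[i] end
  return "do " .. table.concat(stmts, "; ") .. "; return _ end"
end

local t = {name = "ouroboros"}
t.self = t
print(serialize(t))
-- something like (key order may vary):
-- do local _ = {["name"] = "ouroboros", ["self"] = nil}; _["self"] = _; return _ end
```

Every fix-up target is a path through tables that were emitted in full, so all the assignments are valid once the constructor has run.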

Paul.

Re: ANN: Serpent: Lua serializer and pretty printer

Dirk Laurie-2
2012/6/6 Paul K <[hidden email]>:
> Yes, yet another serializer and pretty printer. I carefully studied
> existing implementations and described my rationale and examples here:
> http://notebook.kulchenko.com/programming/serpent-lua-serializer-pretty-printer.
>

Very nice.  I've been using Lua for 18 months but Serpent taught me
something I never quite realized before: a "long comment" can
be shorter than a "short comment".

--[[ Initialize ]]        x=2
--[[ Newton iteration ]]  repeat local y=x; x=(x+2/x)/2
--[[ Convergence test ]]  until x>=y

It runs under Lua 5.2 but does not treat "goto" as a keyword.
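This matters because a string key can only be written in the short `key = value` form when it is a valid identifier and not a reserved word. Below is a sketch of that check. It assumes the Lua 5.2 reserved-word list, which adds "goto". Including "goto" is harmless under 5.1, because the bracketed form is always valid.

```lua
-- Sketch: choose between `key = v` and `["key"] = v` when writing a table.
-- Reserved words of Lua 5.2 ("goto" is new in 5.2).
local keywords = {}
for w in ([[and break do else elseif end false for function goto if in
            local nil not or repeat return then true until while]]):gmatch("%a+") do
  keywords[w] = true
end

local function formatkey(k)
  if type(k) == "string" and k:match("^[%a_][%w_]*$") and not keywords[k] then
    return k                                  -- plain identifier
  end
  local lit = type(k) == "string" and string.format("%q", k) or tostring(k)
  return "[" .. lit .. "]"                    -- bracketed form always works
end

print(formatkey("name"), formatkey("goto"))   -- prints name, then ["goto"]
```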

Re: ANN: Serpent: Lua serializer and pretty printer

Paul K
Hi Dirk,

> It runs under Lua 5.2 but does not treat "goto" as a keyword.

Right; thank you for reminding me that the list of keywords has
changed in Lua 5.2. Fixed in git.

Paul.

On Wed, Jun 6, 2012 at 6:51 AM, Dirk Laurie <[hidden email]> wrote:

> [...]

Re: ANN: Serpent: Lua serializer and pretty printer

Eric Wing
In reply to this post by Paul K
On 6/6/12, Paul K <[hidden email]> wrote:

> Yes, yet another serializer and pretty printer. I carefully studied
> existing implementations and described my rationale and examples here:
> http://notebook.kulchenko.com/programming/serpent-lua-serializer-pretty-printer.
> [...]

Very nice! I think you managed to accomplish everything I had on my
personal wish list for a serializer.

Thanks so much for making this available!
-Eric
--
Beginning iPhone Games Development
http://playcontrol.net/iphonegamebook/

Re: ANN: Serpent: Lua serializer and pretty printer

David Manura
In reply to this post by Paul K
On Wed, Jun 6, 2012 at 3:41 AM, Paul K <[hidden email]> wrote:
> Yes, yet another serializer and pretty printer [...]
> http://notebook.kulchenko.com/programming/serpent-lua-serializer-pretty-printer.

Dumping is a common requirement, yet I've been uncomfortable using any
of the off-the-shelf dumping routines on the web.  Your module holds
some promise though (at least it has tests, documentation, design
goals, a license, and versioning, and it works nearly as one would
expect).

> My requirements for a serializer: (1) pure Lua [...], (2) does
> both pretty printing and robust serialization, (3) handles shared and
> self-references, (4) serializes keys of various types, including
> tables as keys, and (5) is short and doesn't have too many
> dependencies to be included with another module.

The output looks reasonable, in general.  Some comments:

(1) Dumping _G would be a good test of this.  The output of
serpent.printmult(_G) is attached.  As seen, the main downside is that
some packages are expanded deeply inside _G.package.loaded rather than
_G.  Dumping tables at the lowest possible nesting level would be
ideal though may add some complexity.
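Dumping each table at its lowest nesting level amounts to a breadth-first walk that remembers the first (shortest) path to every table. Here is a minimal sketch. The function name is hypothetical, and it only follows string keys that look like identifiers. Reserved words are not checked, so a key like "end" would produce an invalid path.

```lua
-- Sketch: breadth-first walk that records, for every reachable table,
-- the shortest access path from the root (fewest index steps).
-- Only identifier-like string keys become ".name" steps; other keys are
-- skipped, and reserved words are not filtered out in this version.
local function shallowest_paths(root, rootname)
  local path = {[root] = rootname}
  local queue, head = {root}, 1
  while queue[head] do
    local t = queue[head]
    head = head + 1
    for k, v in pairs(t) do
      if type(v) == "table" and path[v] == nil
         and type(k) == "string" and k:match("^[%a_][%w_]*$") then
        path[v] = path[t] .. "." .. k       -- first visit = shallowest
        queue[#queue + 1] = v
      end
    end
  end
  return path
end

local lib = {answer = 42}
local env = {lib = lib, package = {loaded = {lib = lib}}}
local paths = shallowest_paths(env, "_G")
print(paths[lib])   -- prints _G.lib, not _G.package.loaded.lib
```

A serializer could then expand each table only where it appears at its recorded path and write a reference everywhere else.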

(2) A "--[[table: 0x9aa7988]]" style comment is appended after each
table, even when the table is non-recursive.  A possible concern is
that these addresses are not deterministic, so dumps of structurally
equivalent tables in different program executions will give different
results.  This makes textual comparisons (diffs) of structures
difficult and causes large diffs when serializations are maintained
under revision control.  For readability, you may wish to replace
addresses (32- or 64-bit) with an integer counter that gives
deterministic dumps on structural equality, though insertions could
still cause massive renumbering.  Some time ago I was working on a
dumper like Perl Data::Dumper that would dump `x = {}; return
{a={b=x},c={d=x}}` as "{a={b={}}, c={d=T.a.b}}", where "T" refers to
the top-level table.  Here, T could even have a metatable such that
T.a.b evaluates to a node that is expanded on a subsequent walk
through the table.
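Replacing addresses with an integer counter can be sketched as follows. The sketch assumes keys are visited in a fixed sorted order, so the numbering depends only on the structure. Keys that are themselves tables would still sort by address, so they are not covered, and the names here are hypothetical.

```lua
-- Sketch: label tables "table#1", "table#2", ... in first-visit order,
-- visiting keys in a fixed order so labels do not depend on addresses.
local function keyorder(a, b)
  local ta, tb = type(a), type(b)
  if ta ~= tb then return ta < tb end        -- group keys by type name
  if ta == "number" then return a < b end
  return tostring(a) < tostring(b)
end

local function label_tables(root)
  local labels, count = {}, 0
  local function visit(t)
    if labels[t] then return end             -- shared/recursive: keep first label
    count = count + 1
    labels[t] = "table#" .. count
    local keys = {}
    for k in pairs(t) do keys[#keys + 1] = k end
    table.sort(keys, keyorder)
    for _, k in ipairs(keys) do
      if type(t[k]) == "table" then visit(t[k]) end
    end
  end
  visit(root)
  return labels
end

local a = {x = {}, y = {z = {}}}
local labels = label_tables(a)
print(labels[a.x])   -- prints table#2
```

As noted above, inserting a new table early in the order still renumbers everything after it.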

(3) The keys are not currently being sorted.  Sorting would improve
readability and deterministic output but may affect your performance
numbers.

(4) In the attached dump of _G, the string.gmatch is curiously set to
"nil --[[ref]]".  It took me some time before realizing this was
because the deprecated string.gfind is an alias to string.gmatch, and
for gfind it outputs "= string.gmatch --[[function: 0x9b3ab28]]".
Currently, in Lua 5.1, string.gmatch would serialize as string.gfind,
which will be absent when deserializing in a VM compiled with
LUA_COMPAT_GFIND undefined.  Perhaps [[ref]] should have some more
information.  It would also be ideal if it could prefer non-deprecated
functions, but that requires special cases.

(5) For pretty printing, "1\n2" would be nicer than "1\0102".  \n is
common, but beginning users might not recognize \0102.
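The readability fix suggested in (5) can be applied after %q formatting. Here is a small sketch with a hypothetical `quote` helper, assuming the rest of the escaping is left to %q. Under Lua 5.1, %q leaves other control characters (such as \026) unescaped, which is why the safestr code quoted in (8) also handles that one.

```lua
-- Sketch: %q writes a newline as a backslash followed by a real newline;
-- replacing that newline with "n" yields the familiar \n escape.
local function quote(s)
  return (string.format("%q", s):gsub("\n", "n"))  -- parens drop gsub's count
end

print(quote("1\n2"))   -- prints "1\n2" (including the quotes)
```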

(6) `package.loadlib` dumping shows an `--[[err]]`, which looks
worrisome because it is presented as an error rather than a warning.
This occurs because it's not in globals.  Other cases include package.loaders[i].

(7) The naming of "serialize", "printmult", and "printsing" was
somewhat jarring to me (think: "print multiple objects" and "prints
ing"/"print sing (song)").  I thought the latter would internally
invoke `print` when first running it -- not that I wanted it to
because it's not general.  Also, single- and multi-line forms of
serialization and pretty printing would suggest 2 x 2 = 4 combinations
for orthogonality.  In my recent lua-mbuild, for example (in which I
would consider adding your library), I currently use my own trivial
dumper to serialize a (non-recursive) data structure to disk in a
multi-line format (to allow easier debugging), and I'd be tempted to
call printmult even though this is serialization not pretty printing
as the name implies.  Finally, "The library provides three functions
-- serialize, printmult, and printsing -- with the last two being
shortcuts for the main serialize function." is not strictly correct:
All three are wrappers around a "local function serialize" that is not
exposed, and, as is, it's not possible to pass through a nil name in
the public version of `serialize`.
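The nil-name problem comes from defaulting with `name or '_'`. One way around it is to check whether the argument was passed at all, using select('#', ...). The sketch below uses hypothetical function names, and `serialize_impl` is only a stand-in that reports which name it received.

```lua
-- Stand-in for the internal serializer: just reports the name it got.
local function serialize_impl(t, name)
  if name == nil then return "<no name>" end
  return "name=" .. tostring(name)
end

-- Public wrapper (hypothetical): default the name only when the caller
-- omitted it. select("#", ...) counts an explicit trailing nil, so
-- serialize(t, nil) and serialize(t) can be told apart.
local function serialize(t, ...)
  local name = "_"
  if select("#", ...) > 0 then name = (...) end
  return serialize_impl(t, name)
end

print(serialize({}, nil))   -- prints <no name>
```

An options table, as proposed later in this thread, avoids the issue differently: an absent field and a nil field are the same thing, so the method's own default applies unless another value is given.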

(8) The coding style is a little too compact for my taste, concerning
readability.  Usually I prefer to define variables on separate lines,
as opposed to

  local n, v, c, d = "serpent", 0.1, -- (C) 2012 Paul Kulchenko; MIT License
    "Paul Kulchenko", "Serialization and pretty printing of Lua data types"
  local keyword, globals, G = {}, {}, (_G or _ENV)
  local ttype, level = type(t), (level or 0)

The above single character top-level variable names aren't so great
either, but you might just inline those values into the table at the
bottom.  Some may also object to

  local function safestr(s) return type(s) == "number" and (snum[tostring(s)] or s)
    or type(s) ~= "string" and tostring(s) -- escape NEWLINE/010 and EOF/026
    or ("%q"):format(s):gsub("\010","010"):gsub("\026","\\026") end

(9) Another interesting test is
`_ENV = loadstring(require "serpent".serialize(_G))(); <test suite code>`.
It won't work in all cases, but it does still run non-trivial programs
like life.lua.

(10) Dumps of bytecode are probably not useful for pretty printing
unless perhaps you were to decompile the bytecode.  More useful info
may be found in debug info (including sometimes a file name with
source code), in cases where debug.* is permitted at all.  Some such things,
however, are probably outside the scope of a simple dumping module.
Bytecode may still have limited uses though in serialization (transfer
code between Lua states?).

(11) One area where I've used dumpers is dumping ASTs (dump.lua in
LuaInspect).  In cases like {tag=Op, '+', ...}, I'd prefer the named
part *before* the positional part, particularly the 'tag' name at the
very front.  This might be outside the scope of this module though.

(12) In "nil values are included when expected ({1, nil, 3} instead of
{1, [3]=3})", I'm not sure what "expected" means given the
undefinedness of # with holes.  I'd actually expect the latter format
deterministically for sparse arrays.

(13) Maybe use 1/0 rather than math.huge (after all you already rely
on 0/0) to avoid simple serializations of numerical arrays causing
lookups into `math`, which won't exist in sandboxes with an empty _ENV.
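Writing special numbers as 1/0, -1/0, and 0/0 keeps the output free of global lookups, so it loads even with an empty environment. Here is a sketch with hypothetical helper names. Sandboxed loading uses setfenv under Lua 5.1 and load's env argument under 5.2+.

```lua
-- Sketch: write numbers so that the output needs no global lookups.
-- 1/0, -1/0 and 0/0 evaluate to inf, -inf and NaN without touching `math`.
local function numstr(n)
  if n ~= n then return "0/0" end                    -- NaN
  if n == math.huge then return "1/0" end
  if n == -math.huge then return "-1/0" end
  if math.type and math.type(n) == "integer" then    -- Lua 5.3+ integers
    return tostring(n)
  end
  return string.format("%.17g", n)  -- 17 significant digits round-trip a double
end

-- Load a chunk with an empty environment (setfenv in Lua 5.1,
-- load's env argument in Lua 5.2+).
local function load_sandboxed(src)
  if setfenv then
    local f = assert(loadstring(src))
    return setfenv(f, {})
  end
  return assert(load(src, "sandbox", "t", {}))
end

print(numstr(1/0), numstr(0.1))   -- prints 1/0 and 0.10000000000000001
```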

[Attachment: g.txt (13K)]

Re: ANN: Serpent: Lua serializer and pretty printer

Paul K
Hi David,

Thank you for the detailed feedback.

> (1) Dumping _G would be a good test of this.  The output of
> serpent.printmult(_G) is attached.  As seen, the main downside is that
> some packages are expanded deeply inside _G.package.loaded rather than
> _G.  Dumping tables at the lowest possible nesting level would be
> ideal though may add some complexity.

I agree; in fact, I tested it on _G as one of the use cases. The
packages are expanded inside _G.package.loaded because it does a
depth-first walk; I'll see how much work it would be to do a
breadth-first one, which would make them expand at the lowest nesting level.

> (2) A "--[[table: 0x9aa7988]]" style comment is appended after each
> table, even when the table is non-recursive.  A possible concern is
> that these addresses are not deterministic, so dumps of structurally
> equivalent tables in different program executions will give different
> results.  This makes textual comparisons (diffs) of structures
> difficult and causes large diffs when serializations are maintained
> under revision control.  For readability, you may wish to replace
> addresses (32- or 64-bit) with an integer counter that gives
> deterministic dumps on structural equality, though insertions could
> still cause massive renumbering.  Some time ago I was working on a
> dumper like Perl Data::Dumper that would dump `x = {}; return
> {a={b=x},c={d=x}}` as "{a={b={}}, c={d=T.a.b}}", where "T" refers to
> the top-level table.  Here, T could even have a metatable such that
> T.a.b evaluates to a node that is expanded on a subsequent walk
> through the table.

These comments could be disabled permanently, but they are too useful
for pretty printing. I'm thinking about adding an option for them.

> (3) The keys are not currently being sorted.  Sorting would improve
> readability and deterministic output but may affect your performance
> numbers.

They used to be sorted. The problem is that they were sorted with
something like this:

          table.sort(o, function(a,b) -- sort keys; numbers first
            if tonumber(a) == tonumber(b) then return tostring(a) < tostring(b)
            else return (tonumber(a) or 0) < (tonumber(b) or 0) end end)

to get numbers first and sort the rest, and it was painfully slow
(probably 30% slower than the current version on the benchmark I
included). That is a huge penalty to pay for sorted keys, although in
many cases it may not matter much if you mostly care about pretty
printing. I removed it for this reason, although it may come back
as an option.
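One way to keep a numbers-first order without calling tonumber and tostring inside every comparison is to split the keys into two groups and sort each group with a cheap comparator. This is a sketch of that idea, not the comparator that was removed. Unlike that comparator, it treats only number-typed keys as numbers, not numeric strings such as "10".

```lua
-- Sketch: order keys "numbers first, then everything else by tostring"
-- by splitting them into two lists instead of converting on every comparison.
local function sorted_keys(t)
  local nums, others, names = {}, {}, {}
  for k in pairs(t) do
    if type(k) == "number" then
      nums[#nums + 1] = k
    else
      others[#others + 1] = k
      names[k] = tostring(k)  -- computed once per key, not once per comparison
    end
  end
  table.sort(nums)
  table.sort(others, function(a, b) return names[a] < names[b] end)
  for _, k in ipairs(others) do nums[#nums + 1] = k end
  return nums
end

local keys = sorted_keys({[3] = "c", [1] = "a", beta = 1, alpha = 2, [2] = "b"})
print(table.concat(keys, ","))   -- prints 1,2,3,alpha,beta
```

Both comparators define a consistent ordering, which table.sort requires. A comparator that treats some number and string keys as equal while ordering other pairs can make table.sort fail with "invalid order function for sorting".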

> (4) In the attached dump of _G, the string.gmatch is curiously set to
> "nil --[[ref]]".  It took me some time before realizing this was
> because the deprecated string.gfind is an alias to string.gmatch, and
> for gfind it outputs "= string.gmatch --[[function: 0x9b3ab28]]".
> Currently, in Lua 5.1, string.gmatch would serialize as string.gfind,
> which will be absent upon unserializing with for a VM's compiled with
> LUA_COMPAT_GFIND undefined.  Perhaps [[ref]] should have some more
> information.  It would also be ideal if it could prefer non-deprecated
> functions, but that requires special cases.

I can (and used to) point to the actual path it refers to, but the
problem was that I couldn't guarantee that the path will be safe to
deserialize. For example, I had this in one of the tests:

...nil --[[ref to a.b[a[1]]]],

which makes it invalid because of the early ]]. I played with ]=+], but
even that is not a guarantee, as you may have a string key that
includes that very fragment. I can analyze the string to see if it has
anything that matches my closing bracket, but it was getting too
complex/slow for my taste ;).
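Picking a safe long-bracket level is a matter of checking which closing brackets occur in the text. Below is a sketch with a hypothetical `longcomment` helper, assuming the text only needs to survive inside a comment. It does one plain string search per level tried.

```lua
-- Sketch: wrap arbitrary text in a long comment whose closing bracket
-- cannot appear early. Levels 0, 1, 2, ... are tried, and the first level
-- whose closer ]=*] does not occur in the text is used. A "]" is appended
-- before searching, so text ending in "]" (which would merge with a
-- level-0 closer into "]]") is also rejected at that level.
local function longcomment(s)
  local level = 0
  while true do
    local eq = string.rep("=", level)
    if not (s .. "]"):find("]" .. eq .. "]", 1, true) then
      return "--[" .. eq .. "[" .. s .. "]" .. eq .. "]"
    end
    level = level + 1
  end
end

print(longcomment("ref to a.b[a[1]]"))   -- prints --[=[ref to a.b[a[1]]]=]
```

Text containing "]=]" simply moves the comment to level 2, and so on.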

> (5) For pretty printing, "1\n2" would be nicer than "1\0102".  \n is
> common, but beginning users might not recognize \0102.

Agree. Will do.

> (6) `package.loadlib` dumping shows an `--[[err]]`, which looks
> worrisome as presented as an error not warning.  This occurs because
> it's not in globals.  Other cases include package.loaders[i].

This is because the value was stringified. I intentionally didn't
serialize those, as their values depend on what's loaded locally and
are probably not portable. --[[err]] is simply a visual reminder that
this value is not serialized. The main function includes a fourth
parameter which, when set to "true", causes a fatal error on
values like this.
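The behavior described here (a visual marker by default, an error on request) might look like this in a value formatter. The `fatal` flag mirrors the fourth parameter mentioned above, but the function name and the exact marker format are hypothetical.

```lua
-- Sketch (hypothetical helper): format a single value as Lua source.
-- Anything without a scalar source form (functions, userdata, threads, and
-- tables, which a real serializer would walk separately) becomes
-- `nil --[[err]]`, unless `fatal` is set, in which case an error is raised.
-- Numbers use tostring for brevity, so inf/NaN are not handled here.
local function valuestr(v, fatal)
  local tv = type(v)
  if tv == "number" or tv == "boolean" or tv == "nil" then
    return tostring(v)
  elseif tv == "string" then
    return string.format("%q", v)
  elseif fatal then
    error("cannot serialize a value of type " .. tv, 2)
  end
  return "nil --[[err]]"
end

print(valuestr(print))                 -- prints nil --[[err]]
print(pcall(valuestr, print, true))    -- prints false and the error message
```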

> (7) The naming of "serialize", "printmult", and "printsing" was
> somewhat jarring to me (think: "print multiple objects" and "prints
> ing"/"print sing (song)").  I thought the latter would internally
> invoke `print` when first running it -- not that I wanted it to
> because it's not general.  Also, single- and multi-line forms of
> serialization and pretty printing would suggest 2 x 2 = 4 combinations
> for orthogonality.  In my recent lua-mbuild, for example (in which I
> would consider adding your library), I currently use my own trivial
> dumper to serialize a (non-recursive) data structure to disk in a
> multi-line format (to allow easier debugging), and I'd be tempted to
> call printmult even though this is serialization not pretty printing
> as the name implies.  Finally, "The library provides three functions
> -- serialize, printmult, and printsing -- with the last two being
> shortcuts for the main serialize function." is not strictly correct:
> All three are wrappers around a "local function serialize" that is not
> exposed, and, as is, it's not possible to pass through a nil name in
> the public version of `serialize`.

That's correct and is my oversight. The intent was to be able to
override the default value of '_' with anything, and indeed you can
with anything except, alas, nil (and false). I'm rethinking the
interface in light of this (I noticed the same thing while working on a
couple of changes) and am open to suggestions. printmult does, in fact,
do pretty printing rather than serialization, as it doesn't include the
section with all shared/circular references.

It may be easier to provide just one method (as I had originally), but
I didn't like the idea of always providing a variable name when full
serialization is needed (as in `serialize(t, '_')`).

> (8) The coding style is a little too compact for my taste, concerning
> readability.  Usually I prefer to define variables on separate lines,
> as opposed to
>
>  local n, v, c, d = "serpent", 0.1, -- (C) 2012 Paul Kulchenko; MIT License
>    "Paul Kulchenko", "Serialization and pretty printing of Lua data types"
>  local keyword, globals, G = {}, {}, (_G or _ENV)
>  local ttype, level = type(t), (level or 0)
>
> The above single character top-level variable names aren't so great
> either, but you might just inline those values into the table at the
> bottom.  Some may also object to
>
>  local function safestr(s) return type(s) == "number" and
> (snum[tostring(s)] or s)
>    or type(s) ~= "string" and tostring(s) -- escape NEWLINE/010 and EOF/026
>    or ("%q"):format(s):gsub("\010","010"):gsub("\026","\\026") end

Yes, I was trying to keep it intentionally compact, but not too terse.
Maybe I went a bit overboard with this.

> (9) Another interesting test is `_ENV = loadstring(require
> "serpent".serialize(_G))(); <test suite code>`.  It won't work for all
> cases, but it does still run less than trivial things like life.lua.
>
> (10) Dumps of bytecode are probably not useful for pretty printing
> unless perhaps you were to decompile the bytecode.  More useful info
> may be found in debug info (including sometimes a file name with
> source code), in cases debug.* is even permitted.   Some such things,
> however, are probably outside the scope of a simple dumping module.
> Bytecode may still have limited uses though in serialization (transfer
> code between Lua states?).

I was thinking about using debug.* to provide original function
name/file in comments and to capture upvalues, but this was going
against my 90% rule (I was targeting 90% of use cases with the
existing code).

> (11) One area I've used dumpers is to dump AST's (dump.lua in
> LuaInspect).  In cases like {tag=Op, '+', ...}, I'd prefer the named
> part *before* the positional part, particularly the 'tag' name at the
> very front.  This might be outside the scope of this module though.

I'm frequently dumping ASTs myself and would prefer even more
changes, like having lineinfo encoded on one line while the rest may
still be multi-line, and so on, but I couldn't find a straightforward
way to fit it into this code without a penalty for regular use cases.

> (12) In "nil values are included when expected ({1, nil, 3} instead of
> {1, [3]=3})", I'm not sure what "expected" means given the
> undefinedness of # with holes.  I'd actually expect the latter format
> deterministically for sparse arrays.

"Expected" in the sense of being "reported" by #t: if #t returns 3,
Serpent will serialize the table as {1, nil, 3} rather than {1, [3]=3}.
This should match the original representation more closely (although it
probably doesn't matter in most cases): #{1, nil, 3} == 3, but
#{1, [3] = 3} == 1. The array content is the "same", but #t as
reported is different, so I was trying to match that.
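The two encodings can be sketched side by side. The function name is hypothetical, and keys and values are limited to numbers to keep it short. The nil-filling branch relies on whatever #t reports. With stock Lua that is 3 for {1, nil, 3}, but the manual does not guarantee a particular border for tables with holes.

```lua
-- Sketch: write the array part of a table either by walking 1..#t and
-- filling holes with nil (so the loaded table tends to report the same #t),
-- or sparsely, with only the leading run of non-nil values positional.
-- Keys and values are assumed to be numbers.
local function arraystr(t, sparse)
  local parts, done, n = {}, {}, 0
  if sparse then
    while t[n + 1] ~= nil do n = n + 1 end
  else
    n = #t
  end
  for i = 1, n do
    parts[#parts + 1] = t[i] == nil and "nil" or tostring(t[i])
    done[i] = true
  end
  local rest = {}
  for k in pairs(t) do
    if not done[k] then rest[#rest + 1] = k end
  end
  table.sort(rest)                              -- deterministic order
  for _, k in ipairs(rest) do
    parts[#parts + 1] = "[" .. tostring(k) .. "]=" .. tostring(t[k])
  end
  return "{" .. table.concat(parts, ", ") .. "}"
end

print(arraystr({1, nil, 3}, true))   -- prints {1, [3]=3}
```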

> (13) Maybe use 1/0 rather than math.huge (after all you already rely
> on 0/0) to avoid simple serializations of numerical arrays causing
> lookups into `math`, which won't exist under empty _ENV's sandboxes.

1/0 would indeed be preferable, but it was a tradeoff between
readability of pretty-printed output and portability. Some people may
not recognize 1/0 as math.huge, although those who serialize
math.huge numbers probably would.

I've checked in #5 and #13 and welcome your feedback on everything
else, especially #7. Thanks again.

Paul.

Re: ANN: Serpent: Lua serializer and pretty printer

Eric Wing
My two cents.


> They used to be sorted. The problem is that they were sorted with
> something like this:
>
>           table.sort(o, function(a,b) -- sort keys; numbers first
>             if tonumber(a) == tonumber(b) then return tostring(a) <
> tostring(b)
>             else return (tonumber(a) or 0) < (tonumber(b) or 0) end end)
>
> to get numbers first and sort the rest and it was painfully slow
> (probably 30% slower than the current version on the benchmark I
> included). It is a huge penalty to pay for sorted keys, although in
> many cases it may not matter that much if you care about pretty
> printing mostly. I decided to remove it for this reason, although it
> may still leave as an option.

Sort as an option is nice. In most of my cases, I'm more concerned
about speed. So non-sorting doesn't bother me at all.


>> (10) Dumps of bytecode are probably not useful for pretty printing
>> unless perhaps you were to decompile the bytecode.  More useful info
>> may be found in debug info (including sometimes a file name with
>> source code), in cases debug.* is even permitted.   Some such things,
>> however, are probably outside the scope of a simple dumping module.
>> Bytecode may still have limited uses though in serialization (transfer
>> code between Lua states?).
>
> I was thinking about using debug.* to provide original function
> name/file in comments and to capture upvalues, but this was going
> against my 90% rule (I was targeting 90% of use cases with the
> existing code).
>

Much of my interest is in serializing parts of my Lua state to
something that can be read back in. I'm thinking about things like
save game files or things to help suspend/resume processes on devices
like iOS.

At the basic level, I would like simple toLuaCode serialization that
can be read in by loadstring. But I also have a secondary, more
complicated objective of wanting to support iOS. Because of App Store
rules, loadstring may or may not be available depending on how one
uses it and Apple interprets the rules. If I play it conservatively, I
assume I don't have loadstring, but then I might be able to fall back
to something like LPeg and Leg to read in tables and basic types.

But thinking longer term and more generally to other platforms,
assuming I can use loadstring, being able to serialize everything and
reload it is terrific. But I assume that not everything can be easily
dumped to Lua script code (thinking about how string.dump works), so I
still consider bytecode useful.
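For reference, the bytecode round trip mentioned here uses only standard functions. string.dump turns a Lua function into a binary chunk that loadstring (5.1) or load (5.2+) accepts back. In this sketch, upvalues are not carried along, so it suits self-contained functions only, and the bytes are specific to the Lua version (and build) that produced them.

```lua
-- Sketch: dump a self-contained function to a binary chunk and load it back.
local loadstr = loadstring or load   -- both accept binary chunks by default

local function double(x) return x * 2 end
local bytes = string.dump(double)
local restored = assert(loadstr(bytes))
print(restored(21))  -- prints 42
```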

If there was a way to get this additional stuff into actual code
instead of bytecode using the debug library, I think that would be
very cool. I would love to have that ability as an auxiliary function
if it was something that you didn't feel appropriate in the primary
APIs.


>> (12) In "nil values are included when expected ({1, nil, 3} instead of
>> {1, [3]=3})", I'm not sure what "expected" means given the
>> undefinedness of # with holes.  I'd actually expect the latter format
>> deterministically for sparse arrays.
>
> "Expected" in the sense of being "reported" by #t. if #t return 3,
> Serpent will serialize it as {1, nil, 3}, rather than {1, [3]=3}. This
> should match the original representation closer (although it probably
> doesn't matter in most cases). #{1, nil, 3}==3, but #{1, [3] = 3} ==
> 1. The array content is the "same", but #t as reported is different,
> so, I was trying to match that.

If I recall, the latter format {1, [3]=3} will make things harder for
the Lua compiler to optimize for size of the array and take a
non-optimal code path to fill the array. From a performance
standpoint, I would expect Lua to do much better with {1, nil, 3}. In
either case, I thought #t would be somewhat unpredictable depending
on how the table was manipulated over its lifetime. And even if you
could figure out how to preserve it, I'm worried the performance would
be terrible for complex enough tables. I personally accept that this
value may change and would rather have better performance.

Thanks,
Eric

Re: ANN: Serpent: Lua serializer and pretty printer

David Manura
In reply to this post by Paul K
On Sun, Jun 10, 2012 at 12:38 AM, Paul K <[hidden email]> wrote:
>> (2) A "--[[table: 0x9aa7988]]" style comment is appended after each
>> (3) The keys are not currently being sorted.
>> (10) Dumps of bytecode are probably not useful for pretty printing
>> (12) In "nil values are included when expected ({1, nil, 3} instead of

The above four points all are related to whether the dumper should
have the property of deterministic output under structural equality.
Either way, this property may be worth discussing in Features.

Arguments for deterministic output:

- Dumping may be used as a poor man's test of structural equality:
serialize(a) == serialize(b).  This can be useful in test suites.
More sophisticated alternatives include
https://github.com/silentbicycle/tamale .

- Sizes of diffs between dumps are reduced.  This may be useful when
the serializer is used to write data or configuration files to be
maintained under revision control.  It may also be useful in
debugging, such as diffing dumps from two separate runs.  However,
deterministic output is not the only factor affecting diff size.

- Sorting may improve readability, particularly for tables with many
elements (e.g. thousands).  However, optimal sort order is subjective
-- http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html
.

Arguments against:

- Impact on performance and code complexity?

- Not needed in some cases.
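The "poor man's test of structural equality" only works if output is deterministic. Here is a minimal sketch of a sorted-key serializer for that purpose. The name `canon` is hypothetical, keys are assumed to be strings or numbers, and tables are assumed acyclic.

```lua
-- Sketch: a serializer with sorted keys gives the same text for tables that
-- hold the same data, whatever order the keys were inserted in.
local function canon(v)
  if type(v) ~= "table" then
    return type(v) == "string" and string.format("%q", v) or tostring(v)
  end
  local keys = {}
  for k in pairs(v) do keys[#keys + 1] = k end
  table.sort(keys, function(a, b)
    if type(a) ~= type(b) then return type(a) == "number" end  -- numbers first
    return a < b
  end)
  local parts = {}
  for _, k in ipairs(keys) do
    parts[#parts + 1] = "[" .. canon(k) .. "]=" .. canon(v[k])
  end
  return "{" .. table.concat(parts, ",") .. "}"
end

local a = {x = 1, y = {2, 3}}
local b = {}
b.y = {2, 3}
b.x = 1
print(canon(a) == canon(b))   -- prints true
```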

Re: ANN: Serpent: Lua serializer and pretty printer

Paul K
Hi David,

Here is what I came up with based on this discussion:

Methods:

  dump -- dump fully serialized value
  line -- pretty single-line output, no self-ref section
  long -- pretty multi-line output, no self-ref section

Options:

  name     nil/string -- name; triggers full serialization
  indent   nil/string -- indentation; triggers long multi-line output
  comment  True/false -- provide stringified value in a comment
  sortkeys true/False -- sort keys using alphanum sorting
  sparse   true/False -- force sparse encoding (no nil filling based on #t)
  compact  true/False -- remove spaces around = and after ,
  fatal    true/False -- raise fatal error on non-serializable values
  custom   function   -- to provide custom output for tables
  nocode   true/False -- disables bytecode serialization for easy comparison

Usage:

dump(a, ...) serialize(a, {name = '_', compact = true, comment = false, sparse = true, ...})
line(a, ...) serialize(a, {sortkeys = true, ...})
long(a, ...) serialize(a, {indent = '  ', sortkeys = true, ...})

You can also provide parameters that override defaults. For example,
for diff-optimized output:

diff(a) dump(a, {nocode = true, indent = ' '})

Or for AST printing:

asast(a, ...) serialize(a, {custom = function(tag, head, body, tail), ...})

The custom function can print lineinfo on one line or put the tag
value first and so on.
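Because each method is described as serialize(...) with its own defaults that callers can override, the dispatch could come down to merging option tables. Below is a sketch with a hypothetical `merge` helper. The defaults shown are the ones listed for dump above.

```lua
-- Sketch (hypothetical helper): copy the method's defaults, then let any
-- caller-supplied option replace them. Options set to false are kept;
-- an option cannot be reset to nil this way.
local function merge(defaults, opts)
  local result = {}
  for k, v in pairs(defaults) do result[k] = v end
  for k, v in pairs(opts or {}) do result[k] = v end
  return result
end

local dump_defaults = {name = "_", compact = true, comment = false, sparse = true}

-- Options for the diff-oriented variant described above.
local diff_opts = merge(dump_defaults, {nocode = true, indent = " "})
print(diff_opts.name, diff_opts.compact, diff_opts.nocode)  -- prints _ true true (tab-separated)
```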

I also plan to use the alphanumeric sort as you suggested
(http://notebook.kulchenko.com/algorithms/alphanumeric-natural-sorting-for-humans-in-lua),
but only when the sortkeys option is on.
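Below is a sketch of an alphanumeric comparator in the spirit of the linked post (not its code). It assumes that zero-padding each digit run to a fixed width (12 here, an arbitrary choice) is acceptable. The padding is recomputed on every comparison, so a real implementation would likely cache it.

```lua
-- Sketch of a natural ("alphanumeric") comparator: runs of digits are
-- left-padded with zeros to a fixed width before comparing, so "item2"
-- sorts before "item10". Digit runs longer than 12 are left unpadded.
local function padnum(d)
  return string.rep("0", 12 - #d) .. d
end

local function alphanum_less(a, b)
  local pa = tostring(a):gsub("%d+", padnum)
  local pb = tostring(b):gsub("%d+", padnum)
  return pa < pb
end

local list = {"item10", "item2", "item1"}
table.sort(list, alphanum_less)
print(table.concat(list, " "))   -- prints item1 item2 item10
```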

Paul.

On Tue, Jun 12, 2012 at 12:00 AM, David Manura <[hidden email]> wrote:

> [...]

Re: ANN: Serpent: Lua serializer and pretty printer

Paul K
Hi All,

I released a new Serpent version that incorporates the options we've
been discussing earlier. It's mostly as described in my previous
email, but I also added a nohuge option (for those who are on a platform
that generates proper strings for huge and undefined numbers but want a
slightly faster implementation) and renamed the "long" method to "block". As the interface
has changed, I also updated the tests, the documentation, and the
benchmark.

>>>> (2) A "--[[table: 0x9aa7988]]" style comment is appended after each
>>>> (3) The keys are not currently being sorted.
>>>> (10) Dumps of bytecode are probably not useful for pretty printing
>>>> (12) In "nil values are included when expected ({1, nil, 3} instead of

All these items should be addressed: block(a, {comment=false,
sortkeys=true, nocode=true, sparse=true}) should produce diffable
output.

Custom formatters also allow making tweaks to the output, for example,
printing of ASTs in Metalua format. You can use the following code:

print((require "serpent").block(ast, {comment = false, custom =
  function(tag,head,body,tail)
    local out = head..body..tail
    if tag:find('^lineinfo') then
      out = out:gsub("\n%s+", "") -- collapse lineinfo to one line
    elseif tag == '' then
      body = body:gsub('%s*lineinfo = [^\n]+', '')
      local _,_,atag = body:find('tag = "(%w+)"%s*$')
      if atag then
        out = "`"..atag..head.. body:gsub('%s*tag = "%w+"%s*$', '')..tail
        out = out:gsub("\n%s+", ""):gsub(",}","}")
      else out = head..body..tail end
    end
    return tag..out
  end}))

to generate this output:

{
  `Call{`Id{"require"},`String{"foobar"}},
  `Local{{`Id{"foo"}},{`Id{"foo"}}},
  `Local{{`Id{"y"}},{`Number{5}}},
  `Do{`Local{{`Id{"y"}},{`Number{6}}},`Local{{`Id{"y"}},{`Number{7}}}},
  `Localrec{{`Id{"test"}},{`Function{{`Id{"x"}},{`Call{`Id{"print"},`String{"123"},`Id{"x"},`Id{"y"},`Id{"z"}}}}}},
  `Call{`Id{"bar"},`Number{123}},
  `Set{{`Id{"bar"}},{`Function{{},{}}}},
  `Local{{`Id{"g"}},{`Function{{`Id{"w"}},{`Return{`Op{"mul",`Id{"w"},`Number{2}}}}}}},
  `Set{{`Id{"g"}},{`Number{1}}},
  `Call{`Id{"g"},`Number{1}},
}

The updated version is on github: https://github.com/pkulchenko/serpent.

Paul.

On Tue, Jun 12, 2012 at 12:33 PM, Paul K <[hidden email]> wrote:

> Hi David,
>
> Here is what I came up with based on this discussion:
>
> Methods:
>
>  dump -- dump fully serialized value
>  line -- pretty single-line output, no self-ref section
>  long -- pretty multi-line output, no self-ref section
>
> Options:
>
>  name     nil/string -- name; triggers full serialization
>  indent   nil/string -- indentation; triggers long multi-line output
>  comment  True/false -- provide stringified value in a comment
>  sortkeys true/False -- sort keys using alphanum sorting
>  sparse   true/False -- force sparse encoding (no nil filling based on #t)
>  compact  true/False -- remove spaces around = and after ,
>  fatal    true/False -- raise fatal error on non-serializable values
>  custom   function   -- to provide custom output for tables
>  nocode   true/False -- disables bytecode serialization for easy comparison
>
> Usage:
>
> dump(a, ...) serialize(a, {name = '_', compact = true, comment =
> false, sparse = true ...})
> line(a, ...) serialize(a, {sortkeys = true, ...})
> long(a, ...) serialize(a, {indent = '  ', sortkeys = true, ...})
>
> You can also provide parameters that override the defaults. For example,
> for diff-optimized output:
>
> diff(a) dump(a, {nocode = true, indent = ' '})
>
> Or for AST printing:
>
> asast(a, ...) serialize(a, {custom = function(tag, head, body, tail), ...})
>
> The custom function can print lineinfo on one line or put the tag
> value first and so on.
>
> I also plan to use the alphanumeric sort as you suggested
> (http://notebook.kulchenko.com/algorithms/alphanumeric-natural-sorting-for-humans-in-lua),
> but only when the sortkeys option is on.
>
> Paul.
>
> On Tue, Jun 12, 2012 at 12:00 AM, David Manura <[hidden email]> wrote:
>> On Sun, Jun 10, 2012 at 12:38 AM, Paul K <[hidden email]> wrote:
>>>> (2) A "--[[table: 0x9aa7988]]" style comment is appended after each
>>>> (3) The keys are not currently being sorted.
>>>> (10) Dumps of bytecode are probably not useful for pretty printing
>>>> (12) In "nil values are included when expected ({1, nil, 3} instead of
>>
>> The above four points all are related to whether the dumper should
>> have the property of deterministic output under structural equality.
>> Either way, this property may be worth discussing in Features.
>>
>> Arguments for deterministic output:
>>
>> - Dumping may be used as a poor man's test of structural equality:
>> serialize(a) == serialize(b).  This can be useful in test suites.
>> More sophisticated alternatives include
>> https://github.com/silentbicycle/tamale .
>>
>> - Sizes of diffs between dumps are reduced.  This may be useful when
>> the serializer is used to write data or configuration files to be
>> maintained under revision control.  It may also be useful in
>> debugging, such as diffing dumps from two separate runs.  However,
>> deterministic output is not the only factor affecting diff size.
>>
>> - Sorting may improve readability, particularly for tables with many
>> elements (e.g. thousands).  However, optimal sort order is subjective
>> -- http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html
>> .
>>
>> Arguments against:
>>
>> - Impact on performance and code complexity?
>>
>> - Not needed in some cases.
>>
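The methods in the quoted proposal are essentially presets over one serialize call, with caller-supplied options taking precedence over each method's defaults. Below is a minimal sketch of that merge step. It uses the defaults listed there, with "long" renamed to "block" as in the release announcement. The names defaults and options_for are hypothetical:

```lua
-- Hypothetical sketch: per-method defaults, as listed in the quoted message.
local defaults = {
  dump  = {name = "_", compact = true, comment = false, sparse = true},
  line  = {sortkeys = true},
  block = {indent = "  ", sortkeys = true},
}

-- Copy the method's defaults, then let the caller's options override them.
-- Copying with pairs (rather than `opts.x or default`) lets an explicit
-- false override a true default.
local function options_for(method, user)
  local merged = {}
  for k, v in pairs(defaults[method]) do merged[k] = v end
  for k, v in pairs(user or {}) do merged[k] = v end
  return merged
end

-- The "diff optimized" variant: dump defaults plus two overrides.
local diffopts = options_for("dump", {nocode = true, indent = " "})
```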


Re: ANN: Serpent: Lua serializer and pretty printer

Ico Doornekamp
Hi Paul,

Thanks for your work on serpent. The following patch might be a handy feature:
it lets the caller specify a maximum level when dumping an object. This is
useful for debugging when you're not interested in data nested deep inside
tables: http://pastebin.com/1AVwNwNq

Ico

--- serpent.lua.org 2012-06-13 15:04:36.894692373 +0200
+++ serpent.lua 2012-06-13 15:04:39.126693082 +0200
@@ -14,6 +14,7 @@
   local name, indent, fatal = opts['name'], opts['indent'], opts['fatal']
   local sparse, nocode, custom = opts['sparse'], opts['nocode'], opts['custom']
   local huge, space = not opts['nohuge'], (opts['compact'] and '' or ' ')
+  local maxlevel = opts['maxlevel']
   local seen, sref = {}, {}
   local function gensym(val) return tostring(val):gsub("[^%w]","") end
   local function safestr(s) return type(s) == "number" and (huge and snum[tostring(s)] or s)
@@ -51,6 +52,7 @@
         "loadstring("..safestr(res)..",'@serialized')"..comment(t))
       return tag..(func or globerr(t))
     elseif ttype == "table" then
+      if maxlevel and level >= maxlevel then return tag..'{}'..comment('maxlevel') end
       seen[t] = spath
       if next(t) == nil then return tag..'{}'..comment(t) end -- table empty
       local maxn, o, out = #t, {}, {}
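Here is what the patch adds, shown in isolation. The function below is a hypothetical, simplified serializer (no cycle detection or indentation, unlike Serpent). It stops descending once maxlevel is reached and treats a missing maxlevel as no limit:

```lua
-- Hypothetical, simplified serializer (no cycle detection or indentation)
-- showing the maxlevel idea: tables at depth >= maxlevel become an empty
-- table followed by a marker comment.
local function ser(v, maxlevel, level)
  level = level or 0
  if type(v) == "string" then return ("%q"):format(v) end
  if type(v) ~= "table" then return tostring(v) end
  -- A nil maxlevel means "no limit", so never compare a number with nil.
  if maxlevel and level >= maxlevel then return "{} --[[maxlevel]]" end
  local keys = {}
  for k in pairs(v) do keys[#keys + 1] = k end
  table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
  local out = {}
  for _, k in ipairs(keys) do
    out[#out + 1] = "[" .. ser(k) .. "]=" .. ser(v[k], maxlevel, level + 1)
  end
  return "{" .. table.concat(out, ",") .. "}"
end

local t = {a = {b = {c = 1}}}
print(ser(t))     --> {["a"]={["b"]={["c"]=1}}}
print(ser(t, 2))  --> {["a"]={["b"]={} --[[maxlevel]]}}
```

The marker uses a long comment, which ends at ]] rather than at the end of the line, so the output stays loadable even when it is printed on a single line.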

Re: ANN: Serpent: Lua serializer and pretty printer

Paul K
Hi David,

>>> - Dumping may be used as a poor man's test of structural equality:
>>> serialize(a) == serialize(b).  This can be useful in test suites.

I made several changes to get closer to serialize(a) == serialize(b),
but one issue still remains. I modified the sorting so that true
(boolean) and 'true' (string) always sort in the same order, and I
removed table addresses from the names of serialized variables.
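Sorting keys only by their tostring value makes true and "true" compare equal. Because table.sort is not stable, their relative order can then depend on the order in which pairs returned them. The comparator below is a hypothetical sketch that prefixes the type name and also zero-pads digit runs, in the spirit of the alphanumeric sort mentioned earlier in the thread. The names padnum, sortform, and keycmp are illustrative, this is not necessarily how Serpent does it, and it assumes digit runs of at most 12 digits:

```lua
-- Hypothetical key comparator: compare type names first, so true (boolean)
-- and "true" (string) can never tie, then compare string forms with digit
-- runs zero-padded so that "item2" sorts before "item10".
local function padnum(d) return ("0"):rep(12 - #d) .. d end

local function sortform(k)
  return type(k) .. ":" .. (tostring(k):gsub("%d+", padnum))
end

local function keycmp(a, b) return sortform(a) < sortform(b) end

local keys = {"item10", true, "true", "item2", 10, 2}
table.sort(keys, keycmp)
-- keys is now {true, 2, 10, "item2", "item10", "true"}
```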

Unfortunately, when tables are used as keys, I can only sort them by
their stringified values, which are simply addresses, so the order is
arbitrary. This only makes a difference for serialized fragments that
include shared or self-references, so if you don't care about those,
your serialize(a) == serialize(b) test should work (I've included it
in the tests).
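The serialize(a) == serialize(b) check works once output is deterministic for tree-shaped data. Below is a minimal sketch under that assumption: keys are grouped by type and sorted, so two tables with the same contents serialize identically regardless of how they were built. The names keycmp and ser are hypothetical, and no shared or self-references are handled. Table-valued keys still fall back to their addresses here, which is the limitation described above:

```lua
-- Hypothetical minimal deterministic serializer for tree-shaped data
-- (strings, numbers, booleans, nested tables; no shared or self-references).
local function keycmp(a, b)
  local ta, tb = type(a), type(b)
  if ta ~= tb then return ta < tb end              -- group keys by type
  if ta == "number" or ta == "string" then return a < b end
  return tostring(a) < tostring(b)                 -- booleans; table keys fall back to addresses
end

local function ser(v)
  if type(v) == "string" then return ("%q"):format(v) end
  if type(v) ~= "table" then return tostring(v) end
  local keys = {}
  for k in pairs(v) do keys[#keys + 1] = k end
  table.sort(keys, keycmp)
  local out = {}
  for _, k in ipairs(keys) do
    out[#out + 1] = "[" .. ser(k) .. "]=" .. ser(v[k])
  end
  return "{" .. table.concat(out, ",") .. "}"
end

-- Same structure, built in a different insertion order.
local a, b = {}, {}
a.x, a.y, a[1] = 1, {z = true}, "one"
b[1], b.y, b.x = "one", {z = true}, 1
assert(ser(a) == ser(b))
print(ser(a))  --> {[1]="one",["x"]=1,["y"]={["z"]=true}}
```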

The new version (https://github.com/pkulchenko/serpent) also includes
Ico's patch for maxlevel.

Paul.


Re: ANN: Serpent: Lua serializer and pretty printer

Frank Meier-Dörnberg
On 13.06.2012 20:12, Paul K wrote:
> Unfortunately, when tables are used as keys, I can only ...
Maybe it is possible to allow a user-defined sort function (or set of
sort functions)? e.g. "sortkey=my_sorter_function"
--
Cheers
Frank


Re: ANN: Serpent: Lua serializer and pretty printer

Paul K
> Maybe it is possible to allow an user defined (set of) sort function(s)?
> e.g. "sortkey=my_sorter_function"

It may be possible to do that, but it's not clear what the caller
would do in this case either.

It *may* be possible to sort the tables based on their serialized
*values*, which would be consistent, but the logic for that was
getting too complex, so I removed it.
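Here is a sketch of sorting by serialized values. It assumes tree-shaped data with no shared or self-references, which is where the real difficulty lies. Each key is ordered by its type name and serialized form, with the serialized value appended so that structurally equal table keys are still ordered consistently. The names are hypothetical, and the code re-serializes nested tables during sorting, so it is not tuned for speed:

```lua
-- Hypothetical sketch: order keys by a serialized form instead of by address,
-- so table keys sort consistently. Tree-shaped data only (no cycles).
local ser  -- forward declaration: ser and sortedkeys call each other

local function sortedkeys(t)
  local keys, form = {}, {}
  for k in pairs(t) do
    keys[#keys + 1] = k
    -- Type name + serialized key; the serialized value is appended so that two
    -- structurally equal table keys are still ordered by what they map to.
    form[k] = type(k) .. ":" .. ser(k) .. "=" .. ser(t[k])
  end
  table.sort(keys, function(a, b) return form[a] < form[b] end)
  return keys
end

ser = function(v)
  if type(v) == "string" then return ("%q"):format(v) end
  if type(v) ~= "table" then return tostring(v) end
  local out = {}
  for _, k in ipairs(sortedkeys(v)) do
    out[#out + 1] = "[" .. ser(k) .. "]=" .. ser(v[k])
  end
  return "{" .. table.concat(out, ",") .. "}"
end

-- Structurally equal tables whose keys are different table objects.
local t1 = {[{"b"}] = 1, [{"a"}] = 2}
local t2 = {[{"a"}] = 2, [{"b"}] = 1}
assert(ser(t1) == ser(t2))
print(ser(t1))  --> {[{[1]="a"}]=2,[{[1]="b"}]=1}
```

If two entries still tie, their key and value serialize identically, so swapping them does not change the output. Numbers here sort by their text (so 10 sorts before 2), which is consistent but not numeric.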

I'm considering another option, which is to sort the serialized
values, but it has its own challenges: the order in which those values
are produced is always the correct one for deserialization, and
sorting may break that.

In the end, it may be one of the rare cases not worth worrying too much about.

Paul.


Re: ANN: Serpent: Lua serializer and pretty printer

Eric Wing
If you allow a user-defined function, don't worry about the caller.
Then it becomes their problem to deal with the hard cases if they need
them. (If they do, they probably have a specific solution that will
work for their own problem.)
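Eric's suggestion amounts to letting the sort option carry either a flag or the comparator itself. Below is a hypothetical sketch of how such an option could be dispatched. The names ordered_keys and default_cmp are illustrative, and this is not Serpent's interface:

```lua
-- Hypothetical option handling (not Serpent's actual interface):
-- opts.sortkeys may be true (built-in order) or a caller-supplied
-- comparator function(a, b).
local function default_cmp(a, b)
  local ta, tb = type(a), type(b)
  if ta ~= tb then return ta < tb end
  if ta == "number" or ta == "string" then return a < b end
  return tostring(a) < tostring(b)
end

local function ordered_keys(t, opts)
  local keys = {}
  for k in pairs(t) do keys[#keys + 1] = k end
  local sortkeys = opts and opts.sortkeys
  if type(sortkeys) == "function" then
    table.sort(keys, sortkeys)     -- the caller handles the hard cases
  elseif sortkeys then
    table.sort(keys, default_cmp)  -- built-in ordering
  end                              -- otherwise keep pairs() order
  return keys
end

-- A caller who knows that every table key carries a numeric id field.
local first, second = {id = 1}, {id = 2}
local t = {[second] = "second", [first] = "first"}
local keys = ordered_keys(t, {sortkeys = function(x, y) return x.id < y.id end})
assert(keys[1] == first and keys[2] == second)
```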



--
Beginning iPhone Games Development
http://playcontrol.net/iphonegamebook/


Re: ANN: Serpent: Lua serializer and pretty printer

Paul K
I think I have an acceptable way of doing this. It will go into the
next version.

Paul.
