Lua interface standards

Dirk Laurie-2
The topic is inspired by several recent threads, and may well have been
addressed on this list before.

Briefly, I think we do not need a Python-like standard library as much
as we need standards for those common tasks that tend to entice module
writers into inventing ever-better wheels.

The standard should be as sparse as possible, so that
  (a) There is not much that the user needs to know.
  (b) The quirks of a particular existing implementation do not get
so fondly described as 'standard' that no other implementation
qualifies.

For example, a module conforming to the 'json' standard should
be loadable as `json = require "bestjson"` and provide methods
json.decode (string to table), encode (table to string), and a unique
immutable value json.null. In this case it is obvious how the table
should look, but the standard should document it. Anything else is extra.

When it comes to XML, it is no longer so obvious (does the attribute
table go into [0] or a member named 'attr'? must the code/decode be
deterministically reversible?), so the foundation or user association
or benevolent dictator has some decisions to make.

We would not need subjective evaluations (this json/xml codec
is fast/reliable etc) but only objective ones (it conforms to the standard
interface, it passes this test).
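
An objective check of this kind could be very small. The following is only a sketch: the function name `check_json_interface` and the round-trip step are assumptions made for illustration, not part of any agreed standard, and the stand-in codec understands nothing but the empty object.

```lua
-- Sketch of an objective conformance check for a hypothetical 'json'
-- interface standard: decode, encode and a unique value json.null.
local function check_json_interface(json)
  if type(json) ~= "table" then
    return false, "module is not a table"
  end
  if type(json.decode) ~= "function" then
    return false, "missing decode"
  end
  if type(json.encode) ~= "function" then
    return false, "missing encode"
  end
  if json.null == nil then
    return false, "missing null"
  end
  -- Round-trip the simplest document there is.
  local s = json.encode(json.decode("{}"))
  if type(s) ~= "string" then
    return false, "encode did not return a string"
  end
  return true
end

-- A stand-in codec that only understands "{}", used to exercise the check.
local dummy = {
  decode = function(s) return {} end,
  encode = function(t) return "{}" end,
  null = {},
}

print(check_json_interface(dummy))
print(check_json_interface({ decode = dummy.decode }))
```

A real test suite would add one such check per documented requirement, so that disagreements can be argued about a specific test rather than in general terms.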

Re: Lua interface standards

Martin
I appreciate this idea of standardizing interfaces for common tasks.

I just wrote a dummy JSON codec; although useless, it may illustrate the idea:

--[[
  Absolutely correct implementation for decoding and encoding
  JSON with contents "{}".

  Conforms to the community JSON codec standard.
]]
return
  {
    decode =
      function(s)
        return {}
      end,
    encode =
      function(t)
        return '{}'
      end,
    null = {}, -- used as GUID for JSON "null"
  }

-- Martin

Re: Lua interface standards

Rain Gloom
In reply to this post by Dirk Laurie-2
IMHO this is more of a packaging issue. If Luarocks had a way to tell if two packages provide the same functionality, then things like loading either LPeg or LuLpeg depending on the environment would become a breeze, same goes for compatible OOP models.

I've said it before, Luarocks should take hints (lots of them) from how ArchLinux's libalpm and pacman do things, because that is a system that works very nicely and is reasonably compact.

Alternatively, why not adopt pacman? It's not pure Lua, sure, but given that most packages already need a C compiler, I think that's not really a problem. One issue is that it's not entirely cross-platform, but it has been ported to MSYS so it's not far either.

So to sum up, having standards for some packages could be a way to go, but if package management didn't suck (sorry @ Luarocks devs) de-facto standards could already work.

Re: Lua interface standards

Martin
In reply to this post by Dirk Laurie-2
On 04/22/2017 11:11 PM, Dirk Laurie wrote:
> When it comes to XML, it is no longer so obvious (does the attribute
> table go into [0] or a member named 'attr'? must the code/decode be
> deterministically reversible?), so the foundation or user association
> or benevolent dictator has some decisions to make.

Here we need good design. Does the order of attribute key-value pairs
matter? If not, we may map their names to table keys. If so, we should
use a sequence. (Personally I prefer an "attr" subtable for element
attributes with a key-value map.)

There will also be a need to standardize behavior for bad data, both
when decoding and when encoding. For example, you can't serialize a table
with table-type keys to JSON. Or a table with cycles. Should error()
be raised or just (nil, err_msg) returned?
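
The two conventions can be converted into each other with plain Lua, as the sketch below suggests. The `encode_strict` function is a hypothetical stand-in for an encoder that raises on table-typed keys; it is not any real library's API, and its output is a placeholder.

```lua
-- Hypothetical encoder that raises error() on data it cannot serialize.
local function encode_strict(t)
  for k in pairs(t) do
    if type(k) == "table" then
      error("cannot serialize table-typed key")
    end
  end
  return "{}" -- placeholder output; a real codec would build the string
end

-- Wrapper giving the (nil, err_msg) convention on top of the raising one.
local function encode_soft(t)
  local ok, result = pcall(encode_strict, t)
  if ok then
    return result
  end
  return nil, result
end

local bad = { [{}] = 1 }
print(encode_soft({}))   -- success path
print(encode_soft(bad))  -- returns nil plus a message instead of raising

-- Going the other way, assert() turns (nil, err_msg) back into an error.
print(pcall(function() return assert(encode_soft(bad)) end))
```

Since either style can be derived from the other in a few lines, the choice a standard makes mostly decides which callers need the wrapper.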

Should the standard provide a .verify(t) function?

What if I want to serialize the parts that can be serialized and omit
the others? Should the standard provide an .align(t) function?

-- Martin

Re: Lua interface standards

Sean Conner
In reply to this post by Dirk Laurie-2
It was thus said that the Great Dirk Laurie once stated:
> The topic is inspired by several recent threads, and may well have been
> addressed on this list before.

  Yes, but no consensus reached.

> Briefly, I think we do not need a Python-like standard library as much
> as we need standards for those common tasks that tend to entice module
> writers into inventing ever-better wheels.
>
> The standard should be as sparse as possible, so that
>   (a) There is not much that the user needs to know.
>   (b) The quirks of particular existing implementation do not get
> so fondly described as 'standard' that no other implementation
> qualifies.

  And so the lowest-common-denominator wins.  Woot!

  I'll bring up syslog as an example---most implementations are a literal
wrapper around the C API [1] on the grounds that it's often easier [2] and
leave the "Luaization" of the module to actual Lua code (that never gets
written for some odd reason).  That means that any "standard Lua module" will
be horrible to use (if I want to program in C, I know where to find it [3]).

  Now I'll switch topics and bring up signal().  Again, most Lua modules
around signal() follow the same approach as syslog() and worse, usually wrap
the POSIX version, completely ignoring ANSI C (POSIX has more functionality,
but not everything uses Lua on a POSIX system).  As such, the API I present
works the same for ANSI C and POSIX (with a bit more available under POSIX).
The minimum API (ANSI C) consists of:

        -- signal can be
        -- vwl imprd vrsn   fully spelled out version
        -- --------------   -------------------------
        -- 'abrt'           'abort'
        -- 'fpe'
        -- 'ill'            'illegal'
        -- 'int'            'interrupt'
        -- 'segv'           'segfault'
        -- 'term'           'terminate'

        signal.catch(signal[,handler]) -- either set a flag, or run handler on signal
        signal.ignore(signal)
        signal.default(signal)
        signal.caught([signal]) -- if no signal given, all signals will be checked
        signal.raise(signal)
        signal.defined(signal) -- check for support of a given signal
        signal._implementation -- "ANSI" or "POSIX"

  The POSIX version will support the above as is, but has further
functionality that I won't go into right now.  That's the bare minimum for
an API for signals.
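
Accepting both spellings of a signal name only needs a small lookup table. The following pure-Lua sketch is an assumption about how such normalization might look; the `normalize` helper is hypothetical and does not install or raise any real signal.

```lua
-- Map the fully spelled out names from the table above to the short ones.
local aliases = {
  abort     = "abrt",
  illegal   = "ill",
  interrupt = "int",
  segfault  = "segv",
  terminate = "term",
}

-- The six signals guaranteed by ANSI C, under their short names.
local ansi = {
  abrt = true, fpe = true, ill = true,
  int = true, segv = true, term = true,
}

-- Hypothetical helper: return the short name, or nil plus a message.
local function normalize(name)
  local short = aliases[name] or name
  if ansi[short] then
    return short
  end
  return nil, "unknown signal: " .. tostring(name)
end

print(normalize("interrupt"))
print(normalize("fpe"))
print(normalize("hup"))
```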

  Could I implement the above using, say, luaposix?  Or some other Lua
signal module?  Probably yes, but this is why I am not hopeful about
this---because the lowest-common-denominator, non-Luaesque version of
modules will win.

> For example, a module conforming to the 'json' standard should
> be loadable as `json = require "bestjson"` and provide methods
> json.decode (string to table), encode (table to string), and a unique
> immutable value json.null. In this case it is obvious how the table
> should look, but the standard should document it. Anything else is extra.

  So what about

        __tojson
        __jsonorder
        __jsontype
        __is_luajson
        __json

> When it comes to XML, it is no longer so obvious (does the attribute
> table go into [0] or a member named 'attr'? must the code/decode be
> deterministically reversible?), so the foundation or user association
> or benevolent dictator has some decisions to make.

  It's obvious:

        <xsl:call-template name="common-index">
          <xsl:with-param name="objects" select="section[@id != 'Errors']" />
        </xsl:call-template>

will translate to:

        {
          [0]   = "call-template",
          xmlns = "http://www.w3.org/1999/XSL/Transform",
          name  = "common-index",
          [1] = {
                [0]    = "with-param",
                xmlns  = "http://www.w3.org/1999/XSL/Transform",
                name   = "objects",
                select = "section[@id != 'Errors']",
          },
        }
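
Going back from such a table to XML text takes little code. The sketch below assumes exactly the layout shown above ([0] for the tag name, string keys for attributes, 1..n for children); the names `escape` and `unparse` are made up for illustration, and attributes are sorted only to make the output deterministic.

```lua
-- Escape the characters that are special in XML text and attribute values.
local function escape(s)
  return (s:gsub('[&<>"]', {
    ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;", ['"'] = "&quot;",
  }))
end

-- Serialize a node laid out as above: [0] is the tag name, string keys
-- are attributes, and [1..n] are children (tables or text strings).
local function unparse(node)
  if type(node) == "string" then
    return escape(node)
  end
  local names = {}
  for k in pairs(node) do
    if type(k) == "string" then
      names[#names + 1] = k
    end
  end
  table.sort(names) -- sorted so that the output is deterministic
  local out = { "<", node[0] }
  for _, k in ipairs(names) do
    out[#out + 1] = string.format(' %s="%s"', k, escape(tostring(node[k])))
  end
  if #node == 0 then
    out[#out + 1] = "/>"
  else
    out[#out + 1] = ">"
    for _, child in ipairs(node) do
      out[#out + 1] = unparse(child)
    end
    out[#out + 1] = "</" .. node[0] .. ">"
  end
  return table.concat(out)
end

local doc = {
  [0]    = "with-param",
  name   = "objects",
  select = "section[@id != 'Errors']",
}
print(unparse(doc))
```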

  -spc (Oh, and let us not forget about metatables ... )

[1] http://lua-users.org/lists/lua-l/2015-03/msg00001.html

[2] "Because I'm lazy and don't like programming in C"

[3] I think I've said this before ... oh yeah ...

        http://lua-users.org/lists/lua-l/2016-03/msg00010.html
        http://lua-users.org/lists/lua-l/2016-03/msg00152.html
        http://lua-users.org/lists/lua-l/2016-07/msg00113.html
        http://lua-users.org/lists/lua-l/2015-01/msg00687.html

        and possibly other messages ...

Re: Lua interface standards

Sean Conner
In reply to this post by Martin
It was thus said that the Great Martin once stated:
>
> Also there will be need to standardize behavior for bad data both
> when decoding and coding. For example you can't serialize table
> with table-type keys to JSON. Or table with cycles. Should error()
> be raised or just (nil, err_msg) returned?

  I actually prefer (nil, err_value).  First, the caller might be able to
deal with the error (don't force me to pcall() just to capture an error I
can handle), and an error value is easier to check than a string (especially
given locales).  It's not uncommon for my code to do:

        local remote,data,err = socket:recv()
        if not remote then
          if err == errno.ETIMEDOUT then
            -- timed out, probably return or something
          end

          if err ~= errno.EINTR then
            syslog('error',"socket:recv() = %s",errno[err])
          end

        else
          -- happy path
        end

> Should standard provide .verify(t) function?

  What should this do?

> What if I want to serialize parts that can be serialized and omit
> others? Should standard provide .align(t) function?

  Least-common-denominator behavior is to assume error() and if it can't
serialize what you give it, then error().

  -spc


Re: Lua interface standards

Martin
On 04/23/2017 12:54 AM, Sean Conner wrote:

> It was thus said that the Great Martin once stated:
>>
>> Also there will be need to standardize behavior for bad data both
>> when decoding and coding. For example you can't serialize table
>> with table-type keys to JSON. Or table with cycles. Should error()
>> be raised or just (nil, err_msg) returned?
>
>   I actually prefer (nil, err_value).  First, the caller might be able to
> deal with the error (don't force me to pcall() just to capture an error I
> can handle), and an error value is easier to check than a string (especially
> given locales).
Me too. But error() is good at stopping berserk user scripts that do
not bother to check the success of a called function.

>> Should standard provide .verify(t) function?
>
>   What should this do?

verify(data) should return true if the codec fully understands the given
<data>. Else it should return nil and an error message. So it guarantees
that a subsequent call to encode() or decode() will not raise
data-dependent errors. (In many cases checking input before the actual
work is cheaper in resources than interrupting the work due to
inconsistent data.)

But this conflicts with least-common-denominator philosophy, so
let's drop this function. (Else we should provide one verify() to check
string before decoding and other verify() to check table before
encoding.)

>> What if I want to serialize parts that can be serialized and omit
>> others? Should standard provide .align(t) function?
>
>   Least-common-denominator behavior is to assume error() and if it can't
> serialize what you give it, then error().

error() is too ridiculous. Suppose user calls json.encode(_G).
Can it be fully serialized? No. Can it be partly serialized? Yes.
What should be done to serialize it partly? align() data before
encoding.

So the user has the option to call
  s = json.encode(_G)

which will return nil or raise error() but

  data = json.align(_G)
  s = json.encode(data)

will do the job to some degree.

Similar problems will arise if we want a standard for Lua table
serializers, which encode a table to a string of Lua code.
AFAIK none of them can handle tables with cycles, with
metatables and with table keys.

-- Martin

Re: Lua interface standards

Sean Conner
It was thus said that the Great Martin once stated:

> On 04/23/2017 12:54 AM, Sean Conner wrote:
> > It was thus said that the Great Martin once stated:
> >> Should standard provide .verify(t) function?
> >
> >   What should this do?
>
> verify(data) should return true if codec fully understand given <data>.
> Else it should return nil and error message. So it guarantees that
> subsequent call to encode() or decode() will not raise data-dependent
> errors. (In many cases checking input before actual work is cheaper in
> resources than interrupting work due inconsistent data.)

  Um ... what?

  The work that verify() would do is pretty much the same that
encode()/decode() would do, so it doesn't make sense.  Besides, someone
could *still* call encode()/decode() and skip calling verify() entirely, so
you *still* have to do the work in encode()/decode().

> But this conflicts with least-common-denominator philosophy,

  I'm not sure if I made myself clear, but I'm decrying the
"least-common-denominator" philosophy [1], not praising it.

> so
> let's drop this function. (Else we should provide one verify() to check
> string before decoding and other verify() to check table before
> encoding.)

  And how do you enforce calling verify() prior to encode()/decode()?

> >> What if I want to serialize parts that can be serialized and omit
> >> others? Should standard provide .align(t) function?
> >
> >   Least-common-denominator behavior is to assume error() and if it can't
> > serialize what you give it, then error().
>
> error() is too ridiculous. Suppose user calls json.encode(_G).
> Can it be fully serialized?

  Maybe.  See below.

> No. Can it be partly serialized? Yes.
> What should be done to serialize it partly? align() data before
> encoding.

  The name "align()" is a bad name for this.  "sanitize()" would be better.

> So user have options to call
>   s = json.encode(_G)
>
> which will return nil or raise error() but
>
>   data = json.align(_G)
>   s = json.encode(data)
>
> will do job to some degree.
>
> Similar problems will rise if we want standard for lua table
> serializers, which encodes table to string with lua code.
> AFAIK none of them can handle table with cycles, with
> metatables and with table keys.

  My CBOR [3] module [4] can deal with cycles, metatables and tables as
keys.  An earlier version (not released) could also deal with serializing
_G, or even _G.io, _G.math, etc.  One of the neat things about CBOR is that
one can semantically mark data, so I got around serializing _G by marking
the string literal "_G" as "standard Lua object" (same with serializing
functions like print() or userdata like io.stdio).  So it can be done, to a
degree.  I even started on serializing actual Lua functions (as long as they
were written in Lua)---first by sending the actual code if available and
failing that, as PUC Lua bytecode in an architecture neutral format (never
finished that part).

  -spc

[1] One of the worst offenders of this is Daniel Bernstein, whose code
        assumes *no standard library whatsoever* and provides his own
        version of routines like memcpy() or strlen() [2].

[2] Granted, back in 1990, that might have made a bit of sense since C
        (and its standard library) was only standardized the previous year.
        But most of his code was written nearly a decade later.

[3] Concise Binary Object Representation http://cbor.io/

[4] https://github.com/spc476/CBOR

        Also available via LuaRocks as "org.conman.cbor"

Re: Lua interface standards

Martin
Thank you for the reply!

I've just had a nice time reading about Daniel Bernstein
(who turns out to be a cool guy who really understands number
theory and cryptanalysis), the CBOR RFC (heh, coauthored by
C. Bormann) and the code of your CBOR codec.

On 04/23/2017 02:05 AM, Sean Conner wrote:

>   The work that verify() would do is pretty much the same that
> encode()/decode() would do, so it doesn't make sense.  Besides, someone
> could *still* call encode()/decode() and skip calling verify() entirely, so
> you *still* have to do the work in encode()/decode().

Yes, I agree. There is no need for a verify() function and no way to
enforce its usage.

>> What should be done to serialize it partly? align() data before
>> encoding.
>
>   The name "align()" is a bad name for this.  "sanitize()" would be better.

Yes, "align" is probably not a very suitable name. On the other hand,
"sanitize" triggers some medical associations in my mind. Something
like "biohazard", "contamination", "disinfection", "disinsection".
Another variant is "adjust".

(Or any other suitable name for function which removes some data from
given table (breaks cycles, removes all keys that are not strings
(for JSON object) or not elements of sequence (for JSON array), removes
all values that are not tables, strings, numbers, booleans or json.null).)
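
Whatever it ends up being called, a function of this kind might look like the sketch below. Everything here is an assumption made for illustration: the name `sanitize`, the local `null` standing in for a codec's json.null, and the heuristic that a table with a non-empty sequence part is treated as an array. Numbers such as NaN or infinity, which JSON cannot represent either, are not handled.

```lua
-- Hypothetical stand-in for the unique json.null value of a codec.
local null = {}

-- Return a copy of `value` that keeps only what JSON can represent.
-- `seen` tracks the tables on the current path so that cycles are cut.
local function sanitize(value, seen)
  local tv = type(value)
  if value == null or tv == "string" or tv == "number" or tv == "boolean" then
    return value
  end
  if tv ~= "table" then
    return nil -- functions, userdata and threads are dropped
  end
  seen = seen or {}
  if seen[value] then
    return nil -- break the cycle
  end
  seen[value] = true
  local copy = {}
  if #value > 0 then
    -- Treated as a JSON array: keep the sequence part, closing any gaps
    -- left by elements that had to be dropped.
    for i = 1, #value do
      local v = sanitize(value[i], seen)
      if v ~= nil then
        copy[#copy + 1] = v
      end
    end
  else
    -- Treated as a JSON object: keep string keys only.
    for k, v in pairs(value) do
      if type(k) == "string" then
        copy[k] = sanitize(v, seen)
      end
    end
  end
  seen[value] = nil
  return copy
end

local t = { a = 1, [{}] = 2, f = print }
t.self = t
local clean = sanitize(t)
for k, v in pairs(clean) do
  print(k, v)
end
```

Closing gaps in arrays is only one possible choice; substituting the null value for dropped elements would preserve positions instead.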

> [1] One of the worst offenders of this is Daniel Bernstein, whose code
> assumes *no standard library whatsoever* and provides his own
> version of routines like memcpy() or strlen() [2].

Joel Spolsky's site [1] states that for some period the Microsoft Excel
team had its own C compiler for the project. I don't consider this bad.
If someone likes reimplementing stuff despite the possibility of reusing
it, why not?

[1]
https://www.joelonsoftware.com/2001/10/14/in-defense-of-not-invented-here-syndrome/

-- Martin

Re: Lua interface standards

Jay Carlson
In reply to this post by Dirk Laurie-2
tl;dr: JSON and XML are relatively easy to standardize because they have strong specifications. We don't have to design as much. That being said, there are still a lot of design problems even in those simple cases.

I warned you this was long.

> On Apr 23, 2017, at 2:11 AM, Dirk Laurie <[hidden email]> wrote:
>
> Briefly, I think we do not need a Python-like standard library as much
> as we need standards for those common tasks that tend to entice module
> writers into inventing ever-better wheels.
>
> The standard should be as sparse as possible, so that
>  (a) There is not much that the user needs to know.
>  (b) The quirks of particular existing implementation do not get
> so fondly described as 'standard' that no other implementation
> qualifies.

I agree that another possibility should be automated testing; it could define the first cut at interface compatibility. Then we can argue about specific test IDs, which tends to focus the mind more than generalities.

>
> For example, a module conforming to the 'json' standard should
> be loadable as `json = require "bestjson"` and provide methods
> json.decode (string to table), encode (table to string), and a unique
> immutable value json.null. In this case it is obvious how the table
> should look, but the standard should document it. Anything else is extra.
>
> When it comes to XML, it is no longer so obvious (does the attribute
> table go into [0] or a member named 'attr'?

My first attempt at documenting XML trees was in the 4.0 http://lua-users.org/wiki/XmlTree . And let's not forget: is the name of the tag called "name" or "tag"? One of them is *wrong*. :-)

I gave up on being right. LuaExpat presented lxp.lom as the API, and LOM trees were effectively standardized. So I switched to that. Have I mentioned semi-blessing?

It is not obvious to me whether LOM elements must be pure tables (i.e., no metatables, no proxies/userdata). That this is not obvious is probably because lxp.lom is intended as an implementation, and most people have the source to their tools; you can just read the code to answer questions like that. But if there's really an ecosystem around a data type, everybody may not be clear on which tacit requirements everybody else has. For example, here's something I've written and complained about too many times:

  for i=1,#t.attr do
    t.attr[i] = nil
  end

since I don't want pairs() to get the annoying numeric attributes. (This happens to be explicitly allowed by Matthew's spec.)

My typical style is to put all kinds of non-XML annotation on LOM nodes. That is, I expect to be able to write t.foo = "bar", and have t.foo stay with the table, but not affect its XML content. A pattern for XML processing is tearing down the DOM tree to build your own tree, and the ability to scribble notes on the LOM nodes is very helpful--for starters, you can make a .parent link if you really need it. Is this allowed by the LOM?

Anybody formalizing LOM further should take a look at my ancient http://lua-users.org/wiki/LazyTree for a weird implementation of XML parse trees. [1] I think it's *probably* easily interoperable with everybody--well, everybody in the, uh, Lua 4.0 XML ecosystem. I should clean up some 5.x versions. Didn't everybody switch from XML to JSON already though? Maybe that makes this discussion easier...

One thing's still true. In case there are any pure table-centric implementations (and there are), we can't call methods on LOM nodes. So nearly everybody is going to need external functions for tree walkers, iterating over tags with certain names, etc. (Since Lua is the reimplement-it-yourself language, you probably end up with a good chunk of DOM/XPath/XFoo functionality replicated in your private nop.lom.* library. I bet it's super-fast though.)

One of the functions nearly everybody has to use is lom.unparse, and it is almost as critical to get right as parsing.[2] Can I mix and match parsers and unparsers from different projects?

If interface definitions are in our future, I agree it would be very educational for us to take the LOM spec and see if we can document it enough such that interesting alternative implementations are possible. I could hack up a 5.3 LazyTree if anybody wants to try tests against it.

One appealing test is a tree walker for consolidation of text runs, so you will always have {"ab"} instead of {"a", "b"}. This can be done inside a LOM parser relatively easily, but since implementations are free to break up text runs, many apps need to clean the text runs themselves in some way. [3] Another possibility is a tree walker that metatable-izes a tree in place, or makes a metatable-ized copy. Hey, at least the copy will share strings...
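
A text-run consolidator along these lines might look like the following sketch. It assumes LOM-style nodes (a `tag` field, an `attr` table, and children at 1..n that are either strings or node tables) and works in place; the function name `consolidate` is made up for illustration.

```lua
-- Merge adjacent text children in place, recursing into child elements.
-- Assumes LOM-style nodes: node.tag, node.attr, and children in 1..n,
-- where a child is either a string or another node table.
local function consolidate(node)
  local out = {}
  for _, child in ipairs(node) do
    if type(child) == "string" and type(out[#out]) == "string" then
      -- Previous child was text as well: glue the two runs together.
      out[#out] = out[#out] .. child
    else
      if type(child) == "table" then
        consolidate(child)
      end
      out[#out + 1] = child
    end
  end
  -- Write the merged children back and clear any leftover slots.
  local n = #node
  for i = 1, n do
    node[i] = out[i]
  end
  return node
end

local tree = consolidate({
  tag = "p", attr = {},
  "a", "b", { tag = "i", attr = {}, "c", "d" }, "e",
})
print(#tree, tree[1], tree[2][1], tree[3])
```

Because only the numeric part of each node is rewritten, any extra fields scribbled on the nodes stay where they are.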

> must the code/decode be
> deterministically reversible?

XML applications must treat attribute order as insignificant, so any order is equally (non-)deterministic at the XML level. Only humans hand-editing XML have a case that the surface syntax should be left alone. [4]

Canonicalization of XML is a whole separate topic, but if you needed to know about it, you probably already do. Anyway, I tend to sort attribute names, but I often have controllable degrees of perversity in XML unparsing. :-)

> We would not need subjective evaluations (this json/xml codec
> is fast/reliable etc) but only objective ones (it conforms to the standard
> interface, it passes this test).

Many of the claimed XML parsers for Lua are nowhere near XML parsers--they fail to implement basic requirements. I'm not talking about DTDs, I'm talking character sets and escaping. Is that a quality-of-implementation issue? Note that many of the pure-Lua parsers explicitly say they don't conform, but people still like them. Do we knock them out in the test suite?

Jay

[1]: A lazy-loading tree makes sense for large JSON documents too, and the same consumption techniques apply for user-friendly high efficiency. Maybe I should extend LazyKit to JSON. Then I can find out how implementation-independent Dirk's hypothetical Lua json module interface spec is....

[2]: Matthew Wild pointed out that using a strong parser like expat solves many problems when you are just cutting&pasting opaque trees. You should still see "HOWTO Avoid Being Called a Bozo When Producing XML" at https://hsivonen.fi/producing-xml/#serializer for the full argument. The whole document is great, and relevant to some parts of JSON processing too, so none of you are allowed to skip it.

[3]: Again, a lot of XML processing involves passing LOM subtrees opaquely; in that case, only the unparser would notice unconsolidated text runs, and it probably doesn't care.

[4]: And sometimes humans like the pretty-printed version better than the "untouched" version. But I already said that on the list recently.


Re: Lua interface standards

Dirk Laurie-2
2017-04-24 0:10 GMT+02:00 Jay Carlson <[hidden email]>:

> LuaExpat presented lxp.lom as the API, and LOM trees were
> effectively standardized. So I switched to that. Have I mentioned
> semi-blessing?

Only semi, in this case. LHF has also provided us with an XML module,
which does not use the LOM.

BTW, on my system, if I say `lua -l xml`, I get a module which illustrates
nicely why standards would be useful.

1. I tried 'xml._version' since many modules put useful information in there.
The result was
/usr/local/share/lua/5.3/lub/Autoload.lua:28: module 'xml._version' not found:
    no field package.preload['xml._version']
    no file '/usr/local/share/lua/5.3/xml/_version.lua'
    no file '/usr/local/share/lua/5.3/xml/_version/init.lua'
    no file '/usr/local/lib/lua/5.3/xml/_version.lua'
    no file '/usr/local/lib/lua/5.3/xml/_version/init.lua'
    no file './xml/_version.lua'
    no file './xml/_version/init.lua'
    no file '/usr/local/lib/lua/5.3/xml/_version.so'
    no file '/usr/local/lib/lua/5.3/loadall.so'
    no file './xml/_version.so'
    no file '/usr/local/lib/lua/5.3/xml.so'
    no file '/usr/local/lib/lua/5.3/loadall.so'
    no file './xml.so'
There is clearly some cleverness with a function as __index metamethod
for the module table itself.
2. I used 'pairs' to list what is in the module and learnt that it has
VERSION and DESCRIPTION. These look like documentation.
3. The result of xml.VERSION is "1.1.2".
4. The result of xml.DESCRIPTION is a table, not a string.
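
The error in item 1 is what one would expect from a module table whose __index metamethod falls back to require(). The sketch below reproduces the effect with a hypothetical module table named `mod`; it is a guess at the mechanism, not the actual code of lub's Autoload.lua.

```lua
-- A module table that tries to autoload missing fields as submodules.
local mod = { VERSION = "1.1.2" }
setmetatable(mod, {
  __index = function(t, key)
    -- Any unknown field becomes a require() of "mod.<key>".
    local sub = require("mod." .. key)
    rawset(t, key, sub)
    return sub
  end,
})

print(mod.VERSION) -- an existing field, no autoload involved

-- Probing a field that does not exist raises instead of returning nil.
local ok, err = pcall(function() return mod._version end)
print(ok, err)
```

With such a metamethod in place, the usual idiom of testing a field against nil stops working, which is why probing for `_version` produced a loader error.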

So I still don't know what this module calls itself. I probably installed it
via LuaRocks. Wait --- the first line of the error message suggests that
the module is called 'lub'. Let's ask LuaRocks for help.

$ luarocks doc lub

~~~~
Documentation files for lub 1.1.0-1
-----------------------------------

/usr/local/lib/luarocks/rocks/lub/1.1.0-1/doc/
    index.html

Opening /usr/local/lib/luarocks/rocks/lub/1.1.0-1/doc/index.html ...
~~~~

Aha! A page is opening on my browser ... here it comes ...

~~~~
Online documentation
doc.lubyk.org/lub.html
Generate
lua scripts/makedoc.lua && open html/index.html
~~~~

Click on the link ... namebright.com tells me that lubyk.org is coming
soon.

I AM NOT GIVING UP YET.

~~~
> xml.DESCRIPTION.summary
Very fast xml parser based on RapidXML
> xml.DESCRIPTION.detailed
    This module is part of the Lubyk project.

    Main features are:
     - Fast and easy to use
     - Complete documentation
     - Based on proven code (RapidXML)
     - Full test coverage

    Read the documentation at http://doc.lubyk.org/xml.html.
~~~

Catch-22 ...

Can you see why I press for standards?