[ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor

Gavin Kistner-4
For a [separate project][1] I needed a pure-Lua XML parser. Unsatisfied with my previous [quick-n-dirty pattern-based XML parser][2], I've created a streaming parser that is far more robust. I give you:

SLAXML - https://github.com/Phrogz/SLAXML
(It's pronounced "Slacks-Em-Ell")

Copy/pasting key sections from the README:

## Features

* Pure Lua in a single file (two files if you use the DOM parser).
* Streaming parser does a single pass through the input and reports what it sees along the way.
* Supports processing instructions (`<?foo bar?>`).
* Supports comments (`<!-- hello world -->`).
* Supports CDATA sections (`<![CDATA[ whoa <xml> & other content as text ]]>`).
* Supports namespaces, resolving prefixes to the proper namespace URI (`<foo xmlns="bar">` and `<wrap xmlns:bar="bar"><bar:kittens/></wrap>`).
* Supports unescaped greater-than symbols in attribute content (a common failing for simpler pattern-based parsers).
* Unescapes named XML entities (`&lt; &gt; &amp; &quot; &apos;`) and numeric entities (e.g. `&#10;`) in attributes and text nodes (but—properly—not in comments or CDATA). Properly handles edge cases like `&#38;amp;`.
* Optionally ignore whitespace-only text nodes (as appear when indenting XML markup).
* Includes a DOM parser that is both a convenient way to pull in XML to use and a nice example of using the streaming parser.
* Adds only a single `SLAXML` key to the environment; there is no spam of utility functions polluting the global namespace.

## Usage
    require 'slaxml'

    local myxml = io.open('my.xml'):read('*a') -- '*a' reads the whole file, not just the first line

    -- Specify as many/few of these as you like
    local parser = SLAXML:parser{
      startElement = function(name,nsURI)       end, -- When "<foo" or "<x:foo" is seen
      attribute    = function(name,value,nsURI) end, -- attribute found on current element
      closeElement = function(name,nsURI)       end, -- When "</foo>" or "</x:foo>" or "/>" is seen
      text         = function(text)             end, -- text and CDATA nodes
      comment      = function(content)          end, -- comments
      pi           = function(target,content)   end, -- processing instructions e.g. "<?yes mon?>"
      namespace    = function(nsURI)            end, -- when xmlns="..." is seen (after startElement)
    }

    -- Ignore whitespace-only text nodes and strip leading/trailing whitespace from text
    -- (does not strip leading/trailing whitespace from CDATA)
    parser:parse(myxml,{stripWhitespace=true})

If you just want to see if it will parse your document correctly, you can simply do:

    require 'slaxml'
    SLAXML:parse(myxml)

…which will cause SLAXML to use its built-in callbacks that print the results as seen.
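
To make the callback style concrete, here is a deliberately tiny, self-contained sketch of SAX-style dispatch. It is not SLAXML's implementation: `miniParse` is a hypothetical name, and it only recognizes `<name ...>`, `</name>`, `<name/>` and plain text (no attributes, comments, CDATA, entities or namespaces). What it shares with the interface above is that the input is scanned once and each handler is called only if you supplied it.

```lua
-- Hypothetical sketch of SAX-style dispatch; this is NOT SLAXML's code.
-- It only recognizes <name ...>, </name>, <name/> and plain text.
local function miniParse(xml, callbacks)
  -- Invoke a handler only if the caller supplied one (all are optional).
  local function emit(event, ...)
    local handler = callbacks[event]
    if handler then handler(...) end
  end

  local pos = 1
  while pos <= #xml do
    -- '^' anchors the match at pos; captures: leading '/', name, trailing '/'
    local _, tagEnd, slash, name, selfClose =
      xml:find('^<(/?)([%w_:%-]+)[^>]-(/?)>', pos)
    if tagEnd then
      if slash == '/' then
        emit('closeElement', name)
      else
        emit('startElement', name)
        if selfClose == '/' then emit('closeElement', name) end
      end
      pos = tagEnd + 1
    else
      -- Treat everything up to the next '<' as text; searching from pos + 1
      -- guarantees progress even on markup this sketch does not understand.
      local nextLt = xml:find('<', pos + 1, true) or (#xml + 1)
      emit('text', xml:sub(pos, nextLt - 1))
      pos = nextLt
    end
  end
end

-- Usage: supply only the handlers you care about.
local log = {}
miniParse('<root>hi<b/></root>', {
  startElement = function(name) log[#log + 1] = 'start:' .. name end,
  closeElement = function(name) log[#log + 1] = 'close:' .. name end,
  text         = function(text) log[#log + 1] = 'text:' .. text end,
})
print(table.concat(log, ' '))
--> start:root text:hi start:b close:b close:root
```

Leaving out a handler (say, `text`) simply means those events are skipped, which is the same "as many/few as you like" behaviour the README describes.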

## Known Limitations / TODO
- Does not require or enforce well-formed XML (silently ignores and consumes certain syntax errors)
- No support for entity expansion other than
  `&lt; &gt; &quot; &apos; &amp;` and numeric ASCII entities like `&#10;`
- XML Declarations (`<?xml version="1.x"?>`) are incorrectly reported
  as Processing Instructions
- No support for DTDs
- No support for extended characters in element/attribute names
- No support for [XInclude](http://www.w3.org/TR/xinclude/)



[1]: https://github.com/Phrogz/LXSC
[2]: http://phrogz.net/lua/AKLOMParser.lua

Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor

Miles Bader-2
Gavin Kistner <[hidden email]> writes:
> * Adds only a single `SLAXML` key to the environment; there is no spam
> of utility functions polluting the global namespace.

It should not add _any_ keys to the global environment.

> ## Usage
>     require 'slaxml'

local slaxml = require 'slaxml'
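
A quick way to check Miles's point on any module is to snapshot the keys of `_G` before and after the `require`. Minimal sketch; the module name `noglobals_demo` is hypothetical, and it is registered through `package.preload` only so the whole thing runs as one file.

```lua
-- Hypothetical module registered via package.preload so the sketch runs as one file;
-- a real module would live in noglobals_demo.lua and end with the same `return M`.
package.preload['noglobals_demo'] = function()
  local M = {}
  function M.hello() return 'hello' end
  return M                        -- the table is handed back; nothing is assigned to _G
end

-- Snapshot the keys currently present in the global table.
local function globalNames()
  local names = {}
  for k in pairs(_G) do names[k] = true end
  return names
end

local before = globalNames()
local demo = require 'noglobals_demo'   -- the caller picks the local name

-- Any key present now but absent before was leaked by the require.
local leaked = {}
for k in pairs(_G) do
  if not before[k] then leaked[#leaked + 1] = tostring(k) end
end

print(demo.hello())
print(#leaked == 0 and 'no globals added' or ('leaked: ' .. table.concat(leaked, ', ')))
```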

-miles

--
永日の 澄んだ紺から 永遠へ
(From the clear deep blue of a long day, on into eternity)


local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Gavin Kistner-4
On Feb 17, 2013, at 10:15 PM, Miles Bader <[hidden email]> wrote:

> Gavin Kistner <[hidden email]> writes:
>> * Adds only a single `SLAXML` key to the environment; there is no spam
>> of utility functions polluting the global namespace.
>
> It should not add _any_ keys to the global environment.
>
>> ## Usage
>>    require 'slaxml'
>
> local slaxml = require 'slaxml'

Thank you for the suggestion. I've updated the library to use this pattern.

This works fine for this project where I only have two files. How would others suggest enforcing the same pattern for a different project that has many files all augmenting the same table?

For example, in LXSC[1] I currently have 10+ files like so:

    lxsc.lua
    lib/event.lua
    lib/scxml.lua
    lib/state.lua
    lib/...etc...

and I use this to build the common object like so:

    -- lxsc.lua
    LXSC = { VERSION="0.3.1" }
    require 'lib/state'
    require 'lib/scxml'
    require 'lib/event'
    ...etc.

    -- lib/event.lua
    LXSC.Event = { ... }

    -- lib/scxml.lua
    LXSC.SCXML = { ... }

    -- lib/state.lua
    LXSC.State = { ... }

How might I modify this small set of files to work together in a way that doesn't spam the global namespace?

I can't do something like:

    -- lxsc.lua
    local LXSC = { VERSION="0.3.1" }
    require 'lib/state'
    return LXSC

    -- lib/state.lua
    local LXSC = require 'lxsc'
    LXSC.State = { ... }

…because that causes a loop in the loader. Should I instead do this?

    LXSC = { VERSION="0.3.1" } -- Spam the global for now
    require 'lib/state'
    require 'lib/scxml'
    require 'lib/event'
    local LXSC = LXSC
    _G.LXSC = nil -- remove the global spam
    return LXSC

Your experienced advice is requested :)


[1] https://github.com/Phrogz/LXSC
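
One commonly used way to break that loop without a temporary global, sketched here under the assumption that the master file may register itself before loading its parts: `require` returns whatever is already stored in `package.loaded[name]`, so if `lxsc.lua` puts its table there first, a sub-file's `require 'lxsc'` gets the partly built table instead of triggering the loader loop. The sub-file contents below are hypothetical stand-ins, registered through `package.preload` so the sketch runs as a single script.

```lua
-- Stand-in for lxsc.lua: register the master table BEFORE loading its parts.
package.preload['lxsc'] = function(modname)
  local LXSC = { VERSION = "0.3.1" }
  package.loaded[modname] = LXSC   -- a nested require('lxsc') now returns this table
  require 'lib/state'
  require 'lib/event'
  return LXSC
end

-- Stand-in for lib/state.lua (hypothetical contents): augment the partly built master.
package.preload['lib/state'] = function()
  local LXSC = require 'lxsc'
  LXSC.State = { kind = 'state' }
  return LXSC.State
end

-- Stand-in for lib/event.lua (hypothetical contents).
package.preload['lib/event'] = function()
  local LXSC = require 'lxsc'
  LXSC.Event = { kind = 'event' }
  return LXSC.Event
end

local LXSC = require 'lxsc'
print(LXSC.VERSION, LXSC.State.kind, LXSC.Event.kind)
print(rawget(_G, 'LXSC'))   -- nil: no global was created
```

The caveat is load order: at load time a sub-file can only rely on fields that were added before it was required; anything looked up inside a function body is resolved when the function is called, by which point the master is complete.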

Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Justin Cormack
all requires should be assigning to a local variable.

either the main lib loads the sublibs, or the user requires them.

so either the user does

local state = require "lib/state"
local event = require "lib/event"

or the main lib does

local scxml = {}

scxml.state = require "lib/state"
scxml.event = require "lib/event"

return scxml

and then the user does

local scxml = require "scxml"

scxml.state.blah() etc
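
A runnable version of the variant where the main lib collects the sublibs, assuming each sub-library simply returns its own table. `package.preload` stands in for the files on disk, and the contents of `lib/state` and `lib/event` (including `blah`) are placeholders:

```lua
-- Stand-ins for lib/state.lua and lib/event.lua: each returns a local table.
package.preload['lib/state'] = function()
  local state = {}
  function state.blah() return 'state.blah called' end
  return state
end
package.preload['lib/event'] = function()
  return { name = 'event' }
end

-- Stand-in for scxml.lua: collect the sub-libraries into one table.
package.preload['scxml'] = function()
  local scxml = {}
  scxml.state = require 'lib/state'
  scxml.event = require 'lib/event'
  return scxml
end

-- User code: one local, no globals.
local scxml = require 'scxml'
print(scxml.state.blah())  --> state.blah called
```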

On Mon, Feb 18, 2013 at 2:43 PM, Gavin Kistner <[hidden email]> wrote:
[...]

Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Gavin Kistner-4
On Feb 18, 2013, at 09:59 AM, Justin Cormack <[hidden email]> wrote:
> all requires should be assigning to a local variable.
> [...]
> local scxml = {}
>
> scxml.state = require "lib/state"
> scxml.event = require "lib/event"
>
> return scxml
 
This works for the simplest cases I supplied where each file adds exactly one completely independent table to the master. However, I have three more cases that aren't quite as clean:
* Multiple Datatypes per File
* Multiple Files Augmenting the Same Table
* One File Referencing Tables from Another


1) Multiple Datatypes per File
One of the files declares a few (currently global) new datatypes that are used by various methods throughout. Would you recommend:

    -- lxsc.lua
    local LXSC = {}
    -- ...require others...
    for name,t in pairs(require('lib/datatypes')) do LXSC[name]=t end
    return LXSC

    -- lib/datatypes.lua
    local Queue = {}; ...
    local OrderedSet = {}; ...
    local List = {}; ...
    return {Queue=Queue, OrderedSet=OrderedSet, List=List}

This is easily worked around by making one file per 'object', but that feels like an ugly burden. Further, while this makes these classes available on the master LXSC table, there is no way for the sub-tables to access this master. Which brings me more explicitly to problem #2:


2) Multiple Files Augmenting the Same Table
Some of the files do not add a new table, but augment another existing table. For example:

    -- lxsc.lua
    LXSC = {}
    require 'lib/scxml'
    require 'lib/runtime'

    -- lib/scxml.lua
    LXSC.SCXML = {}

    -- lib/runtime.lua
    function LXSC.SCXML:go() ... end -- modify a table defined elsewhere

If there were only one case like this, I could 'invert' the hierarchy like so:

    -- lxsc.lua
    local LXSC = {}
    LXSC.SCXML = require 'lib/runtime'
    return LXSC

    -- lib/runtime.lua
    local SCXML = require 'lib/scxml'
    function SCXML:go() ... end -- modify a table defined elsewhere
    return SCXML

    -- lib/scxml.lua
    local SCXML = {}
    return SCXML

...but that technique does not work when multiple files augment the same datatype. Is there a pattern that allows multiple files to work on the same common table without polluting the global namespace?
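
One pattern that does cope with several files augmenting one table, sketched under the assumption that each augmenting file may return a function instead of a table: the master creates the shared table once and passes it to each augmenter, so no file has to find it through a global. `lib/scxml` and `lib/runtime` mirror the example above; `lib/trace` is a hypothetical second augmenter, and `package.preload` stands in for the files.

```lua
-- Stand-in for lib/scxml.lua: defines the shared SCXML table.
package.preload['lib/scxml'] = function()
  return {}
end

-- Stand-in for lib/runtime.lua: returns a function that augments SCXML.
package.preload['lib/runtime'] = function()
  return function(SCXML)
    function SCXML:go() return 'going' end
  end
end

-- Stand-in for lib/trace.lua (hypothetical): a second file augmenting the same table.
package.preload['lib/trace'] = function()
  return function(SCXML)
    function SCXML:trace() return 'tracing' end
  end
end

-- Stand-in for lxsc.lua: create the table once, then hand it to each augmenter.
package.preload['lxsc'] = function()
  local LXSC = {}
  LXSC.SCXML = require 'lib/scxml'
  require('lib/runtime')(LXSC.SCXML)
  require('lib/trace')(LXSC.SCXML)
  return LXSC
end

local LXSC = require 'lxsc'
print(LXSC.SCXML:go(), LXSC.SCXML:trace())
```

The trade-off is that an augmenting file no longer does anything when required on its own; it only acts when the master calls the returned function with the table.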


3) One File Referencing Tables from Another
Various files and methods need to talk to other high-level objects in the system. For example:

    -- lib/parse.lua
    function LXSC:parse(...) -- augments the master table
      local name = getSomeNameString()
      local object = LXSC[name]() -- dynamically picks methods from the master table
    end

    -- lib/transition.lua
    function Transition:addTarget(...)
      self.targets = LXSC.List() -- accesses a datatype from the namespace
    end

How would I cause the master table to be exposed to each child file that needs to access it?

Is setfenv() the proper way to go about all this? Is there something better?

On Mon, Feb 18, 2013 at 2:43 PM, Gavin Kistner <[hidden email]> wrote:
[...]

Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Gavin Kistner-4
Put more simply: how would you rewrite the following suite of files so that the user can "require 'master'" and not spam the global namespace with "MASTER", but still have all the assertions pass?

    -- _test_usage_.lua
    require 'master'

    assert(MASTER.Simple)
    assert(MASTER.simple)
    assert(MASTER.Shared)
    assert(MASTER.Shared.go)
    assert(MASTER.Simple.ref1()==MASTER.Multi1)
    assert(pcall(MASTER.Simple.ref2))

    -- master.lua
    MASTER = {}
    require 'simple'
    require 'multi'
    require 'shared1'
    require 'shared2'
    require 'reference'

    -- simple.lua
    MASTER.Simple = {}
    function MASTER:simple() end

    -- multi.lua
    MASTER.Multi1 = {}
    MASTER.Multi2 = {}

    -- shared1.lua
    MASTER.Shared = {}

    -- shared2.lua
    function MASTER.Shared:go() end

    -- reference.lua
    function MASTER.Simple:ref1() return MASTER.Multi1 end
    function MASTER.Simple:ref2() MASTER:simple()      end

On Feb 18, 2013, at 10:32 AM, Gavin Kistner <[hidden email]> wrote:
[...]

Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Kevin Martin
On 18 Feb 2013, at 17:46, Gavin Kistner wrote:

> Put more simply: how would you rewrite the following suite of files so that the user can "require 'master'" and not spam the global namespace with "MASTER", but still have all the assertions pass?

I'm not suggesting this is a good idea, but I think it's the simplest way to make it work without any code redesign.

Kev

-- master.lua
assert(package.preload["master.modtbl"] == nil)
package.preload["master.modtbl"] = function ()
        return {}
end
local MASTER = require("master.modtbl")

require 'simple'
require 'multi'
require 'shared1'
require 'shared2'
require 'reference'

return MASTER
-- multi.lua
local MASTER = require("master.modtbl")
MASTER.Multi1 = {}
MASTER.Multi2 = {}
-- reference.lua
local MASTER = require("master.modtbl")
function MASTER.Simple:ref1() return MASTER.Multi1 end
function MASTER.Simple:ref2() MASTER:simple()      end
-- shared1.lua
local MASTER=require("master.modtbl")
MASTER.Shared = {}
-- shared2.lua
local MASTER = require("master.modtbl")
function MASTER.Shared:go() end
-- simple.lua
local MASTER = require("master.modtbl")
MASTER.Simple = {}
function MASTER:simple() end
-- test.lua
do
        local MASTER = require 'master'

        assert(MASTER.Simple)
        assert(MASTER.simple)
        assert(MASTER.Shared)
        assert(MASTER.Shared.go)
        assert(MASTER.Simple.ref1()==MASTER.Multi1)
        assert(pcall(MASTER.Simple.ref2))
end

assert(MASTER == nil)



Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Bernd Eggink
In reply to this post by Gavin Kistner-4
On 18.02.2013 18:46, Gavin Kistner wrote:

> Put more simply: how would you rewrite the following suite of files so
> that the user can "require 'master'" and not spam the global namespace
> with "MASTER", but still have all the assertions pass?
> [...]

I use an "auto-require" mechanism for that. My master module contains a
table which maps each public function or variable to a source file. The
module table itself has a metatable with a __index function which looks
up the name and then automatically calls require(file). Apart from its
metatable, the master module is an empty table.
After setting this up, any submodule and any application only has to
require the master module explicitly. The rest is done automatically.

Here is a skeleton of the master module:

------------------------------------------------
local M = {}
package.loaded[...] = M
local submodules = {
        foo = 'file1',   -- module names as passed to require, not file names
        bar = 'file1',
        baz = 'file2',
        -- ...
}
setmetatable(M, { __index = function(tbl, key)
     local v = rawget(submodules, key)
     if v then
         require(v)
         return rawget(M, key)
     end
     error(key .. ' not found!')
end })
return M
------------------------------------------------

Of course this mechanism relies on the completeness of the 'submodules'
table. But I found that if you follow a certain coding discipline, the
complete master module can easily be created by a little Lua program.
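
A self-contained sketch of the same skeleton, to show the lazy loading in action. The module names `auto` and `auto_file1` are hypothetical and `package.preload` stands in for the files; note that `require` expects a module name (`auto_file1`), not a file name ending in `.lua`. The first access to `M.foo` goes through `__index` and loads the sub-module; since that sub-module stores `foo` and `bar` directly on `M`, later lookups bypass `__index` altogether.

```lua
-- Stand-in for the master module: maps public names to the sub-module defining them.
package.preload['auto'] = function(modname)
  local M = {}
  package.loaded[modname] = M            -- lets sub-modules require the master
  local submodules = {
    foo = 'auto_file1',                  -- module names for require, not file names
    bar = 'auto_file1',
  }
  setmetatable(M, { __index = function(tbl, key)
    local v = rawget(submodules, key)
    if v then
      require(v)                         -- loading the sub-module fills in M[key]
      return rawget(M, key)
    end
    error(tostring(key) .. ' not found!')
  end })
  return M
end

-- Stand-in for auto_file1.lua (hypothetical contents); counts how often it is loaded.
local loads = 0
package.preload['auto_file1'] = function()
  loads = loads + 1
  local M = require 'auto'
  M.foo = function() return 'foo' end
  M.bar = function() return 'bar' end
end

local M = require 'auto'
print(rawget(M, 'foo'))          -- nil: nothing has been loaded yet
print(M.foo(), M.bar(), loads)   -- first access to M.foo triggers the require
```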

Hope that helps.
Regards, Bernd
--
http://sudrala.de


Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Petite Abeille
In reply to this post by Gavin Kistner-4

On Feb 18, 2013, at 3:43 PM, Gavin Kistner <[hidden email]> wrote:

> This works fine for this project where I only have two files. How would others suggest enforcing the same pattern for a different project that has many files all augmenting the same table?

Perhaps you are overthinking it? :)

Keep in mind the fundamentals: a module is just a table. Nothing more, nothing less.

As soon as one finds a reference to a module table, one is free to manipulate it any way one sees fit.

This can be as simple as:

do
  local _ENV = require( 'mymodule' ) -- assuming 'mymodule' is defined somehow

  function bar() end -- add a new function to mymodule
end

Of course, one could do the reverse and define multiple modules in one file, for example:

do
  local _ENV = {}

  function foo() end

  package.loaded[ 'foo' ] = _ENV
end

do
  local _ENV = {}

  function bar() end

  package.loaded[ 'bar' ] = _ENV
end

do
  local _ENV = {}

  function baz() end

  package.loaded[ 'baz' ] = _ENV
end

Avoiding a 'return' in the module definition is rather handy as it allows one to combine multiple modules in one physical file, be it Lua code or bytecode, while coding or while deploying, or any combination thereof.

As an added bonus, defining modules in terms of _ENV enforces locality.

As this is/was a rather common pattern in 5.1, courtesy of the now deprecated 'module' function, one can of course reimplement it in 5.2 as well:

local function module( aName, ... )
  local aModule = package.loaded[ assert( aName ) ]

  if type( aModule ) ~= 'table' then
    aModule = {}
    aModule._M = aModule
    aModule._NAME = aName
    aModule._PACKAGE = aName:sub( 1, aName:len() - ( aName:reverse():find( '.', 1, true ) or 0 ) )

    package.loaded[ aName ] = aModule

    for anIndex = 1, select( '#', ... ) do
      select( anIndex, ... )( aModule )
    end
  end

  return aModule
end

Then one can define modules in 5.2 with all the convenience and features of 5.1, and then some:

local _ENV = module( 'dwim' )



FWIW, here is an example of the above principles at work:

http://alt.textdrive.com/svn/altdev/Mail/DB.lua

But sadly, now, in 5.2, every Tom, Dick, and Harry has to reimplement and reinvent their own half-baked way to handle module definitions. A sad step back from 5.1 where 'module' naturally, meaningfully, and predictably, complemented 'require'. Oh well, every one is entitled to their own mistakes. Even Lua's authors.

Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Gavin Kistner-4
In reply to this post by Kevin Martin
Here's what I ended up going with, as it only requires a change to the master file. I'd be interested in comments on why this might be a bad idea.

    -- master.lua
    local m = {}
    local env = getfenv(0)
    setfenv(0,setmetatable({MASTER=m},{__index=env}))

    require 'simple'
    require 'multi'
    require 'shared1'
    require 'shared2'
    require 'reference'

    setfenv(0,env)
    return m

All other files are unchanged, except the usage itself, which gets a `local MASTER = require 'master'`.

Written up at:
http://stackoverflow.com/q/14942472/405017

On Feb 18, 2013, at 11:42 AM, Kevin Martin <[hidden email]> wrote:
[...]

Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Petite Abeille

On Feb 18, 2013, at 9:34 PM, Gavin Kistner <[hidden email]> wrote:

>     local env = getfenv(0)
>     setfenv(0,setmetatable({MASTER=m},{__index=env}))

Oh my… as Rumpelstiltskin is fond of saying: "All magic comes with a price!"


Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Tomás Guisasola-2
In reply to this post by Gavin Kistner-4
  Hi Gavin

On Mon, 18 Feb 2013, Gavin Kistner wrote:

> Here's what I ended up going with, as it only requires a change to the master
> file. I'd be interested in comments on why this might be a bad idea.
> [...]
  Maybe I've missed something, but why don't you just:

return {
  simple = require"simple",
  multi = require 'multi',
  shared1 = require 'shared1',
  shared2 = require 'shared2',
  reference = require 'reference',
}

  Of course you'll end up with a hierarchical structure, which
might not be what you want.  But I am sure it is worth it!

  Regards,
  Tomás

Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Andrew Starks
This very issue was a huge problem for me: how do I handle module dependencies in a post-`module` era? What I came up with was the following...

mymodule/init.lua
mymodule/_ENV.lua
mymodule/submodule/init.lua
mymodule/submodule/etc...

_ENV.lua has all of my common dependencies, such as the standard Lua libraries and functions (coroutine, table, string, pairs, ipairs, etc.). It DOES NOT have any "mymodule" files loaded into it, however. So, a typical module starts with:

_ENV = require'mymodule._ENV'

Now I know I have the same common set of dependencies. This works really well for almost everything. The "almost" part comes when I want to test a submodule from its own file, when that testing requires the loading of the main module. I can't remember, but I think my solution was to not do that anymore...

Anyway, the other thing to keep in mind is the case where you want "self" to be loaded into a sub-module. In that case, what I do is make the return value of my submodule a function, which accepts self. I don't make it accept an _ENV, because I want to keep that to the standard _ENV that I use. This idiom goes something like:
--a.b.lua

_ENV = require"a._ENV"

return function(self)
  assert(self.bar, "bar needs to be set in self, of course!")

  local b = { foo = "baz" .. self.bar }
  return b
end

-----------

when I call it, I use:

self.b = require'a.b'(self)

----
I could have set `self.b` directly, but I like to always return a table with the new values, just to keep it relatively consistent, even when I'm chaining the function call to the return of require...


Without these tricks (which I may be explaining badly), I would get constant require-loops, "index to self (a nil value)" errors, and other such crappy-ness. Now it all works and nothing is in the global table!
  



Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Andrew Starks
I'm sorry for the double-post-spam, but I read over my last email, and
it was pretty much garbage. I struggled with this issue so much that
I'm going to repost with more verbosity and clarity. Also, I can't
wait to hear other solutions, because I (despite heroic googling)
could not find adequate answers to my issues, other than figuring it
out on my lonesome...

The premise is that you want to make a many-file module, with some of
the files in submodules or otherwise hither and yon. You do not want to
pollute the global environment and you want to steer clear of
deprecated facilities, such as module.

The first thing to do is to get rid of the global environment
altogether. Since it's only reachable as long as you don't reassign
_ENV, make reassigning _ENV your first statement! However, constantly re-typing
all of the things that you need from lua into each file is also a
giant, error-prone pain. So, make one file in your module's base
directory the home for your common environment. I call it "_ENV.lua".
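A minimal Lua 5.2+ sketch of what getting rid of the global environment means in practice: once a scope declares its own `_ENV`, free names resolve through that table, so new assignments never reach the real globals and any standard function you forgot to whitelist is simply nil. The function name `build_private_env` is hypothetical.

```lua
-- Lua 5.2+ only: free names inside this function resolve through the local _ENV.
local function build_private_env()
  local _ENV = { answer = 42 }  -- whitelist: only 'answer' is visible
  doubled = answer * 2          -- writes go into the private table
  return _ENV
end

local env = build_private_env()
print(env.doubled)              --> 84
print(rawget(_G, "doubled"))    --> nil

-- Anything not whitelisted is absent, e.g. print:
local ok = pcall(function()
  local _ENV = {}
  print("unreachable")          -- error: print is nil in this environment
end)
print(ok)                       --> false
```

This is why a shared whitelist file is worth having: every standard function a module uses has to be put back explicitly.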

Note: You cannot use "init.lua" for this madness, because you need to
be able to load it from your submodules, which are being loaded by
init.lua, which loads your submodules, which are...

My _ENV.lua file, at the risk of being unhelpfully wordy, is the following:

````lua
--_ENV.lua
_ENV = {
        type = type,  pairs = pairs,  ipairs = ipairs,  next = next,  print = print,  require = require,
        io = io,  table = table,  string = string,  setmetatable = setmetatable,  assert = assert,
        rawget = rawget, rawset = rawset, error = error,  getmetatable = getmetatable,
        package = package, load = load, loadfile = loadfile, coroutine = coroutine,
        tostring = tostring, tonumber = tonumber;
        pcall = pcall, xpcall = xpcall, select = select,
        lxp = require"lxp", lfs = require"lfs",
        socket = require("socket"), lpeg = require'lpeg',
}

_ENV = require'pl.import_into'(_ENV)
_ENV = tablex.merge(_ENV,{ P = lpeg.P, S = lpeg.S, R = lpeg.R, C = lpeg.C, Cs = lpeg.Cs,
        Ct = lpeg.Ct, Cf = lpeg.Cf,  Cg = lpeg.Cg,  Cmt = lpeg.Cmt, V = lpeg.V,
        Cc = lpeg.Cc,
        ts = pretty.write, pt = function(t) print(ts(t)) return t end
}, true)

string.strip = stringx.strip
lpeg.locale(lpeg)
eol = P'\r\n' + P'\r' + P'\n'; space = P' '; white_space = space + eol + P'\t'
printf = function(s, ...)  print(string.format(s, ...)) end

return _ENV
````

--[[A shout out to Steve's Penlight package, which makes life
wonderful and to Roberto's LPeg, which has turned regex-hell into a
readable and productive endeavor. I even wrote a validating XML parser
in it, but that's completely unrelated to this thread. :)]]

Anyway, with this file, I now have a common base from which to work.
All of my other modules load this first, using the following command:

     _ENV = require'trms._ENV' --where trms is the base of my module.

This facility was critical for me, for two reasons. First, it kept me
out of global space. If I saw that I was missing something from _G
(it took me a surprisingly long time to notice that I didn't have
tostring!), I would just go back into my _ENV.lua file and add it. As
a required file, this only gets loaded one time, so having it applied
to all of my submodules is 0 calories.
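The loaded-only-once point can be checked directly. A minimal Lua 5.2+ sketch, assuming hypothetical module names (`mymodule._ENV`, `mymodule.a`, `mymodule.b`) registered in `package.preload` instead of files. Inside a loader function the environment has to be a `local _ENV`, because assigning `_ENV` there would rebind the script's own environment; a real file assigns `_ENV` at the top of the chunk instead.

```lua
-- Hypothetical module names; package.preload stands in for files on disk.
local load_count = 0

package.preload['mymodule._ENV'] = function()
  load_count = load_count + 1
  -- Whitelist of the standard functions every submodule may use.
  return { string = string, table = table, tostring = tostring }
end

package.preload['mymodule.a'] = function()
  -- In a real file this would be `_ENV = require'mymodule._ENV'`.
  local _ENV = require 'mymodule._ENV'
  return { shout = function(s) return string.upper(tostring(s)) end, env = _ENV }
end

package.preload['mymodule.b'] = function()
  local _ENV = require 'mymodule._ENV'
  return { join = function(t) return table.concat(t, ",") end, env = _ENV }
end

local a = require 'mymodule.a'
local b = require 'mymodule.b'

print(a.shout("hi"))        --> HI
print(b.join({ 1, 2, 3 }))  --> 1,2,3
print(load_count)           --> 1
print(a.env == b.env)       --> true
```

One consequence of sharing a single table: an accidental global assignment in one submodule becomes visible to all the others.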

Second, I found it gave me everything that I really needed for using
the "return module as table" protocol, with only a few exceptions. The
biggest exception was when I needed tighter interdependency between
submodules. This issue has nothing to do with `module` or _ENV... It's
common to all approaches that:

    function my_module:my_method()

...doesn't work when `my_module` was defined in another file. For
example, LuaExpat needs callbacks. These callbacks need access to some
kind of document table. You can put it into the callback, but I didn't
want to do that. Putting all of the callbacks into my `_init`
method violated my fragile sense of decency. So I passed self into my
`callbacks.lua` file, which starts off like this:

````lua
--callbacks.lua
_ENV = require'trms._ENV'
return function(self)
        local callbacks = {}
        callbacks.StartElement = function(parser, elementName, attributes)
                local res = {}
                local stack = self.stack
                ---awesome stuff for about 150 lines...
        end
        return callbacks
end
````
To use this, I simply call the return value of require when assigning
the callback table to my parent module, like so:

    --back in init.lua
    self.callbacks = require'trms.xml.callbacks'(self)

Now, I get my callbacks table out of my initializer and `callbacks`
has access to the state, as it needs to.
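The 150-line callback body is Andrew's own and is not reproduced here. As a hedged illustration only, a minimal sketch of how callbacks that close over `self` can share parser state: it assumes LuaExpat-style callback names and signatures (`StartElement(parser, name, attributes)`, `EndElement(parser, name)`) but calls them by hand instead of through a real parser, and the tree layout (`root`, `kids`, `stack`) and `make_callbacks` are hypothetical.

```lua
-- Hypothetical stand-in for a callbacks module: a factory that closes over self.
local function make_callbacks(self)
  local callbacks = {}
  callbacks.StartElement = function(parser, elementName, attributes)
    local node = { name = elementName, attr = attributes, kids = {} }
    local stack = self.stack
    table.insert(stack[#stack].kids, node)  -- attach to the current parent
    stack[#stack + 1] = node                -- descend into the new element
  end
  callbacks.EndElement = function(parser, elementName)
    local stack = self.stack
    stack[#stack] = nil                     -- ascend back to the parent
  end
  return callbacks
end

-- The parent object owns the state; the callbacks only read and modify it.
local self = { root = { kids = {} } }
self.stack = { self.root }
self.callbacks = make_callbacks(self)

-- Drive the callbacks by hand, as a parser would for <a><b/></a>.
local cb = self.callbacks
cb.StartElement(nil, "a", {})
cb.StartElement(nil, "b", {})
cb.EndElement(nil, "b")
cb.EndElement(nil, "a")

print(self.root.kids[1].name, self.root.kids[1].kids[1].name)
```

Since each callback looks up `self.stack` on every call, the parent can reset the stack between documents without rebuilding the callback table.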

Most often, I don't need to do this. If I'm passing state or self
between submodules, I'm almost always doing it wrong. My internal
policy is that if I'm doing something that is highly-related to
another file, I *might* be okay. More likely, I'm putting something in
the wrong spot and there is a way to do it without passing anything
between modules.

The one limitation that I don't have an answer for is loading a parent
module from its child. I mean, I could, if I wanted to spend way too
much time figuring it out, but I gave up and now I don't do it. I just
test from the root module responsible for loading the others.

This all works really well for me, it's organized and I don't think
about it anymore. Hopefully I'm not missing some huge problem and if I
am, I'm ready to accept the bad news (and then ignore it until my
world falls apart). I'd also love to hear other solutions to module
management in the 5.2 era.

-Andrew Starks


Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Gavin Kistner-4
In reply to this post by Petite Abeille
On Feb 18, 2013, at 1:53 PM, Petite Abeille <[hidden email]> wrote:
> On Feb 18, 2013, at 9:34 PM, Gavin Kistner <[hidden email]> wrote:
>>    local env = getfenv(0)
>>    setfenv(0,setmetatable({MASTER=m},{__index=env}))
>
> Oh my… as Rumpelstiltskin is fond of saying:  "All magic comes with a price!"

This sounds like you are making a criticism, but I can't figure out the particulars. If you have a point to make, I'd be very interested in more details about what "price" I am paying that I may not be aware of.

If, on the other hand, you are just making a joke, that is also fine. :)

Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Andrew Starks
In reply to this post by Gavin Kistner-4
On Mon, Feb 18, 2013 at 11:46 AM, Gavin Kistner <[hidden email]> wrote:

> Put more simply: how would you rewrite the following suite of files so that
> the user can "require 'master'" and not spam the global namespace with
> "MASTER", but still have all the assertions pass?
>
>     ### _test_usage_.lua
>     require 'master'
>
>     assert(MASTER.Simple)
>     assert(MASTER.simple)
>     assert(MASTER.Shared)
>     assert(MASTER.Shared.go)
>     assert(MASTER.Simple.ref1()==MASTER.Multi1)
>     assert(pcall(MASTER.Simple.ref2))
>

Having read the thread after I gave my answer, sigh.... I like Petite
Abeille's answer the most, although it requires a rewrite of
everything I've done.

Using my method, I've come up with a solution to your extremely
out-there example:

--_test_usage_.lua
print(package.path)
package.path = package.path .. ';./?.lua;./?/init.lua'
local MASTER = require 'master'

assert(MASTER.Simple)
assert(MASTER.simple)
assert(MASTER.Shared)
assert(MASTER.Shared.go)
assert(MASTER.Simple.ref1()==MASTER.Multi1)
assert(pcall(MASTER.Simple.ref2))

print(MASTER)
print(_G.MASTER)

-->  table: 0x7ff613408e70
-->  nil

--master.lua
local MASTER = {}
MASTER = require 'simple'(MASTER)  --you MADE ME DO THIS!! Yuck.
MASTER = require 'multi'(MASTER)
MASTER.Shared = require'shared1'
MASTER = require'shared2'(MASTER)
MASTER = require 'reference'(MASTER)

return MASTER
--simple.lua
return function(MASTER)
        MASTER.Simple = {}
        function MASTER:simple() end
        return MASTER
end
--multi.lua
return function(MASTER)
        MASTER.Multi1 = {}
        MASTER.Multi2 = {}
        return MASTER
end
--reference.lua
return function(MASTER)
        function MASTER.Simple:ref1() return MASTER.Multi1 end
        function MASTER.Simple:ref2() MASTER:simple()      end
        return MASTER
end
--shared2.lua
return function(MASTER)
        function MASTER.Shared:go() end
        return MASTER
end

--shared1.lua
return {}
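To check that this arrangement really satisfies every assertion without creating a global, the files can be collapsed into one script. A minimal sketch, assuming Lua 5.1 or later: each `package.preload` entry stands in for the file of the same name, with bodies copied from above.

```lua
package.preload['simple'] = function()
  return function(MASTER)
    MASTER.Simple = {}
    function MASTER:simple() end
    return MASTER
  end
end
package.preload['multi'] = function()
  return function(MASTER)
    MASTER.Multi1 = {}
    MASTER.Multi2 = {}
    return MASTER
  end
end
package.preload['shared1'] = function() return {} end
package.preload['shared2'] = function()
  return function(MASTER)
    function MASTER.Shared:go() end
    return MASTER
  end
end
package.preload['reference'] = function()
  return function(MASTER)
    function MASTER.Simple:ref1() return MASTER.Multi1 end
    function MASTER.Simple:ref2() MASTER:simple() end
    return MASTER
  end
end
package.preload['master'] = function()
  local MASTER = {}
  MASTER = require 'simple'(MASTER)
  MASTER = require 'multi'(MASTER)
  MASTER.Shared = require 'shared1'
  MASTER = require 'shared2'(MASTER)
  MASTER = require 'reference'(MASTER)
  return MASTER
end

local MASTER = require 'master'
assert(MASTER.Simple)
assert(MASTER.simple)
assert(MASTER.Shared)
assert(MASTER.Shared.go)
assert(MASTER.Simple.ref1() == MASTER.Multi1)
assert(pcall(MASTER.Simple.ref2))
print(rawget(_G, "MASTER"))  --> nil
```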

-----------

It's one of those compact examples that was carefully crafted to cover
every use case and, as a result, left me angry when I was done solving
it. I now need to shower, although I mean no disrespect to your
illustrating example.

That is to say, if you think my solution looks worse than yours,
consider that your example is purpose-conceived to be difficult, not
useful in real life. In real life, doing these kinds of things is just
bad design, although one or two of each in a single project is
certainly possible.

-Andrew


Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Gavin Kistner-4
On Feb 18, 2013, at 8:26 PM, Andrew Starks <[hidden email]> wrote:

> On Mon, Feb 18, 2013 at 11:46 AM, Gavin Kistner <[hidden email]> wrote:
>> Put more simply: how would you rewrite the following suite of files so that
>> the user can "require 'master'" and not spam the global namespace with
>> "MASTER", but still have all the assertions pass?
>
> Having read the thread after I gave my answer, sigh.... I like Petite
> Abeille's answer the most, although it requires a rewrite of
> everything I"ve done.

Thank you very much for your contributions. I'd be interested in your response to my current solution as described here:
http://stackoverflow.com/questions/14942472/create-suite-of-interdependent-lua-files-without-affecting-the-global-namespace



> MASTER = require 'simple'(MASTER)  --you MADE ME DO THIS!! Yuck.

LOL...and yes, yuck. :) Modifying each sub-file to be a function seems rather beastly, but it certainly does cleanly allow the shared master table to be passed around.

> It's one of those compact examples that was carefully crafted to cover
> every use case and, as a result, left me angry when I was done solving
> it. I now need to shower, although I mean no disrespect to your
> illustrating example.

:D


> That is to say, if you think my solution looks worse than yours,
> consider that your example is purpose-conceived to be difficult, not
> useful in real life. In real life, doing these kinds of things is just
> bad design, although one or two of each in a single project is
> certainly possible.

For what it's worth, it is decidedly compact, but it does exactly cover the specific problems I faced with my then-current implementation for my project. It was truly real-world.

I am not averse to the suggestion that perhaps the way in which I broke up the files was bad design, though I'm not seeing specifics of where I've made things too tightly coupled or such. If you like, you're welcome to browse the pre-local implementation here:

https://github.com/Phrogz/LXSC/tree/2062ca9c11520c591c218d88a204d38d13369ea3/lib


Thanks again for your feedback.

Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Andrew Starks
On Mon, Feb 18, 2013 at 9:45 PM, Gavin Kistner <[hidden email]> wrote:

> Thank you very much for your contributions. I'd be interested in your response to my current solution as described here:
> http://stackoverflow.com/questions/14942472/create-suite-of-interdependent-lua-files-without-affecting-the-global-namespace
>
>

I think it's solid, but it's also deprecated in 5.2. My understanding
is that getfenv was deprecated because it violated the computer-science
consensus that scope should be sacred. Lua 5.2 exchanged this magic (hid it, actually) for
a clever twist (different from magic), which was _ENV. Since _ENV
can't be applied to a function that was already defined, it doesn't
really solve all of your problems.

I mean it *could*, but you're back to passing _ENV into the submodule
and then you may as well do it my way, so that you can also benefit
from the goodness and bounty that the  '_ENV = require"mymodule._ENV"'
construct provides: a common module platform from which to work and
protection for the user's global table, explicitly.
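For comparison with the quoted setfenv snippet, a minimal Lua 5.2+ sketch of the same idea done the 5.2 way: the environment is supplied when a chunk is loaded, through the fourth argument of `load` (`loadfile` takes it as its third), rather than patched onto a function afterward. The chunk text and the `Extra`/`helper` names are hypothetical.

```lua
-- Lua 5.2+: load(chunk, chunkname, mode, env) makes env the chunk's _ENV.
local MASTER = {}

-- Like the setfenv version: MASTER is visible, other names fall back to _G.
local env = setmetatable({ MASTER = MASTER }, { __index = _G })

local chunk = assert(load([[
  MASTER.Extra = { ok = true }   -- MASTER is found in env
  helper = "stays in env"        -- a new "global" lands in env, not in _G
  return type(print)             -- print is found through __index = _G
]], "=submodule", "t", env))

print(chunk())               --> function
print(MASTER.Extra.ok)       --> true
print(rawget(_G, "helper"))  --> nil
```

The limitation mentioned above still holds: short of the debug library, a function that already exists keeps the `_ENV` it was created with, so this only helps for code you load yourself.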

But, you do get huge bonus points for cleverness, and I mean that with
all sincerity. It took me a good bit of time, just staring at it,
before I understood it. AND, I didn't feel like I needed *another*
shower. :)

-Andrew


Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

steve donovan
In reply to this post by Gavin Kistner-4
On Tue, Feb 19, 2013 at 4:38 AM, Gavin Kistner <[hidden email]> wrote:
> If, on the other hand, you are just making a joke, that is also fine. :)

PA likes making jokes, but they usually have a point ;)

One consequence of setfenv magic is that everything becomes slower,
because you now have indirect lookup for everything (that's another
criticism of module(...,package.seeall), apart from allowing all your
laundry to be exposed).

The pattern that Tomas gives puts everything explicitly into the
master table. Generally the idea is to keep as much in locals as
possible - which is BTW the reason I'm not so keen on the _ENV 5.2
construction.
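A minimal sketch of the keep-it-in-locals style, assuming a hypothetical module whose only public entry is `M.shout`: the helper and the cached library function are locals, so calls to them skip the environment lookup and they never leak into the module table. In a real file the chunk would end with `return M`; that line is omitted so the snippet runs as a script.

```lua
-- Hypothetical module body: only M.shout is public.
local M = {}

local upper = string.upper       -- cached library function, looked up once
local function squeeze(s)        -- private helper, not reachable from outside
  return (s:gsub("%s+", " "))    -- parentheses drop gsub's second result
end

function M.shout(s)
  return upper(squeeze(s))
end

print(M.shout("hello    world")) --> HELLO WORLD
print(M.squeeze)                 --> nil
```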

I've found myself doing things like this to split a module over
several files (useful if there's extra functionality you might want to
load)

--foo.lua
local foo = {}

function foo.answer() ... end
....
return foo

--foo.extra.lua
local foo = require 'foo'

function foo.more() ... end

return foo
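The split-module pattern can be exercised in one script. A minimal sketch, assuming Lua 5.1 or later, with `package.preload` standing in for foo.lua and foo.extra.lua, and hypothetical function bodies in place of the elided `...`:

```lua
package.preload['foo'] = function()
  local foo = {}
  function foo.answer() return 42 end          -- hypothetical body
  return foo
end

package.preload['foo.extra'] = function()
  local foo = require 'foo'                    -- same table every time (package.loaded)
  function foo.more() return foo.answer() + 1 end  -- hypothetical body
  return foo
end

local foo = require 'foo.extra'
print(foo.more())             --> 43
print(foo == require 'foo')   --> true
```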

I keep to the 'no magic' style because it discriminates equally
against 5.1 and 5.2 ;)

steve d.


Re: local "module" pattern without module [was Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor]

Gavin Kistner-4
On Feb 18, 2013, at 11:02 PM, steve donovan <[hidden email]> wrote:
> One consequence of setfenv magic is that everything becomes slower,
> because you now have indirect lookup for everything (that's another
> criticism of module(...,package.seeall), apart from allowing all your
> laundry to be exposed).

Understood. I have benchmarking in place that shows no measurable performance degradation for my usage, though I do cache certain library functions in locals anyhow already.

> I've found myself doing things like this to split a module over
> several files (useful if there's extra functionality you might want to
> load)
>
> --foo.lua
> local foo = {}
>
> function foo.answer() ... end
> ....
> return foo
>
> --foo.extra.lua
> local foo = require 'foo'
>
> function foo.more() ... end
>
> return foo

This pattern is dandy. I use it myself in SLAXML:

    --> slaxml.lua <--
    local SLAXML = { ... }
    ...
    return SLAXML

    --> slaxdom.lua <--
    local SLAXML = require 'slaxml'
    function SLAXML:dom(xml,opts) ... end
    return SLAXML

This way the user can require just the core SAX parser or the DOM (which uses the SAX).

However, unless I'm missing something, this only works with strict (inverted) hierarchies of dependencies. It falls down if you have two files that both extend the same base file and the user should be able to get code from both.