Pure Lua XML parser

Pure Lua XML parser

peterhickman
After playing around with the BadgerFish implementation mentioned in a previous post, I asked myself "how hard would it be to implement an XML parser in Lua?" The result is here: http://peterhi.dyndns.org/plxml/index.html

It reads well-formed XML strings and creates a nested table structure that preserves all of the data and structural information in the input.

<?xml version="1.0"?><doc><!-- test --><person><forename>fred</forename><br/>smith</person><green id="1"/></doc>

Becomes

return {
  data={
    { data="version=\"1.0\"", name="xml", type="pi" },
    {
      data={
        { data="test", type="comment" },
        {
          data={
            { data={ { data="fred", type="text" } }, name="forename", type="element" },
            { data={  }, name="br", type="element" },
            { data="smith", type="text" }
          },
          name="person",
          type="element"
        },
        { attributes="id=\"1\"", data={ }, name="green", type="element" }
      },
      name="doc",
      type="element"
    }
  },
  type="root"
}

which can be walked over with a walk method to manipulate the tree. It's more fully explained, with some examples, on the web page.
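
As a rough illustration of what such a traversal might look like, here is a depth-first walk over the table layout shown above. This is a sketch only: the walk function and its callback signature are assumptions made for this example, not necessarily the method described on the web page.

```lua
-- Hypothetical depth-first walker over the table layout shown above.
-- The name `walk` and the visit(node, depth) callback are assumptions.
local function walk(node, visit, depth)
  depth = depth or 0
  visit(node, depth)
  -- Only "root" and "element" nodes hold a list of children in `data`;
  -- for text, comment and pi nodes `data` is a plain string.
  if type(node.data) == "table" then
    for _, child in ipairs(node.data) do
      walk(child, visit, depth + 1)
    end
  end
end

-- The tree from the example output above.
local tree = {
  type = "root",
  data = {
    { type = "pi", name = "xml", data = 'version="1.0"' },
    { type = "element", name = "doc", data = {
      { type = "comment", data = "test" },
      { type = "element", name = "person", data = {
        { type = "element", name = "forename",
          data = { { type = "text", data = "fred" } } },
        { type = "element", name = "br", data = {} },
        { type = "text", data = "smith" },
      } },
      { type = "element", name = "green", attributes = 'id="1"', data = {} },
    } },
  },
}

-- Collect the names of all elements in document order.
local names = {}
walk(tree, function(node)
  if node.type == "element" then
    names[#names + 1] = node.name
  end
end)
print(table.concat(names, " "))
```

Run against the example tree, this prints the element names in document order: doc person forename br green.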

At this point it is just a programming exercise for me. I have always meant to try writing my own XML parser, and I will probably do most of the things on the to-do list. Is this something other people would be interested in? That is, should I turn it into production code, with tests, performance and coverage reviews, and all the usual fun and games?

--
If a pickpocket meets a saint, he sees only his pockets


Re: Pure Lua XML parser

Petite Abeille

On Dec 16, 2007, at 5:31 PM, peterhickman wrote:

> After playing around with the BadgerFish implementation mentioned in a previous post I asked myself "how hard would it be to implement an XML parser in Lua?"

BadgerFish is a lossy, but very convenient, mapping convention. It's not a parser.

http://badgerfish.ning.com/

As for XML handling in Lua, check the wiki for various example implementations:

http://lua-users.org/wiki/LuaXml

Cheers,

PA.



Re: Pure Lua XML parser

Ken Smith-2
In reply to this post by peterhickman
On Dec 16, 2007 8:31 AM, peterhickman <[hidden email]> wrote:
> After playing around with the BadgerFish implementation mentioned in
> a previous post I asked myself "how hard would it be to implement an
> XML parser in Lua?" This is the result, here
> http://peterhi.dyndns.org/plxml/index.html

I was looking for something like this recently.  In particular, I need
something that can take tabular data and dump it as XML.  Thanks for
doing it for me!  In your to do list, you mention handling > symbols.
If you do this work before I do, don't forget < (&lt;), & (&amp;), "
(&quot;), and ' (&#39;).
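
For reference, a minimal sketch of that escaping in Lua. The escape function and the ENTITIES table are hypothetical names for this example, not part of the posted parser; the apostrophe is written as &#39;, as above.

```lua
-- Hypothetical helper; not part of the posted parser.
local ENTITIES = {
  ["&"] = "&amp;",
  ["<"] = "&lt;",
  [">"] = "&gt;",
  ['"'] = "&quot;",
  ["'"] = "&#39;",
}

-- Escape the five characters that are special in XML text and
-- attribute values.
local function escape(s)
  -- A single gsub pass looks each matched character up in ENTITIES,
  -- so the "&" in a replacement is never escaped a second time.
  -- The parentheses drop gsub's second return value (the match count).
  return (s:gsub("[&<>\"']", ENTITIES))
end

print(escape("<p class='x'>fish & chips</p>"))
```

Doing the substitution in one pass matters: escaping & in a separate, later step would mangle the entities produced by the earlier steps.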

My particular need is to fabricate XML out of thin air rather than
read and modify a source document.  How would you recommend I do so in
order to preserve the well-formedness of the Lua table?

   Thanks,
   Ken Smith

Re: Pure Lua XML parser

peterhickman

On 16 Dec 2007, at 6:04 , Ken Smith wrote:

> On Dec 16, 2007 8:31 AM, peterhickman <[hidden email]> wrote:
>> After playing around with the BadgerFish implementation mentioned in
>> a previous post I asked myself "how hard would it be to implement an
>> XML parser in Lua?" This is the result, here
>> http://peterhi.dyndns.org/plxml/index.html
>
> I was looking for something like this recently.  In particular, I need
> something that can take tabular data and dump it as XML.  Thanks for
> doing it for me!  In your to do list, you mention handling > symbols.
> If you do this work before I do, don't forget < (&lt;), & (&amp;), "
> (&quot;), and ' (&#39;).

The problem is that I am using the naked > (as opposed to the &gt;) to mark the end of a token. The following is valid XML but will choke my parser.

<?xml version="1.0"?>
<doc>
  <p class="bro>ken">Hi there</p>
</doc>

The > in the attribute value is perfectly valid XML, if a little unusual, but my parser will parse it as

PI: <?xml version="1.0"?>
ELEMENT: <doc>
ELEMENT: <p class="bro>
TEXT: ken">Hi there
ELEMENT: </p>
ELEMENT: </doc>

Which is completely wrong. It's fixable but at present it is an example of a shortcoming in the design.
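
One way to fix it, sketched here under assumptions, is to track whether the scanner is inside a quoted attribute value and only treat > as the end of the tag when it is not. The names find_tag_end and tokenize are hypothetical and are not functions from the posted parser. Comments, CDATA sections and processing instructions would need their own handling, which this sketch leaves out.

```lua
-- Hypothetical helper: given that s:sub(start, start) == "<", return the
-- index of the ">" that closes that tag, skipping any ">" that appears
-- inside a single- or double-quoted attribute value.
local function find_tag_end(s, start)
  local quote = nil  -- the currently open quote character, if any
  for i = start + 1, #s do
    local c = s:sub(i, i)
    if quote then
      if c == quote then quote = nil end
    elseif c == '"' or c == "'" then
      quote = c
    elseif c == ">" then
      return i
    end
  end
  return nil  -- unterminated tag
end

-- Split a string into "tag" and "text" tokens using the scanner above.
local function tokenize(s)
  local tokens, pos = {}, 1
  while pos <= #s do
    local lt = s:find("<", pos, true)  -- plain (non-pattern) search
    if not lt then
      tokens[#tokens + 1] = { kind = "text", value = s:sub(pos) }
      break
    end
    if lt > pos then
      tokens[#tokens + 1] = { kind = "text", value = s:sub(pos, lt - 1) }
    end
    local gt = find_tag_end(s, lt)
    if not gt then error("unterminated tag at position " .. lt) end
    tokens[#tokens + 1] = { kind = "tag", value = s:sub(lt, gt) }
    pos = gt + 1
  end
  return tokens
end

for _, t in ipairs(tokenize('<p class="bro>ken">Hi there</p>')) do
  print(t.kind, t.value)
end
```

With this approach the problem line yields three tokens: the complete opening tag including the class attribute, the text "Hi there", and the closing tag.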


> My particular need is to fabricate XML out of thin air rather than
> read and modify a source document.  How would you recommend I do so in
> order to preserve the well-formedness of the Lua table?

At present my code provides no constructors for new nodes; functions like makepi() exist only to deconstruct the input string into a table. The problem is that I am so used to OO programming that getting my head around how to do things without OO is difficult. I keep thinking "all we need is an add method", but the tables are not objects. Of course, I just need to get my head around Lua OO and I will have that sorted.
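
For what it's worth, a metatable-based sketch of what such constructors and an add method might look like. Node, Node.element, Node.text and Node:add are all hypothetical names for this illustration and do not exist in the posted code; the tables they build follow the layout the parser produces, with attributes kept as a raw string.

```lua
-- Hypothetical node "class"; none of these names exist in the posted code.
local Node = {}
Node.__index = Node  -- method lookups on nodes fall back to Node

-- Create an element node in the same shape the parser produces.
-- `attributes` is an optional raw string such as 'id="1"'.
function Node.element(name, attributes)
  return setmetatable(
    { type = "element", name = name, attributes = attributes, data = {} },
    Node)
end

-- Create a text node; its `data` is a string, not a list of children.
function Node.text(s)
  return setmetatable({ type = "text", data = s }, Node)
end

-- Append a child and return self, so calls can be chained.
function Node:add(child)
  assert(self.type == "element" or self.type == "root",
         "only root and element nodes can have children")
  table.insert(self.data, child)
  return self
end

-- Build <person><forename>fred</forename><br/>smith</person> by hand.
local person = Node.element("person")
  :add(Node.element("forename"):add(Node.text("fred")))
  :add(Node.element("br"))
  :add(Node.text("smith"))

print(#person.data, person.data[1].name, person.data[1].data[1].data)
```

Because the metatable only supplies methods, the nodes remain plain tables with the same fields as parsed ones, so code that walks a parsed tree should work on a hand-built one too.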


>    Thanks,
>    Ken Smith

--
Java: There is only so much you can do in a straight jacket


Re: Pure Lua XML parser

Petite Abeille

On Dec 16, 2007, at 10:05 PM, peterhickman wrote:

> The > in the attribute value is perfectly valid XML, if a little unusual

XML, noun

A magic elixir of legend, claiming to solve all problems while inevitably exacting an ironic cost.

“Once we drink the XML and take care of a few minor things — parser, DTD, entification, well-formed-ness, validation, namespaces, I18N, XSL transformations, schemas — all will be peaceful in the kingdom!”

http://www.eod.com/devil/archive/xml.html




Re: Pure Lua XML parser

peterhickman

On 16 Dec 2007, at 9:52 , Petite Abeille wrote:


On Dec 16, 2007, at 10:05 PM, peterhickman wrote:

The > in the attribute value is perfectly valid XML, if a little unusual

XML, noun

A magic elixir of legend, claiming to solve all problems while inevitably exacting an ironic cost.

“Once we drink the XML and take care of a few minor things — parser, DTD, entification, well-formed-ness, validation, namespaces, I18N, XSL transformations, schemas — all will be peaceful in the kingdom!”

http://www.eod.com/devil/archive/xml.html




Thank god there is Java to make things easier :)

--
Punishment worse than the crime - Java



RE: Pure Lua XML parser

Gavin Kistner-2
In reply to this post by peterhickman
From: peterhickman
> After playing around with the BadgerFish implementation mentioned in
> a previous post I asked myself "how hard would it be to implement an
> XML parser in Lua?" This is the result, here
> http://peterhi.dyndns.org/plxml/index.html

See this YAPLXMLP thread:
http://lua-users.org/lists/lua-l/2006-02/msg00264.html