LPeg question: parsing CSV

Ken Smith-2
First, my environment.  I don't patch Lua or modify it other than to
patch the build to generate MacOS universal binaries.

   Lua 5.1.2
   LPeg 0.7
   Darwin 9.0.0

I would like to use LPeg for parsing CSV files and was delighted to
see that there are even two examples for doing so right in the
documentation.  However, I'm having difficulty using them and would
like some advice on how to proceed.  Please consider this example.
I'm using the two CSV recipes unmodified, directly from the
documentation for LPeg 0.7.


local lines =
{
   'somethingin,alllower',
   'SomethingIn,CamelCase',
   'SOMETHING_IN,ALLCAPS',
}

require('re')
require('lpeg')

record_re = re.compile[[
   record <- ( field (',' field)* ) -> {} ('\n' / !.)
   field <- escaped / nonescaped
   nonescaped <- { [^,"\n]* }
   escaped <- '"' {~ ([^"] / '""' -> '"')* ~} '"'
]]

local field =
   '"' * lpeg.Cs(((lpeg.P(1) - '"') + lpeg.P'""' / '"')^0) * '"' +
   lpeg.C((1 - lpeg.S',\n"')^0)

record_lpeg = field * (',' * field)^0 * (lpeg.P'\n' + -1)

for k,impl in ipairs{'record_re', 'record_lpeg'} do
   print('Implementation: ' .. impl)
   local record = _G[impl]
   for i,line in ipairs(lines) do
      io.write('Attempting to match "' .. line .. '": ')
      local m = record:match(line)
      if type(m) == 'table' then
         print('match succeeded')
         for j,v in ipairs(m) do
            print(j,v)
         end
      else
         print('match failed')
         print(tostring(m))
      end
   end
   print('')
end


When I run this program, I get the following output.


Implementation: record_re
Attempting to match "somethingin,alllower": match failed
nil
Attempting to match "SomethingIn,CamelCase": match failed
nil
Attempting to match "SOMETHING_IN,ALLCAPS": match succeeded
1       SOMETHING_IN
2       ALLCAPS

Implementation: record_lpeg
Attempting to match "somethingin,alllower": match failed
somethingin
Attempting to match "SomethingIn,CamelCase": match failed
SomethingIn
Attempting to match "SOMETHING_IN,ALLCAPS": match failed
SOMETHING_IN


In the first case, I seem to get the fields only when the row is in
all capitals.  I discovered this by accident when I ran record_re on a
CSV file which contains lower case letters only in the first line, the
remainder of the file being parsed without errors.

In the second case, I expect to receive a table from the match but get
only the first field as a string.

I tried messing with record_re to get it to take lower case letters
using the Wikipedia article on CSV and RFC 4180 for reference.  I
can't come up with a good reason why it fails.

This is my first foray with LPeg and PEG in general.  Any comments or
criticisms appreciated.

   Ken Smith

Re: LPeg question: parsing CSV

Duncan Cross
> In the second case, I expect to receive a table from the match but get
> only the first field as a string.

You should use lpeg.Ct instead of lpeg.C for this.

(Unfortunately I haven't used the "re" module, so I've no idea about
the case problem)

Re: LPeg question: parsing CSV

Ken Smith-2
On Nov 16, 2007 1:04 PM, Duncan Cross <[hidden email]> wrote:
> > In the second case, I expect to receive a table from the match but get
> > only the first field as a string.
>
> You should use lpeg.Ct instead of lpeg.C for this.

Thanks for the response.  When I change

lpeg.C((1 - lpeg.S',\n"')^0)

to

lpeg.Ct((1 - lpeg.S',\n"')^0)

I get a table from record:match but the table is empty.  Is this not
what you meant?

   Ken

Re: LPeg question: parsing CSV

Duncan Cross
On Nov 16, 2007 11:33 PM, Ken Smith <[hidden email]> wrote:
> On Nov 16, 2007 1:04 PM, Duncan Cross <[hidden email]> wrote:
> > > In the second case, I expect to receive a table from the match but get
> > > only the first field as a string.
> >
> > You should use lpeg.Ct instead of lpeg.C for this.
>
> Thanks for the response.  When I change
>
> lpeg.C((1 - lpeg.S',\n"')^0)
>
> to
>
> lpeg.Ct((1 - lpeg.S',\n"')^0)
>
> I get a table from record:match but the table is empty.  Is this not
> what you meant?

Sorry, my fault, I misread - try putting lpeg.Ct( ... ) around the
following bit of record_lpeg, instead:

field * (',' * field)^0
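
For reference, a minimal sketch of the whole pattern with that change applied, assuming the field definition from the original post is otherwise left alone. The names record and fields are only illustrative; lpeg.Ct collects every capture made inside it into a single table, which is why it has to wrap the full sequence of fields rather than one field.

```lua
local lpeg = require('lpeg')

-- Field definition taken unchanged from the original post:
-- a quoted field (with "" collapsed to ") or an unquoted field.
local field =
   '"' * lpeg.Cs(((lpeg.P(1) - '"') + lpeg.P'""' / '"')^0) * '"' +
   lpeg.C((1 - lpeg.S',\n"')^0)

-- lpeg.Ct wraps the whole list of fields, so all field captures
-- end up in one table; the record terminator stays outside.
local record = lpeg.Ct(field * (',' * field)^0) * (lpeg.P'\n' + -1)

local fields = record:match('somethingin,alllower')
for i, v in ipairs(fields) do
   print(i, v)
end
```

With this, record:match returns a table of strings, so the type(m) == 'table' check in the test loop should take the success branch.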

Re: LPeg question: parsing CSV

Ken Smith-2
On Nov 16, 2007 3:52 PM, Duncan Cross <[hidden email]> wrote:
>
> On Nov 16, 2007 11:33 PM, Ken Smith <[hidden email]> wrote:
> > On Nov 16, 2007 1:04 PM, Duncan Cross <[hidden email]> wrote:
> > > > In the second case, I expect to receive a table from the match but get
> > > > only the first field as a string.
> > >
> > > You should use lpeg.Ct instead of lpeg.C for this.
> >
> > Thanks for the response.  When I change
> >
> > lpeg.C((1 - lpeg.S',\n"')^0)
> >
> > to
> >
> > lpeg.Ct((1 - lpeg.S',\n"')^0)
> >
> > I get a table from record:match but the table is empty.  Is this not
> > what you meant?
>
> Sorry, my fault, I misread - try putting lpeg.Ct( ... ) around the
> following bit of record_lpeg, instead:
>
> field * (',' * field)^0

Beauty.  That does the trick.  Thank you.

   Ken
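
The case problem with record_re is left open in the thread. One likely explanation, offered as an assumption rather than something confirmed above: the grammar is written in a Lua long string ([[ ... ]]), where \n is not an escape sequence, and re does not appear to interpret backslash escapes itself. The class [^,"\n] would then exclude a backslash and the letter n instead of a newline, which fits the output: the only line that matched is the one without a lower case n. The sketch below swaps in %nl, which current versions of re predefine as a newline; whether LPeg 0.7 already provides it is not verified here.

```lua
local re = require('re')

-- In a long string, \n stays as two characters: a backslash and an 'n'.
assert(#[[\n]] == 2)

-- Same grammar as the recipe in the original post, with %nl in place of \n.
-- Assumption: the installed re module predefines %nl.
local record_re = re.compile[[
   record <- ( field (',' field)* ) -> {} (%nl / !.)
   field <- escaped / nonescaped
   nonescaped <- { [^,"%nl]* }
   escaped <- '"' {~ ([^"] / '""' -> '"')* ~} '"'
]]

local fields = record_re:match('somethingin,alllower')
for i, v in ipairs(fields) do
   print(i, v)
end
```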