Request for advice: pure Lua Library to parse mail messages.


Lorenzo Donati-3
Hi list!

I need to extract some information from some mail messages. Is there
some pure Lua library that can help me in the process?

The requirements are more or less the following:

* Pure Lua. Possibly simple and lightweight. Maybe short enough to be
embedded in a Lua script or anyway to reside in a single file side by
side with my script.

* Reliable, well-tested and foolproof. I don't know much about all the
RFCs that comprise the mail message format, but the library API should
be easy enough to let me extract the content of any header field and any
text part of the message. I have little time and expertise to cope with
corner cases where the library could fail because of bugs.

* It should handle quoted-printable encoding. In particular, it should
be able to convert from quoted-printable to UTF-8 automatically. I don't
strictly need other encodings, but also converting to Windows CP-1252
would be a bonus.

* It doesn't need to be able to handle all MIME types. Just text (both
plain and html).


* It should be able to enumerate every part of a multipart message with
its "local header" fields.

* MIT or similar license (no copy-left hassle). I don't mean to publish
the code but I wouldn't want to have something in my code-base that
needs tracking for the future (besides having a MIT license boilerplate
text inside, of course).


Ideally what I'd like to do is this:

- Read the message source saved manually from the Thunderbird mail client.

- Use the library to enumerate all the header fields and choose the ones
I need. Ideally I would need a Lua table of header field names vs. their
text content.

- Find the message part(s) I need (in a multipart message). The
selection would be made primarily using the MIME type of the part (I
just need access to the text/plain and text/html parts) and their
position in the multipart message.

- Get the decoded UTF-8 content of the message part(s) as a Lua string,
on which I would perform custom processing.

I think I could implement what I want to do directly easily without a
library except the quoted-printable decoding part. But I know little
about the mail format, except a quick glimpse on the related Wikipedia
articles, so I fear I could botch something obvious by simply creating
an ad-hoc "parser", and I don't have much time for this little project.

TIA for any useful advice and hint.

Cheers!

-- Lorenzo.

Re: Request for advice: pure Lua Library to parse mail messages.

Sean Conner
It was thus said that the Great Lorenzo Donati once stated:
> Hi list!
>
> I need to extract some information from some mail messages. Is there
> some pure Lua library that can help me in the process?

  That is a tall order, and I doubt you'll get all what you want in a "pure"
Lua library (more about this below).

> * Pure Lua. Possibly simple and lightweight. Maybe short enough to be
> embedded in a Lua script or anyway to reside in a single file side by
> side with my script.

  I have code to parse email headers [1], but

        1. it's nearly 700 lines of code;
        2. it's GPL, so it fails your "no copy-left hassle" test;
        3. it's mostly LPEG, so it fails your "pure Lua" test.
        4. it doesn't handle quoted-printable [2][3].

> * Reliable, well-tested and foolproof. I don't know much about all the
> RFCs that comprise the mail message format, but the library API should
> be easy enough to let me extract the content of any header field and any
> text part of the message. I have little time and expertise to cope with
> corner cases where the library could fail because of bugs.

  There are quite a number of RFCs actually---I reference 14 different RFCs
in my code, and there might be new ones since I wrote the code.

> * It should handle quoted-printable encoding. In particular, it should
> be able to convert from quoted-printable to UTF-8 automatically. I don't
> strictly need other encodings, but also converting to Windows CP-1252
> would be a bonus.

  This is the biggest issue you'll have.  Handling quoted-printable isn't
that bad in and of itself, but converting everything to UTF-8 will be a
monumental task in pure Lua.  Personally, for a task like this, I would use
iconv (I know it as a GNU library to do character set conversions and I am
unaware of any non-GNU library that does the same).
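
A minimal sketch of the "not that bad" half, body-level quoted-printable decoding in pure Lua, assuming the rules of RFC 2045 section 6.7 ("=XX" hex escapes and "=" soft line breaks). The name qp_decode is made up for illustration and does not come from any library mentioned in this thread. The decoder works on bytes and knows nothing about character sets: if a part is declared as charset=utf-8, the decoded bytes are already UTF-8 and no conversion step is needed.

```lua
-- Hypothetical helper: decode a quoted-printable body into raw bytes.
-- Plain Lua (5.1 or later), no external modules.
local function qp_decode(s)
  -- Drop spaces/tabs that sit right before a line break (transport padding).
  s = s:gsub("[ \t]+(\r?\n)", "%1")
  -- Remove soft line breaks: "=" immediately followed by a line ending.
  s = s:gsub("=\r?\n", "")
  -- Replace each "=XX" escape with the single byte it encodes.
  s = s:gsub("=(%x%x)", function(hex)
    return string.char(tonumber(hex, 16))
  end)
  return s
end

-- Made-up sample line: "=E2=82=AC" is the euro sign and "=C3=A9" is "é",
-- both sent as UTF-8 bytes; "=\r\n" is a soft line break inside a word.
local decoded = qp_decode("Totale ordine: =E2=82=AC 12,50 perch=C3=A9 s=\r\n=C3=AC")
print(decoded)
```

The three passes run in this order so that a literal "=" produced by "=3D" is never mistaken for the start of another escape.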

> I think I could implement what I want to do directly easily without a
> library except the quoted-printable decoding part. But I know little
> about the mail format, except a quick glimpse on the related Wikipedia
> articles, so I fear I could botch something obvious by simply creating
> an ad-hoc "parser", and I don't have much time for this little project.

  Parsing email is trickier than expected.  First, the header names have a
canonical form, but ideally you need to compare them case-insensitively, so
the following are all the same:

        From: [hidden email]
        FROM: [hidden email]
        fRoM: [hidden email]
        froM: [hidden email]

  (NOTE:  each line is *supposed* to end with a CR and LF; I ended up having
to scan for an optional CR and a mandatory LF) Second, a header line can
span multiple lines---subsequent lines start with whitespace (space or
tabs):

        Comment: This is a comment
        COMMENT: So
                is this.
        cOmMeNt: And
         this is
                a
         comment
        commenT: And so am I.

  The headers are separated from the body by a blank line, so the worst case
(for multiple line headers) is something like:

        FROM:
         [hidden email]
        to:
         [hidden email]
        sUbJeCt:
         This
                is
         a subject line

        The body of the message goes here.

  Also, each header has a specific format, which goes to explain why my code
is nearly 700 lines long (email addresses are particularly hairy to parse).
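
The generic rules described above (optional CR before LF, folded continuation lines, case-insensitive names, blank line before the body) can be sketched in pure Lua as follows. This is a rough illustration, not the code in [1]: parse_headers is a hypothetical name, the addresses are placeholders, the per-header formats are not parsed at all, and each fold is collapsed into a single space, which is a simplification of the unfolding rule in RFC 5322.

```lua
-- Hypothetical helper: split a raw message into a header table and a body.
-- Header names are lowercased; each name maps to a list of values because
-- a header may legitimately appear more than once.
local function parse_headers(msg)
  -- Accept both CRLF and bare LF line endings.
  msg = msg:gsub("\r\n", "\n")
  -- The first blank line separates the headers from the body.
  local head, body = msg:match("^(.-)\n\n(.*)$")
  if not head then head, body = msg, "" end
  -- Unfold: a line break followed by spaces or tabs continues the header.
  head = head:gsub("\n[ \t]+", " ")
  local headers = {}
  for line in (head .. "\n"):gmatch("(.-)\n") do
    local name, value = line:match("^([^:%s]+)%s*:%s*(.-)%s*$")
    if name then
      name = name:lower()
      headers[name] = headers[name] or {}
      table.insert(headers[name], value)
    end
  end
  return headers, body
end

-- The "worst case" layout from above, with placeholder addresses.
local raw = "FROM:\r\n alice@example.org\r\nto:\r\n bob@example.org\r\n"
         .. "sUbJeCt:\r\n This\r\n\t\tis\r\n a subject line\r\n"
         .. "\r\nThe body of the message goes here.\r\n"
local headers, body = parse_headers(raw)
print(headers.from[1], headers.subject[1])
```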

> TIA for any useful advice and hint.

  Parsing email with pure Lua---possible, but I wouldn't want to do it.
Converting character sets in pure Lua---theoretically possible but good luck
in finding pure Lua code to do that.

  -spc (There's a reason I used LPEG for this ... )

[1] https://github.com/spc476/LPeg-Parsers/blob/master/email.lua

[2] There is a form of quoted-printable for use in headers (which is
        what I'm thinking of as I write this)---handling quoted-printable
        in the body is *not* a concern of my code, which mostly deals with
        headers.  And it doesn't support the header form of
        quoted-printable.

[3] I suppose I could, but *I* would require the use of iconv in
        addition to LPEG.  Also, not everyone follows the letter of the RFCs
        (headers are *supposed* to be ASCII-only).

Re: Request for advice: pure Lua Library to parse mail messages.

Lorenzo Donati-3
Thank you very much for all the hints!

On 20/07/2020 22:26, Sean Conner wrote:
> It was thus said that the Great Lorenzo Donati once stated:
>> Hi list!
>>
>> I need to extract some information from some mail messages. Is there
>> some pure Lua library that can help me in the process?
>
>   That is a tall order, and I doubt you'll get all what you want in a "pure"
> Lua library (more about this below).
>

I had a bad hunch about this. That's why I asked on the list, hoping to
have better insight. *sigh*

>> * Pure Lua. Possibly simple and lightweight. Maybe short enough to be
>> embedded in a Lua script or anyway to reside in a single file side by
>> side with my script.
>
>   I have code to parse email headers [1], but
>
> 1. it's nearly 700 lines of code;
> 2. it's GPL, so it fails your "no copy-left hassle" test;
> 3. it's mostly LPEG, so it fails your "pure Lua" test.
> 4. it doesn't handle quoted-printable [2][3].
>
>> * Reliable, well-tested and foolproof. I don't know much about all the
>> RFCs that comprise the mail message format, but the library API should
>> be easy enough to let me extract the content of any header field and any
>> text part of the message. I have little time and expertise to cope with
>> corner cases where the library could fail because of bugs.
>
>   There are quite a number of RFCs actually---I reference 14 different RFCs
> in my code, and there might be new ones since I wrote the code.
>

14!? Ouch!


>> * It should handle quoted-printable encoding. In particular, it should
>> be able to convert from quoted-printable to UTF-8 automatically. I don't
>> strictly need other encodings, but also converting to Windows CP-1252
>> would be a bonus.
>
>   This is the biggest issue you'll have.  Handling quoted-printable isn't
> that bad in and of itself, but converting everything to UTF-8 will be a
> monumental task in pure Lua.  Personally, for a task like this, I would use
> iconv (I know it as a GNU library to do character set conversions and I am
> unaware of any non-GNU library that does the same).
>
>> I think I could implement what I want to do directly easily without a
>> library except the quoted-printable decoding part. But I know little
>> about the mail format, except a quick glimpse on the related Wikipedia
>> articles, so I fear I could botch something obvious by simply creating
>> an ad-hoc "parser", and I don't have much time for this little project.

[snip]

>   Also, each header has a specific format, which goes to explain why my code
> is nearly 700 lines long (email addresses are particularly hairy to parse).
>
>> TIA for any useful advice and hint.
>
>   Parsing email with pure Lua---possible, but I wouldn't want to do it.
> Converting character sets in pure Lua---theoretically possible but good luck
> in finding pure Lua code to do that.
>
>   -spc (There's a reason I used LPEG for this ... )
>
> [1] https://github.com/spc476/LPeg-Parsers/blob/master/email.lua
>
> [2] There is a form of quoted-printable for use in headers (which is
> what I'm thinking of as I write this)---handling quoted-printable
> in the body is *not* a concern of my code, which mostly deals with
> headers.  And it doesn't support the header form of
> quoted-printable.
>
> [3] I suppose I could, but *I* would require the use of iconv in
> addition to LPEG.  Also, not everyone follows the letter of the RFCs
> (headers are *supposed* to be ASCII-only).
>


Very useful insight. I now understand that looking for a ready-made pure
Lua library is probably not an option.

Fortunately for my use case I don't need to handle any possible field
and any possible format, since I would be parsing mails from a very
specific sender, whose mails are automatically generated.

To be more explicit, the sender is the Amazon order system. I wanted to make
a simple script that automates what I do manually now, i.e. extracting
order information and putting it in a text file for easy tracking and
reference.

As I said in my previous post, the easiest path seems to parse the
message source, looking for the text/plain part, and extract what I need
from there.

I already examined some sources and the template they follow seems quite
parsable with plain Lua code once the message part is extracted.

The biggest hurdles for me are:

(1) the automatic identification and extraction of the right message
part, since I know almost nothing about mail format quirks (and there
are many, as you confirmed). That's why I hoped there would be a library
that did that for me. As you confirmed, this is probably not an option.

(2) decode from quoted-printable to UTF-8 or CP-1252 (I'm on Windows).


For (1) I think I could try and look for the `Content-Type:` field,
which is always `multipart/alternative;` and contains a `boundary`
placeholder which separates message parts.

Given that the message is automatically generated and doesn't seem to
sport a lot of variation in its template, I guess a simple pattern
search should be ok.

Once the boundary marker has been inferred, I'd scan the rest of the
message for the first part that has a text/plain content type.

That seems reasonable (I hope).
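
The selection step of this plan can be sketched as follows, assuming the message has already been cut at the boundary lines into a Lua list of strings, each holding that part's own header lines, a blank line, and the content. first_part_of_type is a hypothetical name, and the sample parts are made up. A part without a Content-Type header is treated as text/plain, which is the MIME default.

```lua
-- Hypothetical helper: return the index and content of the first part
-- whose Content-Type matches `wanted` (e.g. "text/plain").
local function first_part_of_type(parts, wanted)
  wanted = wanted:lower()
  for i, part in ipairs(parts) do
    -- Everything before the first blank line is the part's header block.
    local head = part:match("^(.-)\r?\n\r?\n") or part
    -- Unfold continuation lines, then lowercase for a case-insensitive search.
    head = ("\n" .. head):gsub("\r?\n[ \t]+", " ")
    local ctype = head:lower():match("\ncontent%-type:%s*([^;%s]+)")
    if (ctype or "text/plain") == wanted then
      local content = part:match("^.-\r?\n\r?\n(.*)$") or ""
      return i, content
    end
  end
  return nil
end

-- Made-up parts resembling a multipart/alternative message.
local parts = {
  "Content-Type: text/html; charset=utf-8\n\n<p>Totale: 12,50</p>",
  "content-type: TEXT/PLAIN;\n charset=utf-8\n"
    .. "Content-Transfer-Encoding: quoted-printable\n\nTotale: 12,50",
}
local index, content = first_part_of_type(parts, "text/plain")
print(index, content)
```

The returned content is still in its transfer encoding (quoted-printable here), so decoding remains a separate step.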


(2) Is a biggie, though. Bear in mind that I don't need full UTF-8
support, because the message part I'm looking for seems to contain only
latin-1 characters, so they are all in the Unicode Basic Multilingual
Plane (that's why I would be content with a CP-1252 encoding as well; I
would prefer UTF-8 because it's nicer and interoperable, though :-)

Anyway, since the data I'm looking for is mostly numerical, I could also
live with some data loss in the few textual data I need (if a product
description contained, say, a Chinese character, I would happily skip
it). So maybe I have hope there is some simpler library (or algorithm)
that covers that.
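
For the restricted case described here, a simpler algorithm does exist: a minimal sketch, assuming the input bytes are ISO-8859-1 (Latin-1), where every byte value is also its Unicode code point. latin1_to_utf8 is a hypothetical name. CP-1252 differs from Latin-1 in the range 0x80 to 0x9F (the euro sign, for example), so this sketch would mistranslate those bytes. If the part is already declared as charset=utf-8, no conversion is needed at all.

```lua
-- Hypothetical helper: convert an ISO-8859-1 string to UTF-8.
-- Bytes below 128 are plain ASCII and pass through unchanged; every byte
-- from 128 to 255 becomes a two-byte UTF-8 sequence.
local function latin1_to_utf8(s)
  return (s:gsub("[\128-\255]", function(c)
    local b = c:byte()
    -- Two-byte form: 110xxxxx 10xxxxxx
    return string.char(0xC0 + math.floor(b / 64), 0x80 + b % 64)
  end))
end

-- "\233" is "é" in Latin-1; in UTF-8 it is the byte pair 195, 169.
local utf8_text = latin1_to_utf8("perch\233")
print(utf8_text)
```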

Otherwise I think I have to dumb down the process and instead of
handling the message source, I have to copy-paste the text directly from
my mail client, which is a slower and more error-prone process though.

Thanks again!

Cheers!

-- Lorenzo

Re: Request for advice: pure Lua Library to parse mail messages.

Sean Conner
It was thus said that the Great Lorenzo Donati once stated:
> Thank you very much for all the hints!

  You're welcome.

> On 20/07/2020 22:26, Sean Conner wrote:
> >  There are quite a number of RFCs actually---I reference 14 different RFCs
> >in my code, and there might be new ones since I wrote the code.
>
> 14!? Ouch!

  14.  But given what you are trying to parse (order information from
Amazon) that number goes down quite a bit.  Let's see ... at the most basic
level you need RFC-5322 for the general format for email headers, and
RFC-2045 to RFC-2049 for the MIME stuff, so half a dozen.  Yes, it's a bit
of slog, but it does explain the format.

  Of the 14, some are older versions that RFC-5322 updates, you have one
dealing with Usenet (which used a lot of email headers in addition to its
own), mailing list headers (yes, they got their own RFCs) and some
additional email headers added over the years.  

> Fortunately for my use case I don't need to handle any possible field
> and any possible format, since I would be parsing mails from a very
> specific sender, whose mails are automatically generated.

  Then they'll stand a very good chance of being well formed.  Thank God for
small favors.

> For (1) I think I could try and look for the `Content-Type:` field,
> which is always `multipart/alternative;` and contains a `boundary`
> placeholder which separates message parts.
>
> Given that the message is automatically generated and doesn't seem to
> sport a lot of variation in its template, I guess a simple pattern
> search should be ok.
>
> Once the boundary marker has been inferred, I'd scan the rest of the
> message for the first part that has a text/plain content type.
>
> That seems reasonable (I hope).

  At the very least I would scan through the RFCs I listed above.  I have
some email from Amazon about some orders I placed earlier this year, and in
the header section I find the following three headers:

MIME-Version: 1.0
Content-Type: multipart/alternative;
        boundary="----=_Part_16209100_796398164.1588360430199"
Content-Length: 1690

(not necessarily in that order mind you!).  You can see the Content-Type:
contains the message boundary (and it doesn't always have to be quoted---fun
times, yo).  Each section will then be separated by the boundary string,
prefixed with two hyphens ('--').  So the above boundary will actually appear as:

------=_Part_16209100_796398164.1588360430199

and the last one will also have two hyphens ('--') appended, like this:

------=_Part_16209100_796398164.1588360430199--
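
The boundary handling just described can be sketched in pure Lua as below. Both function names are hypothetical, and the sample body is made up around the boundary quoted above. The sketch accepts quoted and unquoted boundary values, but it does not check that a delimiter line contains nothing after the boundary string, so it is a simplification of the rules in RFC 2046.

```lua
-- Hypothetical helper: pull the boundary parameter out of a Content-Type value.
local function get_boundary(content_type)
  -- The parameter name is case-insensitive but its value is not, so search
  -- a lowercased copy and cut the value out of the original string.
  local _, e = content_type:lower():find("boundary%s*=%s*")
  if not e then return nil end
  local rest = content_type:sub(e + 1)
  return rest:match('^"([^"]*)"') or rest:match("^([^%s;]+)")
end

-- Hypothetical helper: cut a multipart body into its parts (as strings).
local function split_parts(body, boundary)
  -- Normalize line endings; the extra "\n" lets a first-line delimiter match.
  body = "\n" .. body:gsub("\r\n", "\n")
  local delim = "\n--" .. boundary
  local parts = {}
  -- Plain-text find (4th argument true): the boundary is not a Lua pattern.
  local _, prev_end = body:find(delim, 1, true)
  while prev_end do
    -- "--" right after the boundary marks the closing delimiter.
    if body:sub(prev_end + 1, prev_end + 2) == "--" then break end
    local line_end = body:find("\n", prev_end + 1, true)
    if not line_end then break end
    local next_start, next_end = body:find(delim, line_end, true)
    if not next_start then break end -- unterminated part: ignored here
    parts[#parts + 1] = body:sub(line_end + 1, next_start - 1)
    prev_end = next_end
  end
  return parts
end

local ctype = 'multipart/alternative;\n'
           .. '        boundary="----=_Part_16209100_796398164.1588360430199"'
local boundary = get_boundary(ctype)
local sample = "--" .. boundary .. "\r\nContent-Type: text/plain\r\n\r\nhello\r\n"
            .. "--" .. boundary .. "\r\nContent-Type: text/html\r\n\r\n<p>hello</p>\r\n"
            .. "--" .. boundary .. "--\r\n"
local parts = split_parts(sample, boundary)
print(#parts, parts[1])
```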

  That said, the message I have from Amazon only has one section and thus,
no boundary actually appears.  Instead, the main body of the email contains
the following two headers for the one section:

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

  Somehow, I didn't get the quoted-printable formatting.  Go figure (I guess
because it's all English text, which fits in the 7-bit ASCII range so it's
not needed).  It's the valid variations like these that "quick-and-dirty"
parsing is prone to break [1] (and yes, I can sympathize with you wanting a
library to handle all this for you).

> (2) Is a biggie, though. Bear in mind that I don't need full UTF-8
> support, because the message part I'm looking for seems to contain only
> latin-1 characters, so they are all in the Unicode Basic Multilingual
> Plane (that's why I would be content with a CP-1252 encoding as well; I
> would prefer UTF-8 because it's nicer and interoperable, though :-)

  It may very well be in UTF-8.  It should have the character set encoding
listed in the headers somewhere.
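
A small sketch of reading that declaration from a Content-Type value such as the one quoted above. content_type_info is a hypothetical name. Falling back to "us-ascii" when no charset parameter is present follows the default given in RFC 2045; senders do not always respect it, so treat the fallback as an assumption.

```lua
-- Hypothetical helper: return the MIME type and charset of a Content-Type
-- value, both lowercased.
local function content_type_info(value)
  local mime = value:match("^%s*([^;%s]+)")
  local charset
  -- Parameter names are case-insensitive, so search a lowercased copy.
  local _, e = value:lower():find("charset%s*=%s*")
  if e then
    local rest = value:sub(e + 1)
    charset = rest:match('^"([^"]*)"') or rest:match("^([^%s;]+)")
  end
  return mime and mime:lower(), (charset or "us-ascii"):lower()
end

local mime, charset = content_type_info("text/plain; charset=utf-8")
print(mime, charset)
```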
 
> Anyway, since the data I'm looking for is mostly numerical, I could also
> live with some data loss in the few textual data I need (if a product
> description contained, say, a Chinese character, I would happily skip
> it). So maybe I have hope there is some simpler library (or algorithm)
> that covers that.

  I wish you well.

  -spc

[1] I recently wrote an HTML parser using LPEG.  I started out with a
        "quick-n-dirty" one but quickly realized I was going to be worse off
        than with a proper parser.  So I broke out the DTD [2] for the
        version of HTML I had to parse, and wrote one [3].  Works perfectly,
        handles the optional closing tags (and the one opening tag).  It
        helped that all the HTML I need to parse is well formed and
        validated.

[2] Document Type Definition

[3] Two actually---I started out using the re module from LPEG and that
        hit some limitations, so I switched to actual LPEG.

Re: Request for advice: pure Lua Library to parse mail messages.

Lorenzo Donati-3
On 21/07/2020 10:51, Sean Conner wrote:

> It was thus said that the Great Lorenzo Donati once stated:
>> Thank you very much for all the hints!
>
>   You're welcome.
>
>> On 20/07/2020 22:26, Sean Conner wrote:
>>>  There are quite a number of RFCs actually---I reference 14 different RFCs
>>> in my code, and there might be new ones since I wrote the code.
>>
>> 14!? Ouch!
>
>   14.  But given what you are trying to parse (order information from
> Amazon) that number goes down quite a bit.  Let's see ... at the most basic
> level you need RFC-5322 for the general format for email headers, and
> RFC-2045 to RFC-2049 for the MIME stuff, so half a dozen.  Yes, it's a bit
> of slog, but it does explain the format.
>
>   Of the 14, some are older versions that RFC-5322 updates, you have one
> dealing with Usenet (which used a lot of email headers in addition to its
> own), mailing list headers (yes, they got their own RFCs) and some
> additional email headers added over the years.
>
>> Fortunately for my use case I don't need to handle any possible field
>> and any possible format, since I would be parsing mails from a very
>> specific sender, whose mails are automatically generated.
>
>   Then they'll stand a very good chance of being well formed.  Thank God for
> small favors.
>
>> For (1) I think I could try and look for the `Content-Type:` field,
>> which is always `multipart/alternative;` and contains a `boundary`
>> placeholder which separates message parts.
>>
>> Given that the message is automatically generated and doesn't seem to
>> sport a lot of variation in its template, I guess a simple pattern
>> search should be ok.
>>
>> Once the boundary marker has been inferred, I'd scan the rest of the
>> message for the first part that has a text/plain content type.
>>
>> That seems reasonable (I hope).
>
>   At the very least I would scan through the RFCs I listed above.

I'll try when I have time. Too optimistically *grin* I hoped mail format
was simple enough not to require so much specification scanning in
advance. :-)

> I have
> some email from Amazon about some orders I placed earlier this year, and in
> the header section I find the following three headers:
>
> MIME-Version: 1.0
> Content-Type: multipart/alternative;
>         boundary="----=_Part_16209100_796398164.1588360430199"
> Content-Length: 1690
>

The first two match exactly what I've got. Content-Length is missing.

> (not necessarily in that order mind you!).  You can see the Content-Type:
> contains the message boundary (and it doesn't always have to be quoted---fun
> times, yo).  Each section will then be separated by the boundary string,
> prefixed with two hyphens ('--').  So the above boundary will actually appear as:
>
> ------=_Part_16209100_796398164.1588360430199
>
> and the last one will also have two hyphens ('--') appended, like this:
>
> ------=_Part_16209100_796398164.1588360430199--
>

That was something I already inferred by visually scanning some of those
messages. Thanks for confirming my guess.

>   That said, the message I have from Amazon only has one section and thus,
> no boundary actually appears.  Instead, the main body of the email contains
> the following two headers for the one section:
>
> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: 7bit
>
>   Somehow, I didn't get the quoted-printable formatting.  Go figure (I guess
> because it's all English text, which fits in the 7-bit ASCII range so it's
> not needed).

Yes, I guess most probably it's because my messages are in Italian. I have:

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


> It's the valid variations like these that "quick-and-dirty"
> parsing is prone to break [1] (and yes, I can sympathize with you wanting a
> library to handle all this for you).
>
>> (2) Is a biggie, though. Bear in mind that I don't need full UTF-8
>> support, because the message part I'm looking for seems to contain only
>> latin-1 characters, so they are all in the Unicode Basic Multilingual
>> Plane (that's why I would be content with a CP-1252 encoding as well; I
>> would prefer UTF-8 because it's nicer and interoperable, though :-)
>
>   It may very well be in UTF-8.  It should have the character set encoding
> listed in the headers somewhere.
>
>> Anyway, since the data I'm looking for is mostly numerical, I could also
>> live with some data loss in the few textual data I need (if a product
>> description contained, say, a Chinese character, I would happily skip
>> it). So maybe I have hope there is some simpler library (or algorithm)
>> that covers that.
>
>   I wish you well.

Thanks!

>
>   -spc
>
> [1] I recently wrote an HTML parser using LPEG.  I started out with a
> "quick-n-dirty" one but quickly realized I was going to be worse off
> than with a proper parser.  So I broke out the DTD [2] for the
> version of HTML I had to parse, and wrote one [3].  Works perfectly,
> handles the optional closing tags (and the one opening tag).  It
> helped that all the HTML I need to parse is well formed and
> validated.

I wish I had time to learn to use LPEG. I gave it a go a couple of times
in the past decade, but its theoretical background is way over my head to
be "grokked" in a couple of days. I have little formal education in
compiler and grammar theory, and I realize having a firm understanding
of how a formal grammar "behaves" really would help understanding LPEG
and how to use it for practical tasks.

I /can/ read the EBNF form of a grammar and reason about it in a
practical way, but I really can't /design/ a grammar to do what I want,
and that would help a lot to use LPEG effectively, I guess.

So every time I gave up for lack of time and I forgot almost everything
I learned. I found it has quite a steep learning curve, alas. I also
tried a small tutorial written by Gavin Wright (IIRC), but it wasn't
enough to bring me to that "AHA!" moment when you really grasp how to
use the tool effectively.


>
> [2] Document Type Definition
>
> [3] Two actually---I started out using the re module from LPEG and that
> hit some limitations, so I switched to actual LPEG.
>

-- Lorenzo

Re: Request for advice: pure Lua Library to parse mail messages.

Petite Abeille
In reply to this post by Lorenzo Donati-3


> On Jul 20, 2020, at 17:19, Lorenzo Donati <[hidden email]> wrote:
>
> The requirements are more or less the following:

As mentioned by Sean, let us know once you find that mythical MIME parser :)

Meanwhile, perhaps a combination of reformime, reformail, addrlist, and iconv may get you through the day.


http://manpages.ubuntu.com/manpages/trusty/man1/reformime.1.html
http://manpages.ubuntu.com/manpages/xenial/en/man1/reformail.1.html
https://cr.yp.to/immhf/addrlist.html


Re: Request for advice: pure Lua Library to parse mail messages.

Lorenzo Donati-3
On 21/07/2020 16:26, Petite Abeille wrote:

>
>
>> On Jul 20, 2020, at 17:19, Lorenzo Donati
>> <[hidden email]> wrote:
>>
>> The requirements are more or less the following:
>
> As mentioned by Sean, let us know once you find that mythical MIME
> parser :)
>

Well, in my ignorance (before Sean's hints) I believed such a thing
existed. Nice to see that even this new millennium has its mythical
beasts, though :-D


> Meanwhile, perhaps a combination of reformime, reformail, addrlist,
> and iconv may get you through the day.
>
>
> http://manpages.ubuntu.com/manpages/trusty/man1/reformime.1.html
> http://manpages.ubuntu.com/manpages/xenial/en/man1/reformail.1.html
> https://cr.yp.to/immhf/addrlist.html
>
>

Ouch, really too much effort for a supposedly small project aimed at
automating some boring manual process. Moreover, as I get it, those are
*nix tools, whereas I'm on Windows (7). Maybe there is a Windows port,
but this hunting for tools is beginning to look like a time sink, which I
don't want to fall into. I'd never get that time back from the script I
intended to create, which in itself is not even a very fun project.

Thanks for the info, anyway. Always good to have nice pointers.

Cheers!

-- Lorenzo

Re: Request for advice: pure Lua Library to parse mail messages.

Sean Conner
In reply to this post by Lorenzo Donati-3
It was thus said that the Great Lorenzo Donati once stated:

> On 21/07/2020 10:51, Sean Conner wrote:
> >[1] I recently wrote an HTML parser using LPEG.  I started out with a
> > "quick-n-dirty" one but quickly realized I was going to be worse off
> > than with a proper parser.  So I broke out the DTD [2] for the
> > version of HTML I had to parse, and wrote one [3].  Works perfectly,
> > handles the optional closing tags (and the one opening tag).  It
> > helped that all the HTML I need to parse is well formed and
> > validated.
>
> I wish I had time to learn to use LPEG. I gave it a go a couple of times
> in the past decade, but its theoretical background is way over my head to
> be "grokked" in a couple of days.

  I'm sad to hear that, because I found LPEG to be *way* easier to use than
the old lex and yacc (or flex and bison as the modern replacements) which
were even *more* theoretical in nature (I always hated those "shift and
reduce errors" from yacc).

> I have little formal education in
> compiler and grammar theory, and I realize having a firm understanding
> of how a formal grammar "behaves" really would help understanding LPEG
> and how to use it for practical tasks.

  If you can use Lua patterns (or general regex) I think you can learn LPEG.
Yes, there's a bit of a learning curve, but I don't think it's that big, and
I don't think you need any formal education to understand it.

> I /can/ read the EBNF form of a grammar and reason about it in a
> practical way, but I really can't /design/ a grammar to do what I want,
> and that would help a lot to use LPEG effectively, I guess.

  Well, the RFCs do give BNF for the headers, so you aren't entirely left to
your own devices.  Most will even collect all the BNF in a section at the
end so you don't have to page around the document trying to find all the
rules.

> So every time I gave up for lack of time and I forgot almost everything
> I learned. I found it has quite a steep learning curve, alas. I also
> tried a small tutorial written by Gavin Wright (IIRC), but it wasn't
> enough to bring me to that "AHA!" moment when you really grasp how to
> use the tool effectively.

  Anything you can do with Lua patterns you can do with LPEG (there was a
thread on this mailing list a few years ago about that).  But the neat thing
about LPEG is that you can construct a "pattern" from smaller pieces.
You can see all that if you check out my LPEG parsers repo:

        https://github.com/spc476/LPeg-Parsers

but you can also check out some simplified examples in another repo I have:

        https://github.com/spc476/LPeg-talk

  For that one, I would go through the examples in the order listed.
date1.lua is about as simple as they come, a pattern to match a date like
"Wed, 2 Dec 2015 20:51:17 +0100".
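
For comparison, a date of that shape can also be matched with a plain Lua pattern. The sketch below is not the content of date1.lua and uses no LPEG; parse_date and the field names are made up for illustration. Gluing pattern strings together is only a rough stand-in for building an LPEG pattern from smaller pieces, since Lua patterns have no alternation or nesting.

```lua
-- Small pattern fragments, concatenated into one Lua pattern.
local wday  = "(%a%a%a)"
local day   = "(%d%d?)"
local month = "(%a%a%a)"
local year  = "(%d%d%d%d)"
local time  = "(%d%d):(%d%d):(%d%d)"
local zone  = "([+-]%d%d%d%d)"
local date_pattern = "^" .. wday .. ",%s+" .. day .. "%s+" .. month
                  .. "%s+" .. year .. "%s+" .. time .. "%s+" .. zone .. "$"

-- Hypothetical helper: return a table of date fields, or nil on mismatch.
-- Only the shape is checked; "Xyz" would be accepted as a month name.
local function parse_date(s)
  local wd, d, mo, y, hh, mm, ss, z = s:match(date_pattern)
  if not wd then return nil end
  return {
    weekday = wd, day = tonumber(d), month = mo, year = tonumber(y),
    hour = tonumber(hh), min = tonumber(mm), sec = tonumber(ss), zone = z,
  }
end

local t = parse_date("Wed, 2 Dec 2015 20:51:17 +0100")
print(t.day, t.month, t.year, t.zone)
```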

  Anyway, I blather ...

  -spc