Can LPeg parse PNG?

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Can LPeg parse PNG?

Soni "They/Them" L.
Here's a little challenge for y'all: Can good ol' LPeg parse the 4
critical chunks of a PNG?

I was told PNG is a regular language, in the sense that you can validate
any PNG chunk with a VERY VERY VERY LONG regex.

Well, LPeg is better than regular, so it might be able to do it better
than regex. Can LPeg beat regex at PNG validation?

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: Can LPeg parse PNG?

Duncan Cross
On Sat, Dec 10, 2016 at 12:10 AM, Soni L. <[hidden email]> wrote:
> I was told PNG is a regular language, in the sense that you can validate any
> PNG chunk with a VERY VERY VERY LONG regex.

Including the 32-bit CRC at the end of the chunk? That's got to be a
hell of a regex.

-Duncan

Reply | Threaded
Open this post in threaded view
|

Re: Can LPeg parse PNG?

KHMan
On 12/10/2016 9:46 AM, Duncan Cross wrote:
> On Sat, Dec 10, 2016 at 12:10 AM, Soni L. wrote:
>> I was told PNG is a regular language, in the sense that you can validate any
>> PNG chunk with a VERY VERY VERY LONG regex.
>
> Including the 32-bit CRC at the end of the chunk? That's got to be a
> hell of a regex.

I'm really curious about the "I was told PNG is a regular
language" though. It directs the blame for the idea on an
anonymous someone else.

Perhaps he has VERY VERY VERY LARGE amounts of free time this
weekend, or perhaps he wants to trap some unsuspecting folks
trying it. I think both.

--
Cheers,
Kein-Hong Man (esq.)
Selangor, Malaysia


Reply | Threaded
Open this post in threaded view
|

Re: Can LPeg parse PNG?

Soni "They/Them" L.
In reply to this post by Duncan Cross


On 09/12/16 11:46 PM, Duncan Cross wrote:
> On Sat, Dec 10, 2016 at 12:10 AM, Soni L. <[hidden email]> wrote:
>> I was told PNG is a regular language, in the sense that you can validate any
>> PNG chunk with a VERY VERY VERY LONG regex.
> Including the 32-bit CRC at the end of the chunk? That's got to be a
> hell of a regex.
>
> -Duncan
>

Only 256^2^2^32 alternations or something. While not practical, it is,
mathematically speaking, possible. However, that's just for the raw
chunks - regex wouldn't work for chunk contents or compression, which's
why this is an LPeg challenge, not a regex challenge.

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: Can LPeg parse PNG?

Sean Conner
In reply to this post by KHMan
It was thus said that the Great KHMan once stated:

> On 12/10/2016 9:46 AM, Duncan Cross wrote:
> >On Sat, Dec 10, 2016 at 12:10 AM, Soni L. wrote:
> >>I was told PNG is a regular language, in the sense that you can validate
> >>any
> >>PNG chunk with a VERY VERY VERY LONG regex.
> >
> >Including the 32-bit CRC at the end of the chunk? That's got to be a
> >hell of a regex.
>
> I'm really curious about the "I was told PNG is a regular
> language" though. It directs the blame for the idea on an
> anonymous someone else.
>
> Perhaps he has VERY VERY VERY LARGE amounts of free time this
> weekend, or perhaps he wants to trap some unsuspecting folks
> trying it. I think both.

  LPeg isn't the right tool for this job though.  Just plain Lua is good
enough:

        function png(name)
          local f = io.open(name,"rb")
          if not f then
            return nil
          end

          local hdr = f:read(8)
          if hdr ~= "\137PNG\13\10\26\10" then
            return nil
          end

          local function next(png)
            local ch = {}
           
            ch.len = png:read(4)
            if ch.len == nil then
              return nil
            end
           
            ch.len  = string.unpack(">I4",ch.len)
            ch.type = png:read(4)
            ch.data = png:read(ch.len)
            ch.crc  = string.unpack(">I4",png:read(4))
           
            -- ----------------------------------------------------
            -- The CRC check is left as an exercise for the reader
            -- ----------------------------------------------------
           
            ch.critical = ch.type:match("^%u")    ~= nil
            ch.private  = ch.type:match("^.%l")   ~= nil
            ch.ignore   = ch.type:match("^..%l")  ~= nil
            ch.safecopy = ch.type:match("^...%l") ~= nil
           
            return ch
          end

          return next,f
        end

        for chunk in png("somepngfile.png") do
          -- process each chunk
        end

But yes, it *can* be done in LPeg (probably took me an hour or so):

        local lpeg = require "lpeg"
        local Cmt  = lpeg.Cmt
        local Ct   = lpeg.Ct
        local Cg   = lpeg.Cg
        local Cb   = lpeg.Cb
        local Cc   = lpeg.Cc
        local C    = lpeg.C
        local R    = lpeg.R
        local P    = lpeg.P

        -- ---------------------------------------------------------------
        -- Convert 4-byte network ordered values.  If using Lua 5.1 or Lua
        -- 5.2, change the call to string.unpack() as needed.
        -- ---------------------------------------------------------------

        local value = P(4)
                     / function(len)
                         return string.unpack(">I4",len)
                       end

        -- ----------------------------------------------------------------
        -- PNGs are comprised of chunks of typed data.  The name of a chunk
        -- is four ASCII characters in the range of A-Z or a-z.  The case of
        -- each letter also indicates some other features of that chunk (see
        -- https://en.wikipedia.org/wiki/Portable_Network_Graphics for
        -- details).  First, we check the case of each letter in the ID for
        -- the flags.
        -- ----------------------------------------------------------------
       
        local type1 = R"AZ" * Cg(Cc(true),'critical')
                    + R"az" * Cg(Cc(false),'critical')

        local type2 = R"AZ" * Cg(Cc(false),'private')
                    + R"az" * Cg(Cc(true),'private')

        local type3 = R"AZ" * Cg(Cc(false),'ignore')
                    + R"az" * Cg(Cc(true),'ignore')

        local type4 = R"AZ" * Cg(Cc(false),'safecopy')
                    + R"az" * Cg(Cc(true),'safecopy')
                   
        -- ----------------------------------------------------------------
        -- To check the type, we first scan the four bytes and set the
        -- flags, then, using a match-time capture, we grab the characters
        -- we just scanned and return them as well.
        -- ----------------------------------------------------------------

        local type  = type1 * type2 * type3 * type4
                    * Cg(Cmt(Cc(true),function(subject,pos,_)
                        return pos,subject:sub(pos - 4,pos - 1)
                      end),'type')

        -- -----------------------------------------------------------------
        -- To obtain the data, we need the length.  Fortunately, by the time
        -- this is called, we've already captured the length in a named
        -- capture.  We use lpeg.Cb() to retrieve the previously calculated
        -- length, and by using a match-time capture, snarf the appropriate
        -- amount of data and return it.
        -- -----------------------------------------------------------------

        local data  = Cmt(Cb('length'),function(subject,pos,capture)
                        return pos + capture,subject:sub(pos,pos + capture - 1)
                      end)

        local chunk = Cg(value,'length')
                    * type
                    * Cg(data,'data')
                    * Cg(value,"crc")
                   
        -- ----------------------------------------------------------------
        -- The CRC check is left as an exercise for the reader (and yes, it
        -- can be done via LPeg patterns and a function---hint: lpeg.Cmt()
        -- with lpeg.Cb())
        -- ----------------------------------------------------------------        

        local header = P"\137PNG\13\10\26\10"
        local png    = header * Ct((Ct(chunk))^1)

        local f = io.open("somepngfile.png","rb")
        local chunks = png:match(f:read("*a"))
        f:close()
       
  The major problem I see for using LPeg is that you need to load the entire
file first before processing.  That shouldn't be an issue these days, but
it's hard to do stream processing with LPeg.  I'm not saying it can't be
done, but LPeg really doesn't buy you much over straight Lua.

  I think the major reason for the difficulty in using a regex is that the
amount of data is dependent not upon a pattern but a set value that is part
of the data itself (which is why a used a match-time capture of the data).

  -spc (Enhancements can include checking for required and known chunks, but
        again, I'm leaving that as an exercise for the reader; I've spent
        enough time on this already)
       

Reply | Threaded
Open this post in threaded view
|

Re: Can LPeg parse PNG?

Sean Conner
It was thus said that the Great Sean Conner once stated:
> It was thus said that the Great KHMan once stated:
> > On 12/10/2016 9:46 AM, Duncan Cross wrote:
> > >On Sat, Dec 10, 2016 at 12:10 AM, Soni L. wrote:
> > >>I was told PNG is a regular language, in the sense that you can validate
> > >>any
> > >>PNG chunk with a VERY VERY VERY LONG regex.

  Well, I reread the original "challenge" and okay, I need to validate the
four critical chunks using LPeg.  Okay, not an issue.  Change:

        local type  = #-P"IEND"
                    * type1 * type2 * type3 * type4
                    * Cg(Cmt(Cc(true),function(subject,pos,_)
                        return pos,subject:sub(pos - 4,pos - 1)
                      end),'type')

        local png    = header * Ct(Ct(IHDR) * (Ct(PLTE + chunk))^1 * Ct(IEND))

then add the following to parse IHDR, PLTE and IEND:

        local bits  = P"\1"  * Cc( 1)
                    + P"\2"  * Cc( 2)
                    + P"\4"  * Cc( 4)
                    + P"\8"  * Cc( 8)
                    + P"\16" * Cc(16)
                   
        local color = P"\0" * Cc "greyscale"
                    + P"\2" * Cc "RGB"
                    + P"\3" * Cc "palette"
                    + P"\4" * Cc "greyscale-alpha"
                    + P"\6" * Cc "RGB-alpha"

        local compression = P"\0" * Cc "deflate"
        local filter      = P"\0" * Cc "adaptive"
        local interlace   = P"\0" * Cc "none"
                          + P"\1" * Cc "Adam7"

        local IHDR = P"\0\0\0\13IHDR"
                   * Cg(Cc "IHDR",'type')
                   * Cg(Cc(true),'critical')
                   * Cg(Cc(false),'private')
                   * Cg(Cc(false),'ignore')
                   * Cg(Cc(false),'safecopy')
                   * Cg(value4,'width')
                   * Cg(value4,'height')
                   * Cg(bits,'bits')
                   * Cg(color,'color')
                   * Cg(compression,'compression')
                   * Cg(filter,'filter')
                   * Cg(interlace,'interlace')
                   * Cg(value4,'crc')
                   * Cmt(
                          Cb('color') * Cb('bits') * Cb('crc'),
                          function(subject,pos,color,bits,crc)
                            -- check crc, skipped for now
                            if color == 'greyscale' then
                              return pos
                            end
                           
                            if color == 'RGB' and bits == 8 or bits == 16 then
                              return pos
                            end
                           
                            if color == 'palette' and bits ~= 16 then
                              return pos
                            end
                           
                            if color == 'greyscale-alpha'
                            and bits == 8 or bits == 16 then
                              return pos
                            end
                           
                            if color == 'RGB-alpha'
                            and bits == 8 or bits == 16 then
                              return pos
                            end
                          end
                        )

        local PLTE = Cg(value,'length')
                   * Cg(P"PLTE",'type')
                   * Cg(data,'data')
                   * Cg(value,'crc')
                   * Cmt(
                          Cb('length') * Cb('crc'),
                          function(subject,pos,length,data,crc)

                            if length % 3 ~= 0 then
                              return nil
                            end
                           
                            -- --------------------------------------------
                            -- pull the colortype and bits right out of the
                            -- data stream.  It's there in the subject,
                            -- might as well take advantage of it being
                            -- here.
                            -- --------------------------------------------
             
                            local colortype = subject:sub(26):byte()
                            local bits      = subject:sub(25):byte()
                           
                            if colortype == 0 or colortype == 4 then
                              return nil
                            end
                     
                            if length // 3 > 2^bits then
                              return nil
                            end

                            return pos
                          end
                        )

        local IEND = P"\0\0\0\0IEND"
                   * Cg(Cc "IEND",'type')
                   * Cg(Cc(true),'critical')
                   * Cg(Cc(false),'private')
                   * Cg(Cc(false),'ignore')
                   * Cg(Cc(false),'safecopy')
                   * Cg(value4,'crc')

  There's not much to validating IDAT---you can either decompress the data,
or you can't.  But hey, there's enough here to add explicit code for IDAT.

  -spc (There, enough LPeg for the day ... )

Reply | Threaded
Open this post in threaded view
|

Re: Can LPeg parse PNG?

Dirk Laurie-2
2016-12-10 6:56 GMT+02:00 Sean Conner <[hidden email]>:

>   Well, I reread the original "challenge" and okay, I need to validate the
> four critical chunks using LPeg.

You had a pure Lua solution. That is trivially an LPeg solution:
pattern captures all its input, passes the capture to a Lua function.

Reply | Threaded
Open this post in threaded view
|

Re: Can LPeg parse PNG?

Sean Conner
It was thus said that the Great Dirk Laurie once stated:
> 2016-12-10 6:56 GMT+02:00 Sean Conner <[hidden email]>:
>
> >   Well, I reread the original "challenge" and okay, I need to validate the
> > four critical chunks using LPeg.
>
> You had a pure Lua solution. That is trivially an LPeg solution:
> pattern captures all its input, passes the capture to a Lua function.

  Well, you know ... moving goal posts and all that.

  -spc


Reply | Threaded
Open this post in threaded view
|

Re: Can LPeg parse PNG?

Soni "They/Them" L.
In reply to this post by KHMan


On 10/12/16 12:12 AM, KHMan wrote:

> On 12/10/2016 9:46 AM, Duncan Cross wrote:
>> On Sat, Dec 10, 2016 at 12:10 AM, Soni L. wrote:
>>> I was told PNG is a regular language, in the sense that you can
>>> validate any
>>> PNG chunk with a VERY VERY VERY LONG regex.
>>
>> Including the 32-bit CRC at the end of the chunk? That's got to be a
>> hell of a regex.
>
> I'm really curious about the "I was told PNG is a regular language"
> though. It directs the blame for the idea on an anonymous someone else.
>
> Perhaps he has VERY VERY VERY LARGE amounts of free time this weekend,
> or perhaps he wants to trap some unsuspecting folks trying it. I think
> both.
>

Each PNG chunk is a finite sequence of bytes. Thus, a long enough regex
can parse a stream of PNG chunks.

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: Can LPeg parse PNG?

KHMan
On 12/10/2016 7:18 PM, Soni L. wrote:

> On 10/12/16 12:12 AM, KHMan wrote:
>> On 12/10/2016 9:46 AM, Duncan Cross wrote:
>>> On Sat, Dec 10, 2016 at 12:10 AM, Soni L. wrote:
>>>> I was told PNG is a regular language, in the sense that you
>>>> can validate any
>>>> PNG chunk with a VERY VERY VERY LONG regex.
>>>
>>> Including the 32-bit CRC at the end of the chunk? That's got to
>>> be a
>>> hell of a regex.
>>
>> I'm really curious about the "I was told PNG is a regular
>> language" though. It directs the blame for the idea on an
>> anonymous someone else.
>>
>> Perhaps he has VERY VERY VERY LARGE amounts of free time this
>> weekend, or perhaps he wants to trap some unsuspecting folks
>> trying it. I think both.
>
> Each PNG chunk is a finite sequence of bytes. Thus, a long enough
> regex can parse a stream of PNG chunks.

Sure, sure.

Oh, don't mind me, I'm just shooting the breeze here. If I sounded
too negative, it's because I'm a dinosaur in Internet years and I
mostly sit back and gawk at the younger folks who enthusiastically
apply themselves to interesting problems such as this one... :-)
;-) It's a generation gap thing, no biggie.

--
Cheers,
Kein-Hong Man (esq.)
Selangor, Malaysia


Reply | Threaded
Open this post in threaded view
|

Re: Can LPeg parse PNG?

Dirk Laurie-2
2016-12-10 15:09 GMT+02:00 KHMan <[hidden email]>:

> On 12/10/2016 7:18 PM, Soni L. wrote:
>>
>> On 10/12/16 12:12 AM, KHMan wrote:
>>>
>>> On 12/10/2016 9:46 AM, Duncan Cross wrote:
>>>>
>>>> On Sat, Dec 10, 2016 at 12:10 AM, Soni L. wrote:
>>>>>
>>>>> I was told PNG is a regular language, in the sense that you
>>>>> can validate any
>>>>> PNG chunk with a VERY VERY VERY LONG regex.
>>>>
>>>>
>>>> Including the 32-bit CRC at the end of the chunk? That's got to
>>>> be a
>>>> hell of a regex.
>>>
>>>
>>> I'm really curious about the "I was told PNG is a regular
>>> language" though. It directs the blame for the idea on an
>>> anonymous someone else.
>>>
>>> Perhaps he has VERY VERY VERY LARGE amounts of free time this
>>> weekend, or perhaps he wants to trap some unsuspecting folks
>>> trying it. I think both.
>>
>>
>> Each PNG chunk is a finite sequence of bytes. Thus, a long enough
>> regex can parse a stream of PNG chunks.
>
>
> Sure, sure.
>
> Oh, don't mind me, I'm just shooting the breeze here. If I sounded too
> negative, it's because I'm a dinosaur in Internet years and I mostly sit
> back and gawk at the younger folks who enthusiastically apply themselves to
> interesting problems such as this one... :-) ;-) It's a generation gap
> thing, no biggie.

A gap of many centuries, in fact, when the burning question
"Can several angels be in the same place?" was hotly debated.