A Lua pattern question about optional patterns

Sean Conner

  Normally I reach for LPEG to handle my parsing chores, but I have a
project where that ... well, I'd be reimplementing a form of regex anyway,
so why not use Lua patterns and avoid that mess.

  But I have an issue that I do not know the answer to---I suspect there
isn't an answer but "use a real regex or LPEG".  But I thought I would ask
anyway.

  I have a pattern that looks like this BNF:

        TEXT = 'A' - 'Z' / 'a' - 'z'
        DIGIT = '0' - '9'

        pattern = 1*TEXT [ ';' 1*DIGIT ]

  In English, text, optionally followed by a semicolon and some digits.  So
some valid examples:

        foo
        foo;1
        foo;444

  Invalid examples are

        foo;
        foo23

  If there's a semicolon, it must be followed by a digit; if there's no
semicolon, no digits.  This is trivial (to me) in LPEG.  Not so with Lua
patterns.  There's:

        %a%;?(%d+)

but that allows "foo23" to slip through.  What I would like would be:

        %a(%;(%d+))?

although that doesn't work, since the '?' can (if I'm reading the
documentation right) only follow a character class, not a grouping.

  Am I missing something?

  -spc (I mean, besides "using LPEG"?)
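
A quick check of the two patterns above against the sample strings, as a minimal sketch (the variable names are illustrative, not from the post). It suggests that the first pattern accepts "foo23" because '%;?' may match nothing, and that a '?' placed after ')' is treated as a literal question mark rather than as a quantifier on the group.

```lua
-- First pattern from the post: one letter, an optional ';', then digits.
local loose = "%a%;?(%d+)"
print(("foo;444"):match(loose))   -- captures the digits after ';'
print(("foo23"):match(loose))     -- also matches: '%;?' matched nothing

-- Second pattern: the '?' after ')' is an ordinary character here,
-- so the subject must literally contain '?' after the digits.
local grouped = "%a(%;(%d+))?"
print(("foo;1"):match(grouped))   -- no match
print(("foo;1?"):match(grouped))  -- matches, capturing ";1" and "1"
```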

Re: A Lua pattern question about optional patterns

Jonathan Goble
On Mon, Sep 9, 2019 at 11:22 PM Sean Conner <[hidden email]> wrote:

>
>   Normally I reach for LPEG to handle my parsing chores, but I have a
> project where that ... well, I'd be reimplementing a form of regex anyway,
> so why not use Lua patterns and avoid that mess.
>
>   But I have an issue that I do not know the answer to---I suspect there
> isn't an answer but "use a real regex or LPEG".  But I thought I would ask
> anyway.
>
>   I have a pattern that looks like this BNF:
>
>         TEXT    = 'A' - 'Z' / 'a' - 'z'
>         DIGIT   = '0' - '9'
>
>         pattern = 1*TEXT [ ';' 1*DIGIT ]
>
>   In English, text, optionally followed by a semicolon and some digits.  So
> some valid examples:
>
>         foo
>         foo;1
>         foo;444
>
>   Invalid examples are
>
>         foo;
>         foo23
>
>   If there's a semicolon, it must be followed by a digit; if there's no
> semicolon, no digits.  This is trivial (to me) in LPEG.  Not so with Lua
> patterns.  There's:
>
>         %a%;?(%d+)
>
> but that allows "foo23" to slip through.  What I would like would be:
>
>         %a(%;(%d+))?
>
> although that doesn't work, since the '?' can (if I'm reading the
> documentation right) only follow a character class, not a grouping.

Correct.

>   Am I missing something?
>
>   -spc (I mean, besides "using LPEG"?)

I feel like the typical way of doing this kind of thing in Lua is a
two-step process.

Untested example:

-- Anchored, with %a+ so that all the letters are consumed before the capture.
local result = teststr:match "^%a+(%;?%d*)$"

if result then
    if #result == 0 then
        print "match with no semicolon or number"
    elseif result:match "%;%d+" then
        print "match with semicolon and number"
    else
        print "no match (semicolon without number or vice versa)"
    end
else
    print "no match"
end

Adjust to suit your specific needs.
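
The two-step idea can be sketched as a function and run over the five sample strings. This is a minimal sketch, not a definitive version: the helper name classify, the returned labels, the '^...$' anchors and the use of '%a+' are assumptions added here.

```lua
-- Two-step check: first capture the optional tail, then validate it.
-- classify() is a hypothetical helper name, not from the thread.
local function classify(s)
  -- Step 1: anchored match; capture whatever follows the letters.
  local tail = s:match("^%a+(;?%d*)$")
  if not tail then
    return "no match"
  elseif tail == "" then
    return "text only"
  elseif tail:match("^;%d+$") then
    -- Step 2: the tail must be a semicolon followed by digits.
    return "text;digits"
  else
    return "no match"  -- ';' without digits, or digits without ';'
  end
end

for _, s in ipairs{"foo", "foo;1", "foo;444", "foo;", "foo23"} do
  print(s, classify(s))
end
```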

Re: A Lua pattern question about optional patterns

Peter W A Wood

I also came up with a two-step method but wasn’t sure if it was really applicable in the actual use case. My solution does not seem very elegant but handles Sean’s five examples properly:

> function double_match (s)
>> local text = s:match('^(%a+)$')
>>   if text then
>>     return text, nil
>>   else
>>     return s:match('^(%a+);(%d+)')
>>   end
>> end
> = double_match('foo')
foo nil
> = double_match('foo;1')
foo 1
> = double_match('foo;444')
foo 444
> = double_match('foo;')
nil
> = double_match('foo23')
nil

Peter

Re: A Lua pattern question about optional patterns

Albert Chan

This version avoids the "double matching" and is perhaps slightly more efficient:

function match2(s)
    if #s == 0 then return end
    local i = string.find(s, "%A")
    if i == nil then return s, nil end
    i = string.find(s, "^;%d+$", i)
    if i then return s:sub(1,i-1), s:sub(i+1) end
end
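
A sketch of match2 run over the sample strings. The 'i == 1' guard is an addition that is not in the post: without it, an input with no leading text such as ";1" would return an empty string plus "1". Note that a '^' at the start of the pattern anchors the match at the init position passed to string.find.

```lua
-- match2 as posted, plus one guard (the guard is an assumption added here).
local function match2(s)
  if #s == 0 then return end
  local i = string.find(s, "%A")       -- position of the first non-letter, if any
  if i == nil then return s, nil end   -- letters only: text without a number
  if i == 1 then return end            -- added guard: no leading text at all
  i = string.find(s, "^;%d+$", i)      -- '^' anchors at position i, not at 1
  if i then return s:sub(1, i - 1), s:sub(i + 1) end
end

for _, s in ipairs{"foo", "foo;1", "foo;444", "foo;", "foo23", ";1"} do
  print(s, match2(s))
end
```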

Re: A Lua pattern question about optional patterns

aryajur
In reply to this post by Sean Conner
Another way that works for the 5 samples but still needs 2 matches:

> p1 = "%a*%;(%d+)"
> p2 = "^%a*$" 
> s1 = "foo"
> s2 = "foo;1"
> s3 = "foo;444"
> s4 = "foo;"
> s5 = "foo23"
> s = s1
> s:match(p1) or s:match(p2)
foo
> s = s2
> s:match(p1) or s:match(p2)
1
> s = s3
> s:match(p1) or s:match(p2)
444
> s = s4
> s:match(p1) or s:match(p2)
nil
> s = s5
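
A sketch of the same either/or idea with both patterns anchored and '%a+' in place of '%a*' (both are assumptions beyond the post). The posted p1 is unanchored, so a string such as "foo;1x" would also pass. Note that the single return value is the digits when they are present and the whole text otherwise.

```lua
local p1 = "^%a+;(%d+)$"  -- text, ';', digits: returns the captured digits
local p2 = "^%a+$"        -- text only: no capture, so the whole match is returned

local function check(s)   -- hypothetical helper name
  return s:match(p1) or s:match(p2)
end

for _, s in ipairs{"foo", "foo;1", "foo;444", "foo;", "foo23", "foo;1x"} do
  print(s, check(s))
end
```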


Re: A Lua pattern question about optional patterns

Roberto Ierusalimschy
In reply to this post by Sean Conner
>   I have a pattern that looks like this BNF:
>
> TEXT = 'A' - 'Z' / 'a' - 'z'
> DIGIT = '0' - '9'
>
> pattern = 1*TEXT [ ';' 1*DIGIT ]

Maybe like this?

  local text, sep, digit = string.match(s, "^(%a+)(;?)(%d*)$")
  if not text or (sep == "") ~= (digit == "") then
    error("bla bla bla")
  end
  return text, tonumber(digit)

-- Roberto
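
The snippet above wrapped in a function so it can be run over the samples, as a minimal sketch. The function name parse and the choice to return nil plus a message instead of calling error are assumptions made here for easier testing.

```lua
-- parse() is a hypothetical wrapper around the snippet above.
local function parse(s)
  local text, sep, digit = string.match(s, "^(%a+)(;?)(%d*)$")
  -- Reject when nothing matched, or when exactly one of ';' and the
  -- digits is present (the two emptiness tests disagree).
  if not text or (sep == "") ~= (digit == "") then
    return nil, "invalid input: " .. s
  end
  -- tonumber("") is nil, so text without a number yields text, nil.
  return text, tonumber(digit)
end

for _, s in ipairs{"foo", "foo;1", "foo;444", "foo;", "foo23"} do
  print(s, parse(s))
end
```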

Re: A Lua pattern question about optional patterns

Egor Skriptunoff-2
In reply to this post by Sean Conner
On Tue, Sep 10, 2019 at 6:22 AM Sean Conner wrote:

>   I have a pattern that looks like this BNF:
>
>         TEXT    = 'A' - 'Z' / 'a' - 'z'
>         DIGIT   = '0' - '9'
>
>         pattern = 1*TEXT [ ';' 1*DIGIT ]
>
>   In English, text, optionally followed by a semicolon and some digits.  So
> some valid examples:
>
>         foo
>         foo;1
>         foo;444
>
>   Invalid examples are
>
>         foo;
>         foo23
>
>   If there's a semicolon, it must be followed by a digit; if there's no
> semicolon, no digits.
>
>   Am I missing something?

This is the Lua pattern you're searching for:  ^(%a+)%f[;%z];?(%d*)%f[;%z]$


for _, s in ipairs{"foo", "foo;1", "foo;444", "foo;", "foo23"} do
   local text, digit = string.match(s, "^(%a+)%f[;%z];?(%d*)%f[;%z]$")
   print(s, text, digit)
end
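
The same pattern split into its pieces, as a sketch of how it appears to work. '%f[set]' is the frontier pattern: it matches the empty string where the previous character is not in the set and the current one is, with the start and end of the subject treated as '\0'. The variable name pat is illustrative.

```lua
-- The pattern from the post, split into commented pieces.
local pat = "^(%a+)"      -- captured text: one or more letters
         .. "%f[;%z]"     -- frontier: the next character is ';' or the end
         .. ";?(%d*)"     -- optional ';', then captured digits (possibly none)
         .. "%f[;%z]$"    -- frontier at the end: the previous character is not ';'

for _, s in ipairs{"foo", "foo;1", "foo;444", "foo;", "foo23"} do
  local text, digit = s:match(pat)
  print(s, text, digit)
end
```

For text without a number the second capture is the empty string rather than nil, so callers need to test for "" explicitly.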


Re: A Lua pattern question about optional patterns

Sean Conner
In reply to this post by Roberto Ierusalimschy
It was thus said that the Great Roberto Ierusalimschy once stated:

> >   I have a pattern that looks like this BNF:
> >
> > TEXT = 'A' - 'Z' / 'a' - 'z'
> > DIGIT = '0' - '9'
> >
> > pattern = 1*TEXT [ ';' 1*DIGIT ]
>
> Maybe like this?
>
>   local text, sep, digit = string.match(s, "^(%a+)(;?)(%d*)$")
>   if not text or (sep == "") ~= (digit == "") then
>     error("bla bla bla")
>   end
>   return text, tonumber(digit)

  I'll have to test it.  The LPEG pattern I would use would be:

        text    = R("AZ","az")
        digit   = R"09"
        pattern = C(text^1) * (P";" * (digit^1 / tonumber) + Cc(77))
        -- 77 is the default value if no number is given yada yada ...

but that's hard to express in the context of what I'm trying to do (I mean,
*I'm* comfortable using LPEG for this, but not many other people would be),
but so far, your suggestion is about the best I've seen.

  -spc
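
For comparison, the default-value behaviour of the LPEG grammar above can be approximated with the quoted Lua pattern. This is a sketch under assumptions: the function name parse_with_default and its default parameter are hypothetical, and 77 is simply the placeholder used in the Cc(77) above.

```lua
-- Returns text and a number; the number falls back to `default`
-- when the input has no ";digits" part.
local function parse_with_default(s, default)
  local text, sep, digits = s:match("^(%a+)(;?)(%d*)$")
  if not text or (sep == "") ~= (digits == "") then
    return nil  -- invalid: ';' without digits, or digits without ';'
  end
  -- tonumber("") is nil, so `or` supplies the default in that case.
  return text, tonumber(digits) or default
end

print(parse_with_default("foo", 77))
print(parse_with_default("foo;444", 77))
print(parse_with_default("foo;", 77))
```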


Re: A Lua pattern question about optional patterns

Sean Conner
In reply to this post by Peter W A Wood
It was thus said that the Great Peter W A Wood once stated:

> > On 10 Sep 2019, at 18:41, Jonathan Goble <[hidden email]> wrote:
> > On Mon, Sep 9, 2019 at 11:22 PM Sean Conner <[hidden email]> wrote:
> >>
> >>  In English, text, optionally followed by a semicolon and some digits.  So
> >> some valid examples:
> >>
> >>        foo
> >>        foo;1
> >>        foo;444
> >>
> >>  Invalid examples are
> >>
> >>        foo;
> >>        foo23
>
> I also came up with a two-step method but wasn’t sure if it was really
> applicable in the actual use case. My solution does not seem very elegant
> but handles Sean’s five examples properly:
>
> > function double_match (s)
> >> local text = s:match('^(%a+)$')
> >>   if text then
> >>     return text, nil
> >>   else
> >>     return s:match('^(%a+);(%d+)')
> >>   end
> >> end
> > = double_match('foo')
> foo nil

  This is incorrect, "foo" by itself is valid (see above).

> > = double_match('foo;1')
> foo 1
> > = double_match('foo;444')
> foo 444
> > = double_match('foo;')
> nil
> > = double_match('foo23')
> nil

  -spc (But yeah, a two-step process is looking like the solution ... sigh)

Re: A Lua pattern question about optional patterns

Sean Conner
In reply to this post by Egor Skriptunoff-2
It was thus said that the Great Egor Skriptunoff once stated:

> On Tue, Sep 10, 2019 at 6:22 AM Sean Conner wrote:
>
> >   I have a pattern that looks like this BNF:
> >
> >         TEXT    = 'A' - 'Z' / 'a' - 'z'
> >         DIGIT   = '0' - '9'
> >
> >         pattern = 1*TEXT [ ';' 1*DIGIT ]
> >
> >   In English, text, optionally followed by a semicolon and some digits.  So
> > some valid examples:
> >
> >         foo
> >         foo;1
> >         foo;444
> >
> >   Invalid examples are
> >
> >         foo;
> >         foo23
> >
> >   If there's a semicolon, it must be followed by a digit; if there's no
> > semicolon, no digits.
> >
> >  Am I missing something?
> >
> This is the Lua pattern you're searching for:  ^(%a+)%f[;%z];?(%d*)%f[;%z]$

  WOW!  

> for _, s in ipairs{"foo", "foo;1", "foo;444", "foo;", "foo23"} do
>    local text, digit = string.match(s, "^(%a+)%f[;%z];?(%d*)%f[;%z]$")
>    print(s, text, digit)
> end

  Yeah ... that does work.  And I'll probably use it but ... wow!  I just
wish it wasn't so hideous looking.  Thank you.

  -spc (And now I remember why I like LPEG over Lua patterns ... )

Re: A Lua pattern question about optional patterns

Jonathan Goble
In reply to this post by Sean Conner
On Tue, Sep 10, 2019 at 6:11 PM Sean Conner <[hidden email]> wrote:

>
> It was thus said that the Great Peter W A Wood once stated:
> > > On 10 Sep 2019, at 18:41, Jonathan Goble <[hidden email]> wrote:
> > > On Mon, Sep 9, 2019 at 11:22 PM Sean Conner <[hidden email]> wrote:
> > >>
> > >>  In English, text, optionally followed by a semicolon and some digits.  So
> > >> some valid examples:
> > >>
> > >>        foo
> > >>        foo;1
> > >>        foo;444
> > >>
> > >>  Invalid examples are
> > >>
> > >>        foo;
> > >>        foo23
> >
> > I also came up with a two-step method but wasn’t sure if it was really
> > applicable in the actual use case. My solution does not seem very elegant
> > but handles Sean’s five examples properly:
> >
> > > function double_match (s)
> > >> local text = s:match('^(%a+)$')
> > >>   if text then
> > >>     return text, nil
> > >>   else
> > >>     return s:match('^(%a+);(%d+)')
> > >>   end
> > >> end
> > > = double_match('foo')
> > foo   nil
>
>   This is incorrect, "foo" by itself is valid (see above).

No, it's correct. It shows two return values: "foo" and nil. nil as
the second return value indicates no semicolon or number, but "foo" as
the first return value clearly indicates that it was found. Note that
the invalid strings further down return just a single nil. So with
this solution, truthiness of the first return value indicates a valid
or invalid string, and in the case of a valid string, truthiness of
the second return value indicates the presence or lack of a semicolon
and number.
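
A sketch of how a caller might use the two return values as described. The describe helper and its labels are hypothetical, and the trailing '$' on the second pattern is an addition not in the quoted code (without it, trailing junk after the digits would be accepted).

```lua
local function double_match(s)
  local text = s:match("^(%a+)$")
  if text then
    return text, nil
  end
  return s:match("^(%a+);(%d+)$")  -- trailing '$' added here (assumption)
end

-- describe() is a hypothetical caller: the first value says whether the
-- string is valid, the second whether a ";number" part was present.
local function describe(s)
  local text, digits = double_match(s)
  if not text then
    return "invalid"
  elseif digits then
    return "text with number"
  else
    return "text only"
  end
end

for _, s in ipairs{"foo", "foo;1", "foo;444", "foo;", "foo23"} do
  print(s, describe(s))
end
```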

Re: A Lua pattern question about optional patterns

Sean Conner
It was thus said that the Great Jonathan Goble once stated:
> >   This is incorrect, "foo" by itself is valid (see above).
>
> No, it's correct.

  Yes, you are correct.  Sorry about that.

  -spc


Re: A Lua pattern question about optional patterns

Phil Leblanc
In reply to this post by Egor Skriptunoff-2
On Tue, Sep 10, 2019 at 10:08 PM Egor Skriptunoff
<[hidden email]> wrote:
(...)
> This is the Lua pattern you're searching for:  ^(%a+)%f[;%z];?(%d*)%f[;%z]$
>

I looked for the %z pattern in the manual (Lua 5.3, 5.4-alpha-rc2) but
didn't find it.

Did I miss something obvious?

Phil

Re: A Lua pattern question about optional patterns

Thorkil Naur
Hello Phil,

On Wed, Sep 11, 2019 at 05:49:01PM +0000, Phil Leblanc wrote:
> I looked for the %z pattern in the manual (Lua 5.3, 5.4-alpha-rc2) but
> didn't find it.

In

  https://www.lua.org/manual/5.2/manual.html
  8.2 – Changes in the Libraries

you'll find

  Character class %z in patterns is deprecated, as now patterns may
  contain '\0' as a regular character.

Nevertheless, %z is still supported by at least the Lua 5.3.4 I checked:

> $ awk '/static int match_class/,/^}/' lstrlib.c
> static int match_class (int c, int cl) {
>   int res;
>   switch (tolower(cl)) {
>     case 'a' : res = isalpha(c); break;
>     case 'c' : res = iscntrl(c); break;
>     case 'd' : res = isdigit(c); break;
>     case 'g' : res = isgraph(c); break;
>     case 'l' : res = islower(c); break;
>     case 'p' : res = ispunct(c); break;
>     case 's' : res = isspace(c); break;
>     case 'u' : res = isupper(c); break;
>     case 'w' : res = isalnum(c); break;
>     case 'x' : res = isxdigit(c); break;
>     case 'z' : res = (c == 0); break;  /* deprecated option */
>     default: return (cl == c);
>   }
>   return (islower(cl) ? res : !res);
> }
>
> $

Best regards
Thorkil
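
A small check of the deprecated class next to a literal '\0', as a sketch that assumes Lua 5.2 or later (in Lua 5.1 a pattern cannot contain an embedded '\0', so the lines using "\0" inside a pattern would not behave the same there). The variable names are illustrative.

```lua
local s = "a\0b"

-- Both forms find the NUL byte at position 2.
local z1 = s:find("%z")   -- deprecated class, still accepted
local z2 = s:find("\0")   -- literal NUL in the pattern (Lua 5.2+)
print(z1, z2)

-- Inside a frontier set, either form matches the end of the subject,
-- because the end is treated as if it were '\0'.
local f1 = ("foo"):find("%f[%z]")
local f2 = ("foo"):find("%f[\0]")
print(f1, f2)
```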

Re: A Lua pattern question about optional patterns

Phil Leblanc
On Wed, Sep 11, 2019 at 6:55 PM Thorkil Naur <[hidden email]> wrote:
>
>   Character class %z in patterns is deprecated, as now patterns may
>   contain '\0' as a regular character.
>
> Nevertheless, %z is still supported by at least the Lua 5.3.4 I checked:
>
Ah yes, now that you mention it, it rings a bell...! :-)  I tried the
(nice!) pattern with Lua 5.3, and just didn't check the manuals for
older versions...

Thanks for your answer, and sorry for the noise

Phil

Re: A Lua pattern question about optional patterns

Egor Skriptunoff-2
In reply to this post by Phil Leblanc
On Wed, Sep 11, 2019 at 8:49 PM Phil Leblanc wrote:
> I looked for the %z pattern in the manual (Lua 5.3, 5.4-alpha-rc2) but
> didn't find it.

The %z allows us to write Lua patterns working both in Lua 5.1 and Lua 5.3.
It's very good that %z is still supported by Lua 5.4 (although not mentioned in the manual).