Static validation of Lua String Patterns?

steve donovan
Hi all,

I have a use case where it's important to guarantee that a Lua string
pattern is valid up front.
(The use case is providing a Rust binding to the Lua pattern code -
like Go, panicking in library code is considered very bad form, and
passing on errors complicates the whole API.)

Generally we only find out whether a pattern is valid when it is
actually used, but has there been any prior work on static validation?
Before I start hacking, that is ;)

steve d.

Re: Static validation of Lua String Patterns?

Dirk Laurie-2
2017-04-21 10:09 GMT+02:00 steve donovan <[hidden email]>:

> I have a use case where it's important to guarantee that a Lua string
> pattern is valid up front.
> (The use case is providing a Rust binding to the Lua pattern code -
> like Go, panicking in library code is considered very bad form, and
> passing on errors complicates the whole API.)
>
> Generally we only know when the pattern is actually used, but has
> there been any prior work on static validation?  Before I start
> hacking, that is ;)

Is there such a thing as a pattern whose validity depends on what
string it is being applied to? If not, pcall(string.match, "", pattern)
should be adequate to test validity.

Re: Static validation of Lua String Patterns?

Egor Skriptunoff-2
On Fri, Apr 21, 2017 at 11:40 PM, Dirk Laurie wrote:
> 2017-04-21 10:09 GMT+02:00 steve donovan:
>
>> I have a use case where it's important to guarantee that a Lua string
>> pattern is valid up front.
>
> Is there such a thing as a pattern whose validity depends on what
> string it is being applied to? If not, pcall(string.match, "", pattern)
> should be adequate to test validity.

"(.)%2" is an invalid pattern, but it can be applied to the empty string
without an error: the match fails at "." before the %2 back-reference is
ever checked.

Re: Static validation of Lua String Patterns?

steve donovan
On Fri, Apr 21, 2017 at 11:43 PM, Egor Skriptunoff
<[hidden email]> wrote:
> "(.)%2" is an invalid pattern, but it could be applied to empty string
> without an error.

Yes, exactly. There are a number of these which go beyond simple
mechanical validation (brackets matching up etc).

My feeling is that it is possible to do this (after all, regexps can
be compiled), but not without going through a recursive process similar
to the existing matching, so that you know how many captures you have,
etc.
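
A rough sketch of what such a validator might look like in Rust. Because Lua patterns have no alternation, the set of open and closed captures at each pattern position is fixed, so a single left-to-right pass seems to be enough. The names `validate`, `PatternError`, `set_end`, `skip_quantifier` and `MAX_CAPTURES` are hypothetical, the rules are modelled on the checks in Lua 5.3's lstrlib.c as best understood here, and limits tied to the matcher's recursion depth are not modelled.

```rust
/// Ways a Lua pattern can be rejected before it is ever matched.
#[derive(Debug, PartialEq)]
pub enum PatternError {
    EndsWithPercent,
    MissingCloseBracket,
    MissingBalanceArgs,
    MissingFrontierSet,
    InvalidCaptureIndex(char),
    UnmatchedCloseParen,
    UnfinishedCapture,
    TooManyCaptures,
}

/// Assumed to mirror LUA_MAXCAPTURES (32 in stock Lua 5.3).
const MAX_CAPTURES: usize = 32;

/// Returns the index just past the set that opens at `p[open] == b'['`.
fn set_end(p: &[u8], open: usize) -> Result<usize, PatternError> {
    let mut i = open + 1;
    if p.get(i) == Some(&b'^') {
        i += 1;
    }
    loop {
        // One item is always consumed first, so "[]]" is a set holding ']'.
        let c = *p.get(i).ok_or(PatternError::MissingCloseBracket)?;
        i += 1;
        if c == b'%' && i < p.len() {
            i += 1; // skip the escaped byte, e.g. "%]"
        }
        if p.get(i) == Some(&b']') {
            return Ok(i + 1);
        }
    }
}

/// Skips one optional quantifier byte after a single-character class.
fn skip_quantifier(p: &[u8], i: usize) -> usize {
    match p.get(i) {
        Some(b'?' | b'+' | b'*' | b'-') => i + 1,
        _ => i,
    }
}

/// Checks a pattern in one left-to-right pass; returns the capture count.
pub fn validate(pattern: &str) -> Result<usize, PatternError> {
    use PatternError::*;
    let p = pattern.as_bytes();
    let mut closed: Vec<bool> = Vec::new(); // closed[k]: capture k+1 is finished
    let mut i = if p.first() == Some(&b'^') { 1 } else { 0 }; // leading anchor
    while i < p.len() {
        match p[i] {
            b'(' => {
                if closed.len() >= MAX_CAPTURES {
                    return Err(TooManyCaptures);
                }
                // "()" is a position capture and is complete immediately.
                let position = p.get(i + 1) == Some(&b')');
                closed.push(position);
                i += if position { 2 } else { 1 };
            }
            b')' => {
                // Closes the innermost capture that is still open.
                let k = closed.iter().rposition(|c| !*c).ok_or(UnmatchedCloseParen)?;
                closed[k] = true;
                i += 1;
            }
            b'$' if i + 1 == p.len() => i += 1, // trailing anchor
            b'%' => match p.get(i + 1).copied() {
                None => return Err(EndsWithPercent),
                Some(b'b') => {
                    // "%bxy" needs both delimiter bytes.
                    if i + 3 >= p.len() {
                        return Err(MissingBalanceArgs);
                    }
                    i += 4;
                }
                Some(b'f') => {
                    if p.get(i + 2) != Some(&b'[') {
                        return Err(MissingFrontierSet);
                    }
                    i = set_end(p, i + 2)?;
                }
                Some(d @ b'0'..=b'9') => {
                    // A back-reference must name a capture that is already closed.
                    let n = (d - b'0') as usize;
                    if n == 0 || n > closed.len() || !closed[n - 1] {
                        return Err(InvalidCaptureIndex(d as char));
                    }
                    i += 2;
                }
                Some(_) => i = skip_quantifier(p, i + 2),
            },
            b'[' => i = skip_quantifier(p, set_end(p, i)?),
            _ => i = skip_quantifier(p, i + 1),
        }
    }
    if closed.contains(&false) {
        return Err(UnfinishedCapture);
    }
    Ok(closed.len())
}
```

With this sketch, `validate("(.)%2")` yields `Err(PatternError::InvalidCaptureIndex('2'))` regardless of any subject string, while `validate("(.)%1")` yields `Ok(1)`. It is deliberately stricter than the runtime in one respect: an unclosed capture such as "(a" is rejected outright, not only when a match succeeds.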

Re: Static validation of Lua String Patterns?

Dirk Laurie-2
2017-04-22 9:30 GMT+02:00 steve donovan <[hidden email]>:
> On Fri, Apr 21, 2017 at 11:43 PM, Egor Skriptunoff
> <[hidden email]> wrote:
>> "(.)%2" is an invalid pattern, but it could be applied to empty string
>> without an error.

I.e. you should do "pcall(string.match,str,pat)" instead of str:match(pat)
whenever either "str" or "pat" is not under your control.
   https://xkcd.com/327/

Re: Static validation of Lua String Patterns?

Martin

On 04/22/2017 12:30 AM, steve donovan wrote:

> On Fri, Apr 21, 2017 at 11:43 PM, Egor Skriptunoff
> <[hidden email]> wrote:
>> "(.)%2" is an invalid pattern, but it could be applied to empty string
>> without an error.
>
> Yes, exactly. There are a number of these which go beyond simple
> mechanical validation (brackets matching up etc).
>
> My feeling is that it is possible to do this (after all, regexps can
> be compiled) but not without going through a similar, recursive
> process to existing matching - so you know how many captures you have,
> etc.

I implemented a parser for Lua regexps (patterns) some months ago, just
to have a mechanized way to get the meaning of tricky regexp strings. It
parses a string into a nested table whose keys are either structure
names (strings) or sequence integers, and whose values are strings or
tables with the same structure.

This AST could be used to detect the invalid back-reference in "(.)%2".
(Are there other cases of "incorrect" pattern strings?)

Below is a link to the code describing the regexp structure. It uses my
own generic parser, which takes the structure to parse in the form of a
Lua table, not a string (which is very handy for passing self-linked
structures). There is no frontend for this code (I saw no need for one).

If you need a function that takes a regexp string and returns a nested
table in the format described, I can implement it.

https://github.com/martin-eden/workshop/blob/master/formats/lua_regexps/load/syntax.lua#L76

-- Martin

Re: Static validation of Lua String Patterns?

steve donovan
On Sat, Apr 22, 2017 at 11:46 AM, Dirk Laurie <[hidden email]> wrote:
> I.e. you should do "pcall(string.match,str,pat)" instead of str:match(pat)
> whenever either "str" or "pat" is not under your control.

Well, there is just the original C string matching code (lifted
carefully, under the MIT license) and so the Rust equivalent would be
'all matches may fail with an error'. Currently it just panics, which
is bad, man.  Not too bad really, but if I had static validation then
I would _know_ the matcher will not fail - and so you would fail up
front when making the matcher struct, rather than on any later match
against that struct.
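
One way the "fail up front" shape could look in Rust, as a minimal sketch only: the constructor is the single fallible step, so a value of the matcher type is evidence that validation already passed. `Matcher`, `InvalidPattern` and the `check` parameter are hypothetical names, `check` stands in for whatever static validator is available, and the matching engine itself is left out.

```rust
/// Hypothetical error returned when a pattern is rejected at construction.
#[derive(Debug, PartialEq)]
pub struct InvalidPattern(pub String);

/// A pattern that has already passed validation.
#[derive(Debug)]
pub struct Matcher {
    pattern: Vec<u8>,
    captures: usize,
}

impl Matcher {
    /// The only fallible step. `check` is an assumed static validator that
    /// returns the capture count, or a description of what is wrong.
    pub fn new<F>(pattern: &str, check: F) -> Result<Matcher, InvalidPattern>
    where
        F: Fn(&[u8]) -> Result<usize, String>,
    {
        let captures = check(pattern.as_bytes()).map_err(InvalidPattern)?;
        Ok(Matcher {
            pattern: pattern.as_bytes().to_vec(),
            captures,
        })
    }

    /// Number of captures, known from validation alone.
    pub fn capture_count(&self) -> usize {
        self.captures
    }

    /// The validated pattern bytes, ready to hand to the matching code.
    pub fn pattern(&self) -> &[u8] {
        &self.pattern
    }
}
```

Match methods on such a struct could then return a plain `Option` or `bool` with no error path, provided the validator really does cover every error the underlying C code can raise for a pattern.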

>    https://xkcd.com/327/
Excellent, sir! Required reading for anybody passing around unescaped SQL.