Time constraint in Lua pattern functions

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

Time constraint in Lua pattern functions

Egor Skriptunoff-2
Hi !

After I've upgraded Lua to 5.3.2, one of my scripts terminates with
"pattern too complex" error message.

Probably, this is because of gmatch using non-optimal pattern
(having quadratic time complexity), which may require up to 2 sec
to complete.

Of course, it is possible to rewrite that script to make its time
complexity linear (at the cost of extra LOC and more complex logic
of code).

But the are two reasons for NOT rewriting it:
1) I don't want to spent my time on rewriting my old script
because I'm quite happy with its current performance (2-3 seconds
is OK for me).
2) I don't want to bring extra complexity to the script.
As for now, it is one-liner regexp, and I'd like to stay it
as simple as it is.

Frankly speaking, I'd like to see time constraint be removed from
pattern functions, as it unpleasantly reduces their abilities.
Currently I've rolled back to previous Lua version to solve
my problem, but I want to know how to invoke Lua pattern functions
providing unlimited execution time for them?

Is there more convenient way than rebuilding Lua with modified B_REPS
in lstrlib.c?
Can I set B_REPS using -D compiler switch to keep the sources intact?

Of course, in some situations it may be very useful to apply time
constraint on some Lua code (especially code containing pattern
functions, "while" loops, recursion, etc.)
IMO, more correct way to achieve this is to introduce new function
time_restricted_pcall(max_CPU_time, func, args)
instead of hard-coding such limitations.

-- Egor

Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Roberto Ierusalimschy
> After I've upgraded Lua to 5.3.2, one of my scripts terminates with
> "pattern too complex" error message.
>
> Probably, this is because of gmatch using non-optimal pattern
> (having quadratic time complexity), which may require up to 2 sec
> to complete.
>
> Of course, it is possible to rewrite that script to make its time
> complexity linear (at the cost of extra LOC and more complex logic
> of code).
>
> But the are two reasons for NOT rewriting it:
> 1) I don't want to spent my time on rewriting my old script
> because I'm quite happy with its current performance (2-3 seconds
> is OK for me).
> 2) I don't want to bring extra complexity to the script.
> As for now, it is one-liner regexp, and I'd like to stay it
> as simple as it is.

Can you show your regexp/subject?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Egor Skriptunoff-2
On Mon, Mar 14, 2016 at 10:02 PM, Roberto Ierusalimschy <[hidden email]> wrote:
> After I've upgraded Lua to 5.3.2, one of my scripts terminates with
> "pattern too complex" error message.
>
> Probably, this is because of gmatch using non-optimal pattern
> (having quadratic time complexity), which may require up to 2 sec
> to complete.
>
> Of course, it is possible to rewrite that script to make its time
> complexity linear (at the cost of extra LOC and more complex logic
> of code).
>
> But the are two reasons for NOT rewriting it:
> 1) I don't want to spent my time on rewriting my old script
> because I'm quite happy with its current performance (2-3 seconds
> is OK for me).
> 2) I don't want to bring extra complexity to the script.
> As for now, it is one-liner regexp, and I'd like to stay it
> as simple as it is.

Can you show your regexp/subject?


I have a code similar to this one:

local pattern =
  'id="post(%d+)".-class="Post Header".-<h2>(.-)</h2>.-(/forum/post%1%.htm#details)'
for id, title, link in main_forum_page:gmatch(pattern) do
  analyze_post(id, title, link)
end


Once in a while a post does not have a "View details" link (that is, third capture does not match).
In such rare cases non-linear behavior is observed due to chain of four ".-" in the pattern.
I prefer waiting in runtime for 2 seconds to losing simplicity of the code.

Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Soni "They/Them" L.


On 14/03/16 05:26 PM, Egor Skriptunoff wrote:

> On Mon, Mar 14, 2016 at 10:02 PM, Roberto Ierusalimschy
> <[hidden email] <mailto:[hidden email]>> wrote:
>
>     > After I've upgraded Lua to 5.3.2, one of my scripts terminates with
>     > "pattern too complex" error message.
>     >
>     > Probably, this is because of gmatch using non-optimal pattern
>     > (having quadratic time complexity), which may require up to 2 sec
>     > to complete.
>     >
>     > Of course, it is possible to rewrite that script to make its time
>     > complexity linear (at the cost of extra LOC and more complex logic
>     > of code).
>     >
>     > But the are two reasons for NOT rewriting it:
>     > 1) I don't want to spent my time on rewriting my old script
>     > because I'm quite happy with its current performance (2-3 seconds
>     > is OK for me).
>     > 2) I don't want to bring extra complexity to the script.
>     > As for now, it is one-liner regexp, and I'd like to stay it
>     > as simple as it is.
>
>     Can you show your regexp/subject?
>
>
> I have a code similar to this one:
>
> local pattern =
>   'id="post(%d+)".-class="Post
> Header".-<h2>(.-)</h2>.-(/forum/post%1%.htm#details)'
> for id, title, link in main_forum_page:gmatch(pattern) do
>   analyze_post(id, title, link)
> end
>
> Once in a while a post does not have a "View details" link (that is,
> third capture does not match).
> In such rare cases non-linear behavior is observed due to chain of
> four ".-" in the pattern.
> I prefer waiting in runtime for 2 seconds to losing simplicity of the
> code.
>
So this is a variant to the parsing HTML with regex problem...

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Egor Skriptunoff-2
On Mon, Mar 14, 2016 at 11:29 PM, Soni L. <[hidden email]> wrote:
So this is a variant to the parsing HTML with regex problem...

The problem is not HTML-specific.

Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Sean Conner
In reply to this post by Egor Skriptunoff-2
It was thus said that the Great Egor Skriptunoff once stated:
>
> Of course, in some situations it may be very useful to apply time
> constraint on some Lua code (especially code containing pattern functions,
> "while" loops, recursion, etc.) IMO, more correct way to achieve this is
> to introduce new function time_restricted_pcall(max_CPU_time, func, args)
> instead of hard-coding such limitations.

  If the code was purely Lua, then time_restricted_pcall() would be easily
written.  However, if any calls to C are made, then it gets complicated and
requires system specific code to handle.  On a POSIX system, I can see two
solutions, neither one decent.

  First solution: setitimer().  But this also involves catching SIGALRM and
it's considered bad form for a library to catch signals [1].  This is
especially true for SIGALRM as there can be only one outstanding interval
timer in a process.  Also, signals are just very nasty to work with (the
hardest bugs I've had to debug have always dealt with signals) and can
interact wierdly with code [3].

  Second solution: setrlimit().  We can limit the amount of CPU time we use.
But this too, requires a signal handler, SIGXCPU or else if the process
exceeds the CPU time, it's terminated.  A library routine catching SIGXCPU
might be less of an issue than catching SIGALARM but there's still a
potential bug---setrlimit() is not an asynchronous-safe function, meaning
it's not safe to call it from a signal handler, and we *have* to call it
from the SIGXCPU signal handler or else our process goes bye-bye as it
exceeded its allowable CPU limit.

  -spc (It's not an easy problem ... )

[1] The least of which is it violates the rule of least surprise [2]. If
        a library catches, say, SIGPIPE, that might cause issues if the
        programmer is unaware of that and attempts to catch SIGPIPE
        elsewhere.  The programmer than ends up with a confusing set of
        bugs.

[2] http://www.catb.org/~esr/writings/taoup/html/ch01s06.html#id2878339

[3] I had to write a signal handler in C [4] because I couldn't get it
        working as I wanted in Lua.  Here's the main comment from the C
        code:

                This entire file exists *because* I can't properly handle
                SIGCHLD in the server process.  What I want to do is just
                get the reason for a handler process terminating, but not
                wanting to deal with EINTR, so I set the handler to restart
                system calls.  But while I can do that in Lua, the Lua
                signal handler doesn't get control until the Lua VM gets
                control, and if we're sitting in a system call, that might
                take some time.  So, we do this in C, which can get the job
                done.

[4] gopher://gopher.conman.org/0Gopher:Src:reapchild.c

Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Sean Conner
It was thus said that the Great Sean Conner once stated:
>
>   Second solution: setrlimit().  We can limit the amount of CPU time we use.
> But this too, requires a signal handler, SIGXCPU or else if the process
> exceeds the CPU time, it's terminated.  A library routine catching SIGXCPU
> might be less of an issue than catching SIGALARM but there's still a
> potential bug---setrlimit() is not an asynchronous-safe function, meaning
> it's not safe to call it from a signal handler, and we *have* to call it
> from the SIGXCPU signal handler or else our process goes bye-bye as it
> exceeded its allowable CPU limit.

  Oh, some other issues with this solution.  One, even *if* setrlimit() was
an aynchronous-safe function, we would need to track the previous setting
and up it each time the signal handler fired.  We can keep doing this until
we hit the hard CPU limit and then ... well ... the process comes to an
abrupt end.

  The second issue---setrlimit() sets the amount of time our process can run
on the CPU.  It's *NOT* wall-clock time.  Let's assume we set a limit of one
second of CPU time.  The following C code:

        char c;
        read(0,&c,1);

can sit there forever [1] and never accumulate CPU time, assuming no input
is forthcoming.  So depending upon what you are trying to timeout, this
method might not work that well either.

  -spc (Man, it just gets uglier and uglier ... )

[1] Well, not technically "forever" as there will be a heat-death of the
        universe to contend with, assuming the computer will survive our sun
        dying but you get the idea.



Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Viacheslav Usov
In reply to this post by Sean Conner
On Mon, Mar 14, 2016 at 9:54 PM, Sean Conner <[hidden email]> wrote:

> If the code was purely Lua, then time_restricted_pcall() would be easily written.  However, if any calls to C are made, then it gets complicated and requires system specific code to handle.

That is hardly different from a native application being blocked on a kernel call, and we all know that being able to break user-mode execution is still very useful for debugging. By the same token, it would be very useful to be able to break just the Lua VM when it is not running any native code.

We have debug hooks, which almost do the right thing with the "break on N instructions" feature, but they need to be set in advance, and choosing the N is not straightforward because it implies a performance vs responsiveness trade-off. It would be nice to have something like lua_break() that can be called on a Lua environment (which may be running some Lua code in another native thread), which would essentially call a user-supplied function before the next VM instruction gets executed, or at some other convenient VM execution milestone not very distant in future, in the original native thread.

Speaking of blocking native code, I do not think a fully generic portable solution is possible. Library writers, however, know (in principle) when their native code can possibly take a while (or block on a kernel call), and could then make that interruptible and/or cancellable. Then all it takes is a way to communicate to Lua that a given native function is interruptible/cancellable, and some convention to request interruption/cancellation.

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Roberto Ierusalimschy
> We have debug hooks, which almost do the right thing with the "break on N
> instructions" feature, but they need to be set in advance, and choosing the
> N is not straightforward because it implies a performance vs responsiveness
> trade-off. It would be nice to have something like lua_break() that can be
> called on a Lua environment (which may be running some Lua code in another
> native thread), which would essentially call a user-supplied function
> before the next VM instruction gets executed, or at some other convenient
> VM execution milestone not very distant in future, in the original native
> thread.

You can implement a sort of 'lua_break' with hooks and signals; that
is how the stand-alone lua interpreter handles Ctrl-C, for instance.
(See file 'lua.c' in the main distribuition for details.) However,
like hooks, that cannot break loops in C code (e.g., library functions).

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Soni "They/Them" L.
In reply to this post by Viacheslav Usov


On 30/03/16 08:11 AM, Viacheslav Usov wrote:

> On Mon, Mar 14, 2016 at 9:54 PM, Sean Conner <[hidden email]
> <mailto:[hidden email]>> wrote:
>
> > If the code was purely Lua, then time_restricted_pcall() would be
> easily written.  However, if any calls to C are made, then it gets
> complicated and requires system specific code to handle.
>
> That is hardly different from a native application being blocked on a
> kernel call, and we all know that being able to break user-mode
> execution is still very useful for debugging. By the same token, it
> would be very useful to be able to break just the Lua VM when it is
> not running any native code.
>
> We have debug hooks, which almost do the right thing with the "break
> on N instructions" feature, but they need to be set in advance, and
> choosing the N is not straightforward because it implies a performance
> vs responsiveness trade-off. It would be nice to have something like
> lua_break() that can be called on a Lua environment (which may be
> running some Lua code in another native thread), which would
> essentially call a user-supplied function before the next VM
> instruction gets executed, or at some other convenient VM execution
> milestone not very distant in future, in the original native thread.
>
> Speaking of blocking native code, I do not think a fully generic
> portable solution is possible. Library writers, however, know (in
> principle) when their native code can possibly take a while (or block
> on a kernel call), and could then make that interruptible and/or
> cancellable. Then all it takes is a way to communicate to Lua that a
> given native function is interruptible/cancellable, and some
> convention to request interruption/cancellation.
>
> Cheers,
> V.
Allowing native code to trigger debug hooks and stuff would be nice.

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.


Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Viacheslav Usov
In reply to this post by Roberto Ierusalimschy
On Wed, Mar 30, 2016 at 2:47 PM, Roberto Ierusalimschy <[hidden email]> wrote:

> You can implement a sort of 'lua_break' with hooks and signals; that is how the stand-alone lua interpreter handles Ctrl-C, for instance. (See file 'lua.c' in the main distribuition for details.)

The code has the following remark:

/*
** Function to be called at a C signal. Because a C signal cannot
** just change a Lua state (as there is no proper synchronization),
** this function only sets a hook that, when called, will stop the
** interpreter.
*/

However, lua_sethook is not documented to be free of the need to synchronize. Looking at its code, setting a hook is definitely not a change in a single atomic variable, so I wonder whether calling lua_sethook without synchronization is truly portable

To this day, I have thought that the only correct way to interrupt execution in the Lua VM is by registering a debug hook in advance.

> However, like hooks, that cannot break loops in C code (e.g., library functions).

Yeah, that's why I wrote about the interruption/cancellation extension.

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Roberto Ierusalimschy
> However, lua_sethook is not documented to be free of the need to
> synchronize. Looking at its code, setting a hook is definitely not a change
> in a single atomic variable, so I wonder whether calling lua_sethook
> without synchronization is truly portable

The new version of 'lua_sethook' (not yet released :-) has the
following explanation:

/*
** This function can be called asynchronous (e.g. during a signal).
** Fields 'oldpc', 'basehookcount', and 'hookcount' (set by
** 'resethookcount') are for debug only, and it is no problem if they
** get arbitrary values (causes at most one wrong hook call). 'hookmask'
** is an atomic value. We assume that pointers are atomic too (e.g., gcc
** ensures that for all platforms where it runs). Moreover, 'hook' is
** always checked before being called (see 'luaD_hook').
*/

So, if by "truly portable" you mean we can prove it is correct following
the ANSI C documentation (which is our usual meaning, too), then you
are right. But if we assume that "truly portable" means it will run
correctly in any existing machine, than I guess that function is OK.

(The type of 'hookmask' also changed from 'lu_byte' to 'l_signalT'
in the [yet to be released :-)] new version; I don't know whether
an access to 'char' can be non-atomic.)

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Coda Highland
On Wed, Mar 30, 2016 at 9:14 AM, Roberto Ierusalimschy
<[hidden email]> wrote:

>> However, lua_sethook is not documented to be free of the need to
>> synchronize. Looking at its code, setting a hook is definitely not a change
>> in a single atomic variable, so I wonder whether calling lua_sethook
>> without synchronization is truly portable
>
> The new version of 'lua_sethook' (not yet released :-) has the
> following explanation:
>
> /*
> ** This function can be called asynchronous (e.g. during a signal).
> ** Fields 'oldpc', 'basehookcount', and 'hookcount' (set by
> ** 'resethookcount') are for debug only, and it is no problem if they
> ** get arbitrary values (causes at most one wrong hook call). 'hookmask'
> ** is an atomic value. We assume that pointers are atomic too (e.g., gcc
> ** ensures that for all platforms where it runs). Moreover, 'hook' is
> ** always checked before being called (see 'luaD_hook').
> */
>
> So, if by "truly portable" you mean we can prove it is correct following
> the ANSI C documentation (which is our usual meaning, too), then you
> are right. But if we assume that "truly portable" means it will run
> correctly in any existing machine, than I guess that function is OK.
>
> (The type of 'hookmask' also changed from 'lu_byte' to 'l_signalT'
> in the [yet to be released :-)] new version; I don't know whether
> an access to 'char' can be non-atomic.)
>
> -- Roberto
>

Given that the code is not yet released, I can't inspect it myself. ;)
But are you using compiler intrinsic atomics when they're available?

/s/ Adam

Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Viacheslav Usov
In reply to this post by Roberto Ierusalimschy

On Wed, Mar 30, 2016 at 6:14 PM, Roberto Ierusalimschy <[hidden email]> wrote:

> So, if by "truly portable" you mean we can prove it is correct following the ANSI C documentation (which is our usual meaning, too), then you are right. But if we assume that "truly portable" means it will run correctly in any existing machine, than I guess that function is OK.

That is very good news, Roberto! Having it this way eliminates the headache of how one could break into and/or stop some silly Lua script written by a clueless user without killing the whole application.

Any opinion on the interruptable/cancellable idea? Or should I expand more?

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Roberto Ierusalimschy
In reply to this post by Coda Highland
> Given that the code is not yet released, I can't inspect it myself. ;)

The code did not change (except for the type of 'hookmask' that,
as I mentioned, went from 'lu_byte' to 'l_signalT'). I simply added
an explanation of why the code as it is should work.


> But are you using compiler intrinsic atomics when they're available?

No.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Coda Highland
On Wed, Mar 30, 2016 at 10:50 AM, Roberto Ierusalimschy
<[hidden email]> wrote:

>> Given that the code is not yet released, I can't inspect it myself. ;)
>
> The code did not change (except for the type of 'hookmask' that,
> as I mentioned, went from 'lu_byte' to 'l_signalT'). I simply added
> an explanation of why the code as it is should work.
>
>
>> But are you using compiler intrinsic atomics when they're available?
>
> No.
>
> -- Roberto
>

I suppose that's not that big of a deal, the more I think about it;
true atomics are only necessary when the process is multithreaded,
which isn't true for the reference shell.

/s/ Adam

Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Viacheslav Usov
In reply to this post by Roberto Ierusalimschy
On Wed, Mar 30, 2016 at 7:50 PM, Roberto Ierusalimschy <[hidden email]> wrote:

> The code did not change (except for the type of 'hookmask' that, as I mentioned, went from 'lu_byte' to 'l_signalT'). I simply added an explanation of why the code as it is should work.

Shouldn't we have a note to that effect in the reference manual, too?

Something like "breaking into executing Lua code can be achieved by calling lua_sethook from a signal handler or from another thread, which, unlike other Lua API calls, allows that without additional synchronization"?

Cheers,
V.
Reply | Threaded
Open this post in threaded view
|

Re: Time constraint in Lua pattern functions

Viacheslav Usov
In reply to this post by Roberto Ierusalimschy
On Wed, Mar 30, 2016 at 6:14 PM Roberto Ierusalimschy <[hidden email]> wrote:
 
The new version of 'lua_sethook' (not yet released :-) has the
following explanation:

/*
** This function can be called asynchronous (e.g. during a signal).
** Fields 'oldpc', 'basehookcount', and 'hookcount' (set by
** 'resethookcount') are for debug only, and it is no problem if they
** get arbitrary values (causes at most one wrong hook call). 'hookmask'
** is an atomic value. We assume that pointers are atomic too (e.g., gcc
** ensures that for all platforms where it runs). Moreover, 'hook' is
** always checked before being called (see 'luaD_hook').
*/

So, if by "truly portable" you mean we can prove it is correct following
the ANSI C documentation (which is our usual meaning, too), then you
are right. But if we assume that "truly portable" means it will run
correctly in any existing machine, than I guess that function is OK.

A further problem when having to "break into" running Lua code is that lua_sethook() seems to apply only to the specified Lua thread (coroutine). At the very least, the current documentation does not seem to mention whether hooks are per-thread or otherwise, so this is safest to assume (for the purpose of "breaking in") and it is most versatile for general needs, too. However, this makes it more difficult for the hosting application to implement the "break in" functionality, if it has to assume that coroutines might be used, and especially if it has to assume that that coroutines might be created via the C-level API. One could use a custom memory allocator, which keeps track of all the Lua threads for a given Lua state, and then setting the hook in all the known Lua threads, but that, even though it works, does not look like a very simple solution.

The question is, is there a simpler way?

Cheers,
V.