Proposal for lpeg

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Proposal for lpeg

Albert Chan
lpeg.P always match at beginning of string, maybe an efficient matching anywhere function ?

Example:
keywords = P'while' + 'repeat' + 'for'

to search keywords anywhere efficiently, skip all bad head chars

skip = 1 - S"wrf"    -- set of skipped chars

--> lpeg.M(keywords) == P{ keywords + 1 * skip ^ 0 * V(1) }

lpeg.M(Cp() * keywords * Cp()) == position of first keyword

lpeg.M(C(keywords))^1 == capture all the keywords

it will be convenient to have lpeg go thru the lpeg object tree to build skip set,
and produce an efficient matching object.

If we do it by hand, and more keywords were added, skip have to be updated too.
with lpeg.M (if available), skip set is built on the fly, and always up-to-date






Reply | Threaded
Open this post in threaded view
|

Re: Proposal for lpeg

Sean Conner
It was thus said that the Great albertmcchan once stated:
> lpeg.P always match at beginning of string, maybe an efficient matching anywhere function ?

  You do realize that all LPeg functions take an optional starting position.  

> Example:
> keywords = P'while' + 'repeat' + 'for'
>
> to search keywords anywhere efficiently, skip all bad head chars
>
> skip = 1 - S"wrf"    -- set of skipped chars

  Have you timed the above vs

        skip = P(1) - keywords

  -spc


Reply | Threaded
Open this post in threaded view
|

Re: Proposal for lpeg

Albert Chan
On Feb 6, 2018, at 1:46 PM, Sean Conner <[hidden email]> wrote:

> It was thus said that the Great albertmcchan once stated:
>> lpeg.P always match at beginning of string, maybe an efficient matching anywhere function ?
>
>  You do realize that all LPeg functions take an optional starting position.  

but i dont know the position ! (it could be anywhere)

>> Example:
>> keywords = P'while' + 'repeat' + 'for'
>>
>> to search keywords anywhere efficiently, skip all bad head chars
>>
>> skip = 1 - S"wrf"    -- set of skipped chars
>
>  Have you timed the above vs
>
>    skip = P(1) - keywords
>
>  -spc

Roberto wrote an article on this: http://www.inf.puc-rio.br/~roberto/docs/peg.pdf

His article about searching words in the bible suggested skipping bad head
characters is 2x to 4x faster (similar to PCRE performance, see p25)

Using skip = 1 - keywords defeated the purpose of optimization

In other words, this is probably faster: P{ keywords + 1 * V(1) }

Both skip ideas had been benchmarked: page 27, figure 5
column 2: skip = P(1) - 'transparent'
column 3: skip = P(1) - 't'

skipping head chars is twice as fast (the benchmark is only for 1 word)
Multiple keywords search is going to be even faster.

I just thought this optimization stuff may be better done inside lpeg (in compiling phase)
Just wrap all these optimization in lpeg.M, and we can capture all keywords like this:

M( C(keywords) )^1