Looking for ideas to lower binding overhead for lua-re2. in plain-text)

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Looking for ideas to lower binding overhead for lua-re2. in plain-text)

lua.greatwolf
I'm resubmitting my original post in plain-text this time. Hopefully it'll show up well enough and
the mailing-list doesn't ruin it too badly:

I've been working on some bindings for lua to google's RE2 regex library
and I got most of the important functions working. At the moment, I'm
working on optimizing and reducing the overhead for the bindings.

I've been benching the lua bindings rigorously using the regexdna
benchmark from:

http://benchmarksgame.alioth.debian.org/u32/benchmark.php?test=regexdna&lang=all&data=u32

comparing it against a pure C++ implementation as well as pyre2 bindings.

What I found is that my lua bindings seem to have slightly higher
overhead compared to pyre2's bindings at least when performing pattern
match and count, eg. using 'RE2::FindAndConsume'.

For example, when benchmarking with 1 iteration:

         Time (seconds)
C++      ~1.401s
pyre2    ~1.424s
lua-re2  ~1.51s

with 10 iterations:

          Time (seconds)
C++       ~14.061s
pyre2     ~14.2720s
lua-re2   ~15.2618s


What I'm trying to figure out is why lua-re2 is being slow here. Here's
the minimal relevant code snippets I used to test:


   -- regexdna_re2.lua
   re2 = require 'lua-re2'
   function printelapse(start)
     local elapse = os.clock() - start
     print(elapse..'s')
   end
   -- omitted code
   -- 'variants' is a table of regex dna patterns to match
   -- 'seq' is a *really* long dna sequence string loaded from stdin
   local start = os.clock()
   for i = 1, 10 do|
     for _, p in ipairs(variants) do|
       io.write( string.format('%s %d\n', p, re2(p):countmatch(seq)) )
     end
   end
   printelapse(start)


   // lua-re2.cpp
   // a temporary function in my binding code used in the benchmark above
   static int re2_countmatch(lua_State *L)
   {
     lua_settop(L, 2);
     RE2 *re2obj = luaRE2_checkobject(L, 1);
     StringPiece subject( luaL_checkstring(L, 2) );

     int c = 0;
     while(RE2::FindAndConsume(&subject, *re2obj))
       ++c;

     lua_pushinteger(L, c);
     return 1;
   }

------------------------------------------------------------------------

   # regexdna.py
   from time import time
   from sys import stdin, stdout
   from re2 import sub, findall

   def printelapse(start):
     elapse = time() - start
     print(str(elapse) + 's')

   # omitted code
   start = time()
   for i in xrange(10):
     for f in variants:
       write(f.decode("utf-8") + ' ' + str(len(findall(f.decode("utf-8"), seq))) + '\n')
   printelapse(start)

------------------------------------------------------------------------

   // regexdna.cpp
   #include "re2/re2.h"
   #include <iostream>
   #include <memory>
   #include <stdio.h>
   #include <time.h>

   int main(void)
   {
     // lots of omitted code
     clock_t start = clock();
     for(int it = 0; it < 10; ++it)
     {
       for (int i = 0; i < (int)(sizeof(pattern1) / sizeof(string)); ++i)
       {
         int count = 0;
         RE2 pat(pattern1[i]);
         StringPiece piece = str;

         while (RE2::FindAndConsume(&piece, pat)) ++count;

         printf("%s %d\n", pattern1[i].c_str(), count);
       }
     }
     double elapse = (clock() - start)*1.0 / CLOCKS_PER_SEC;
     cout << elapse << "s\n";
   }


After a lot of experimenting, it seems the slowdown is happening when I
call 'luaL_checkstring(L, 2)' on the long dna 'seq' passed in from lua
code. For whatever reason, it seems to take a 10^th of a second to
process on each iteration and it really becomes visible when done for 10
iterations.

For instance, if I modify my 're2_countmatch' slightly to use a static
string for testing:

   static int re2_countmatch(lua_State *L)
   {
     lua_settop(L, 2);
     RE2 *re2obj = luaRE2_checkobject(L, 1);
     static const string overhead_test = luaL_checkstring(L, 2);
     StringPiece subject( overhead_test );

     int c = 0;
     while(RE2::FindAndConsume(&subject, *re2obj))
       ++c;

     lua_pushinteger(L, c);
     return 1;
   }


This will only call 'luaL_checkstring(L, 2)' once to initialize
`overhead_test` on the very first call of `re2_countmatch`. Obviously I
can't do this in a real implementation since the 'haystack' string can
be anything. But running the benchmark again with all other parameters
being equal, those same 10 iterations now complete in 14.1778 seconds.

So does anyone have any ideas on how to lower the overhead for the lua
bindings so that they're comparable with pyre2?

Testing setup and tools used:

   * Mingw-gcc 4.6.3
   * LuaJIT 2.02
   * Python 2.7.3
   * RE2 compiled from head using mingw above
   * Windows 7 64-bit on Intel Q6600 quad-core.

Thank you for reading and sorry for the long post.



Reply | Threaded
Open this post in threaded view
|

Re: Looking for ideas to lower binding overhead for lua-re2. in plain-text)

William Ahern
On Fri, Aug 23, 2013 at 06:12:59AM -0700, [hidden email] wrote:
> I'm resubmitting my original post in plain-text this time. Hopefully it'll
> show up well enough and the mailing-list doesn't ruin it too badly:
>
> I've been working on some bindings for lua to google's RE2 regex library
> and I got most of the important functions working. At the moment, I'm
> working on optimizing and reducing the overhead for the bindings.
>
<snip>
>   // lua-re2.cpp
>   // a temporary function in my binding code used in the benchmark above
>   static int re2_countmatch(lua_State *L)
>   {
>     lua_settop(L, 2);
>     RE2 *re2obj = luaRE2_checkobject(L, 1);
>     StringPiece subject( luaL_checkstring(L, 2) );
>

You should be using luaL_checklstring() here. You can pass a pointer _and_
the length to the StringPiece constructor:

        size_t n;
        const char *p = luaL_checklstring(L, 2, &n);
        StringPiece subject(p, (int)n);

Here's a snippet from the RE2 bindings I've written. These routines
implement the gmatch iterator:


static int luare2_nextmatch(lua_State *L) {
        RE2* pat = luare2_checkre2(L, lua_upvalueindex(1))->pattern;
        const char *txt;
        size_t len, pos;
        luare2_captures captures(pat->NumberOfCapturingGroups());
        bool match;

        txt = luaL_checklstring(L, lua_upvalueindex(2), &len);
        pos = luaL_checkinteger(L, lua_upvalueindex(3));

        if (pos >= len)
                return 0;

        StringPiece next(&txt[pos], (int)(len - pos));

        match = RE2::FindAndConsumeN(&next, *pat, captures, captures.count());

        if (!match)
                return 0; /* must return nil to terminate for-loop */

        pos = next.data() - txt;
        lua_pushinteger(L, pos);
        lua_replace(L, lua_upvalueindex(3));

        if (!captures.count()) {
                lua_pushboolean(L, true);

                return 1;
        } else
                return captures.push(L);
} /* luare2_nextmatch() */


static int luare2_gmatch(lua_State *L) {
        luare2_checkre2(L, 1);
        luaL_checkstring(L, 2);

        lua_settop(L, 2);

        lua_pushinteger(L, 0); /* initial offset into the string */

        lua_pushcclosure(L, luare2_nextmatch, 3);

        return 1;
} /* luare2_gmatch() */