Using Lua's string interning for faster compares?

classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|

Using Lua's string interning for faster compares?

Rena
When I write modules to wrap C++ objects, they usually have a __index
method that does a lot of string comparing, like:
if(!strcmp(key, "foo")) lua_pushcfunction(L, obj_method_foo);
else if(!strcmp(key, "bar")) lua_pushcfunction(L, obj_method_bar);
etc... I'm not sure if compilers can optimize this, but doing all
those strcmp()s every time a method/field is looked up seems terribly
inefficient.

Of course there are various ways to improve on this design (return a
table containing the fields instead of using __index, redirecting the
lookup to a table, etc), but they do have various drawbacks. This
method is nice and simple - just not very fast.

I find myself wondering, why is my module doing dumb string compares
all the time, when Lua fixes this using string interning and hashing?
It could be much more efficient if I could just push the names of my
methods into a table in the registry or some such, and use a method
like lua_tohash() to get their hashes. The series of strcmp()s could
then be changed to a series of much faster integer compares.

--
Sent from my toaster.

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Josh Simmons
On Tue, Jan 17, 2012 at 6:12 PM, HyperHacker <[hidden email]> wrote:

> When I write modules to wrap C++ objects, they usually have a __index
> method that does a lot of string comparing, like:
> if(!strcmp(key, "foo")) lua_pushcfunction(L, obj_method_foo);
> else if(!strcmp(key, "bar")) lua_pushcfunction(L, obj_method_bar);
> etc... I'm not sure if compilers can optimize this, but doing all
> those strcmp()s every time a method/field is looked up seems terribly
> inefficient.
>
> Of course there are various ways to improve on this design (return a
> table containing the fields instead of using __index, redirecting the
> lookup to a table, etc), but they do have various drawbacks. This
> method is nice and simple - just not very fast.
>
> I find myself wondering, why is my module doing dumb string compares
> all the time, when Lua fixes this using string interning and hashing?
> It could be much more efficient if I could just push the names of my
> methods into a table in the registry or some such, and use a method
> like lua_tohash() to get their hashes. The series of strcmp()s could
> then be changed to a series of much faster integer compares.
>
> --
> Sent from my toaster.
>

This wouldn't work exactly as you describe, since you could have
strings hashing to the same bucket.

However, you could call something like const char *lua_intern(L, const
char *str); which would intern your string and return the internal
pointer which you could then use for comparison.

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Luiz Henrique de Figueiredo
> However, you could call something like const char *lua_intern(L, const
> char *str); which would intern your string and return the internal
> pointer which you could then use for comparison.

lua_tostring and lua_tolstring do exactly that.

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Josh Simmons
On Tue, Jan 17, 2012 at 9:01 PM, Luiz Henrique de Figueiredo
<[hidden email]> wrote:
>> However, you could call something like const char *lua_intern(L, const
>> char *str); which would intern your string and return the internal
>> pointer which you could then use for comparison.
>
> lua_tostring and lua_tolstring do exactly that.
>

Wow that's kinda mindblowing, not quite sure how I never noticed the
return value.

Cool.

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Rena
In reply to this post by Luiz Henrique de Figueiredo
On Tue, Jan 17, 2012 at 03:01, Luiz Henrique de Figueiredo
<[hidden email]> wrote:
>> However, you could call something like const char *lua_intern(L, const
>> char *str); which would intern your string and return the internal
>> pointer which you could then use for comparison.
>
> lua_tostring and lua_tolstring do exactly that.
>

So we can rely on this behaviour and safely compare the pointer
instead of the string itself?

--
Sent from my toaster.

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Luiz Henrique de Figueiredo
> So we can rely on this behaviour and safely compare the pointer
> instead of the string itself?

Yes, but you have to make sure that the strings still exist in Lua,
that is, are not collected. One way to do that is to use them as keys
or values in a table.

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Rob Kendrick-2
On Tue, Jan 17, 2012 at 08:32:59AM -0200, Luiz Henrique de Figueiredo wrote:
> > So we can rely on this behaviour and safely compare the pointer
> > instead of the string itself?
>
> Yes, but you have to make sure that the strings still exist in Lua,
> that is, are not collected. One way to do that is to use them as keys
> or values in a table.

Would it also be safe to simply not do any allocation, and thus not
trigger automatic collection?

B.

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Rena
On Tue, Jan 17, 2012 at 04:32, Rob Kendrick <[hidden email]> wrote:

> On Tue, Jan 17, 2012 at 08:32:59AM -0200, Luiz Henrique de Figueiredo wrote:
>> > So we can rely on this behaviour and safely compare the pointer
>> > instead of the string itself?
>>
>> Yes, but you have to make sure that the strings still exist in Lua,
>> that is, are not collected. One way to do that is to use them as keys
>> or values in a table.
>
> Would it also be safe to simply not do any allocation, and thus not
> trigger automatic collection?
>
> B.
>

...and don't allow any loaded scripts or modules to trigger manual collection.

--
Sent from my toaster.

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Peter Cawley
In reply to this post by Rena
On Tue, Jan 17, 2012 at 7:12 AM, HyperHacker <[hidden email]> wrote:

> When I write modules to wrap C++ objects, they usually have a __index
> method that does a lot of string comparing, like:
> if(!strcmp(key, "foo")) lua_pushcfunction(L, obj_method_foo);
> else if(!strcmp(key, "bar")) lua_pushcfunction(L, obj_method_bar);
> etc... I'm not sure if compilers can optimize this, but doing all
> those strcmp()s every time a method/field is looked up seems terribly
> inefficient.
>
> Of course there are various ways to improve on this design (return a
> table containing the fields instead of using __index, redirecting the
> lookup to a table, etc), but they do have various drawbacks. This
> method is nice and simple - just not very fast.
>
> I find myself wondering, why is my module doing dumb string compares
> all the time, when Lua fixes this using string interning and hashing?
> It could be much more efficient if I could just push the names of my
> methods into a table in the registry or some such, and use a method
> like lua_tohash() to get their hashes. The series of strcmp()s could
> then be changed to a series of much faster integer compares.
>
> --
> Sent from my toaster.
>

In one of my current projects, I have a similar need to frequently
switch over a Lua string. I opted for a solution of hash-based
dispatch combined with code generation, which I don't think I've seen
proposed on lua-l recently. I've written up a description of the
approach as a blog post:

http://www.corsix.org/content/switch-lua-strings

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Gé Weijers

On Tue, Jan 17, 2012 at 4:04 AM, Peter Cawley <[hidden email]> wrote:

In one of my current projects, I have a similar need to frequently
switch over a Lua string. I opted for a solution of hash-based
dispatch combined with code generation, which I don't think I've seen
proposed on lua-l recently. I've written up a description of the
approach as a blog post:
 
The assumption in your design is that the hash function will not change. It's likely that future Lua versions will have some kind of randomized hash function to prevent some types of denial-of-service attacks that were discussed here recently, and then this scheme no longer works. The really good part of the design is that you use a code generation tool, so you can easily change how things work without rewriting half you code base.



Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Roberto Ierusalimschy
In reply to this post by Luiz Henrique de Figueiredo
> > So we can rely on this behaviour and safely compare the pointer
> > instead of the string itself?
>
> Yes, but you have to make sure that the strings still exist in Lua,
> that is, are not collected. One way to do that is to use them as keys
> or values in a table.

To be 'religious' about the specification, you should not use a string
pointer unless the string is anchored in the stack. But I guess it is
quite safe to do the above in current implementations.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Rena
On Tue, Jan 17, 2012 at 17:42, Roberto Ierusalimschy
<[hidden email]> wrote:

>> > So we can rely on this behaviour and safely compare the pointer
>> > instead of the string itself?
>>
>> Yes, but you have to make sure that the strings still exist in Lua,
>> that is, are not collected. One way to do that is to use them as keys
>> or values in a table.
>
> To be 'religious' about the specification, you should not use a string
> pointer unless the string is anchored in the stack. But I guess it is
> quite safe to do the above in current implementations.
>
> -- Roberto
>

The registry table wouldn't be suitable for this?

--
Sent from my toaster.

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Philipp Janda
In reply to this post by Rena
Hi!

On 17.01.2012 08:12, HyperHacker wrote:
> When I write modules to wrap C++ objects, they usually have a __index
> method that does a lot of string comparing, like:
> if(!strcmp(key, "foo")) lua_pushcfunction(L, obj_method_foo);
> else if(!strcmp(key, "bar")) lua_pushcfunction(L, obj_method_bar);
> etc... I'm not sure if compilers can optimize this, but doing all
> those strcmp()s every time a method/field is looked up seems terribly
> inefficient.
>

Try Ragel[1].

   [1]: http://www.complang.org/ragel/

I've attached the template I use for such occasions.

HTH,
Philipp


str2id.rh (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

William Ahern
On Wed, Jan 18, 2012 at 07:07:10AM +0100, Philipp Janda wrote:

> Hi!
>
> On 17.01.2012 08:12, HyperHacker wrote:
> >When I write modules to wrap C++ objects, they usually have a __index
> >method that does a lot of string comparing, like:
> >if(!strcmp(key, "foo")) lua_pushcfunction(L, obj_method_foo);
> >else if(!strcmp(key, "bar")) lua_pushcfunction(L, obj_method_bar);
> >etc... I'm not sure if compilers can optimize this, but doing all
> >those strcmp()s every time a method/field is looked up seems terribly
> >inefficient.
> >
>
> Try Ragel[1].
>
>   [1]: http://www.complang.org/ragel/
>
> I've attached the template I use for such occasions.

I second that. And in this case use the -G1 or -G2 switch to get really fast
matching. -G1 isn't always feasible for larger machines (table driven
machines are more compact), but perfect for this very common scenario. It
might also remove the need to quiet GCC about unused variables. Though, for
ragel compilation units I usually just turn those off with `-Wno-unused'
because the next ragel version might have changed the symbol names, breaking
the void statement hack.

<snip example machine>

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Peter Odding-3
In reply to this post by Philipp Janda
>> When I write modules to wrap C++ objects, they usually have a __index
>> method that does a lot of string comparing, like:
>> if(!strcmp(key, "foo")) lua_pushcfunction(L, obj_method_foo);
>> else if(!strcmp(key, "bar")) lua_pushcfunction(L, obj_method_bar);
>> etc... I'm not sure if compilers can optimize this, but doing all
>> those strcmp()s every time a method/field is looked up seems terribly
>> inefficient.
>
> Try Ragel[1].
>
> [1]: http://www.complang.org/ragel/
>
> I've attached the template I use for such occasions.

Alternatively, re2c does the same thing: http://re2c.org/ (for simple
use cases like matching a string against one of several predefined
constants).

  - Peter

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Andre Leiradella-2
In reply to this post by Rena
> Try Ragel[1].
>
> [1]: http://www.complang.org/ragel/

> Alternatively, re2c does the same thing: http://re2c.org/ (for simple

Wow, the generated code is so branch-y!

What I do is compute the hash of the key (I usually use djb2) and use that in the comparisons. You can use Lua's internal hash if you don't mind changing Lua's source code and recompiling. Both techniques are discussed here:

http://altdevblogaday.com/2011/08/06/a-little-trick-to-make-faster-lua-bindings/

Cheers,

Andre



Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Jay Carlson
In reply to this post by Peter Cawley
[Obligatory suggestion for Lua core, if people insist on playing with C strings. Like pushlstring: lua_findlstring(L,s,l), which will return false if s would have had to be allocated. If s would have to be allocated it cannot possibly be a key in a raw table. This optimization could be applied to a lua_rawgetfield() without anybody noticing.]

[Additional string functions in Lua could support this as well: I have a long string from the network; give me my command to look up, or give me nil if it's pointless to try to look up in any table. Well, rawget from any table. __index can find anything. I find rawget annoying, and I don't know if I want to encourage it.]

On Jan 17, 2012, at 7:04 AM, Peter Cawley wrote:

> In one of my current projects, I have a similar need to frequently
> switch over a Lua string. I opted for a solution of hash-based
> dispatch combined with code generation, which I don't think I've seen
> proposed on lua-l recently.

I have two reactions:

First, Lua tables seem to work well in most circumstances. I would consider restructuring my code after writing about the second C string switch statement. I come to Lua to escape from C and C strings, and I intend to fully enjoy my vacation.

My second reaction is that, uh, actually I've been in that switch statement situation with Lua tables before. My ancient FLTK binding used tolua++, which produces a lua-side metatable for every class. You can just eat that on desktop platforms, but I was working on an 8M Linux PDA; every app built with the FLTK binding had ~300k of *unsharable* allocated memory by the time it made it to main. A lot of it might not be touched. With swap, that kind of thing is an annoyance; without, private memory gradually squeezes the rest of the system to death. And the payload, the actual working Lua FLTK app, was often like ~100k of private memory. All of that was with custom gc settings to minimize the amount of sbrk.

IMO that's the level of measurement and desperation appropriate for bailing out on tables. I didn't, but I did have a plan.

I had an advantage: most of my tables were already being generated from tolua++ source. I didn't need to scan my own source code to find strings, and there already was a code generation step. What I thought I needed was rommable hash tables, some const structure which could live in the .text/.rodata section and be sharable (and a candidate for page-out) across all the apps loading the luafltk shared object.

My plan was to create my own user type with an __index which could look up methods by name in a (perhaps perfect) const hash table generated by tolua++. This C hash would not be visible to users; a regular writable Lua table would be hit first. At that point I could play around with memoizing the C-side hash results, weak tables etc.

Jay
Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Matthew Wild
On 20 January 2012 02:03, Jay Carlson <[hidden email]> wrote:

> [Obligatory suggestion for Lua core, if people insist on playing with C strings. Like pushlstring: lua_findlstring(L,s,l), which will return false if s would have had to be allocated. If s would have to be allocated it cannot possibly be a key in a raw table. This optimization could be applied to a lua_rawgetfield() without anybody noticing.]
>
> [Additional string functions in Lua could support this as well: I have a long string from the network; give me my command to look up, or give me nil if it's pointless to try to look up in any table. Well, rawget from any table. __index can find anything. I find rawget annoying, and I don't know if I want to encourage it.]
>
> On Jan 17, 2012, at 7:04 AM, Peter Cawley wrote:
>
>> In one of my current projects, I have a similar need to frequently
>> switch over a Lua string. I opted for a solution of hash-based
>> dispatch combined with code generation, which I don't think I've seen
>> proposed on lua-l recently.
>
> I have two reactions:
>
> First, Lua tables seem to work well in most circumstances. I would consider restructuring my code after writing about the second C string switch statement. I come to Lua to escape from C and C strings, and I intend to fully enjoy my vacation.
>
> My second reaction is that, uh, actually I've been in that switch statement situation with Lua tables before. My ancient FLTK binding used tolua++, which produces a lua-side metatable for every class. You can just eat that on desktop platforms, but I was working on an 8M Linux PDA; every app built with the FLTK binding had ~300k of *unsharable* allocated memory by the time it made it to main. A lot of it might not be touched. With swap, that kind of thing is an annoyance; without, private memory gradually squeezes the rest of the system to death. And the payload, the actual working Lua FLTK app, was often like ~100k of private memory. All of that was with custom gc settings to minimize the amount of sbrk.
>
> IMO that's the level of measurement and desperation appropriate for bailing out on tables. I didn't, but I did have a plan.
>
> I had an advantage: most of my tables were already being generated from tolua++ source. I didn't need to scan my own source code to find strings, and there already was a code generation step. What I thought I needed was rommable hash tables, some const structure which could live in the .text/.rodata section and be sharable (and a candidate for page-out) across all the apps loading the luafltk shared object.
>
> My plan was to create my own user type with an __index which could look up methods by name in a (perhaps perfect) const hash table generated by tolua++. This C hash would not be visible to users; a regular writable Lua table would be hit first. At that point I could play around with memoizing the C-side hash results, weak tables etc.

Similar idea to eLua's rotables:
http://www.eluaproject.net/doc/v0.8/en_arch_ltr.html

Regards,
Matthew

PS. Congratulations to the eLua team on the new website, first time
I've seen it... looks good, comprehensive, and I still managed to find
that page in just a few clicks :)

Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

dcharno
In reply to this post by Philipp Janda
On 01/18/2012 01:07 AM, Philipp Janda wrote:

> Hi!
>
> On 17.01.2012 08:12, HyperHacker wrote:
>> When I write modules to wrap C++ objects, they usually have a __index
>> method that does a lot of string comparing, like:
>> if(!strcmp(key, "foo")) lua_pushcfunction(L, obj_method_foo);
>> else if(!strcmp(key, "bar")) lua_pushcfunction(L, obj_method_bar);
>> etc... I'm not sure if compilers can optimize this, but doing all
>> those strcmp()s every time a method/field is looked up seems terribly
>> inefficient.
>>
>
> Try Ragel[1].
>
> [1]: http://www.complang.org/ragel/
>
> I've attached the template I use for such occasions.

What about gperf?

http://www.gnu.org/software/gperf/


%includes
struct str2id_s { const char *name; int id; };
%%
"foo",    0
"bar",    1
"baz",    2
"foobar", 3
"hello",  4
"bye",    5
%%

#include <stdio.h>

int str2id(const char *s)
{
        struct str2id_s *p = in_word_set(s, strlen(s));
        return p ? p->id : -1;
}

int main(int argc, char *argv[])
{
        printf( "%d \n", str2id("foo") );
        printf( "%d \n", str2id("bar") );
        printf( "%d \n", str2id("baz") );
}



Reply | Threaded
Open this post in threaded view
|

Re: Using Lua's string interning for faster compares?

Jay Carlson
In reply to this post by Matthew Wild
On Fri, Jan 20, 2012 at 3:41 AM, Matthew Wild <[hidden email]> wrote:
> On 20 January 2012 02:03, Jay Carlson <[hidden email]> wrote:

>> I had an advantage: most of my tables were already being generated from tolua++ source. I didn't need to scan my own source code to find strings, and there already was a code generation step. What I thought I needed was rommable hash tables, some const structure which could live in the .text/.rodata section and be sharable (and a candidate for page-out) across all the apps loading the luafltk shared object.
>>
>> My plan was to create my own user type with an __index which could look up methods by name in a (perhaps perfect) const hash table generated by tolua++. This C hash would not be visible to users; a regular writable Lua table would be hit first. At that point I could play around with memoizing the C-side hash results, weak tables etc.
>
> Similar idea to eLua's rotables:
> http://www.eluaproject.net/doc/v0.8/en_arch_ltr.html

Similar problem, similar solution. Hypothetically. eLua has the
distinct advantage of, you know, actually existing. I'd prefer the
weirdo tables not to be visible to users, only as fallbacks from
regular tables, but that's trivial. I don't know what can be done at
compile-time these days but I bet a non-C pass would help.

I like it when other people solve the problems I thought few others
had. I'll look at eLua now...

Jay