Question about accessing Lua tables and arrays faster

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Question about accessing Lua tables and arrays faster

Leo Romanoff
Hi,

Here comes a third part of my questions about using Lua as a target language/runtime for other dynamic and static langauges.

This time I'd like to discuss the peformance of accessing Lua tables from Lua when tables are used as maps or as arrays.

Rationale:
- Many static languages with a type system allow for definition of structural types (e.g. structs in C, classes in Java, etc). Access to the fields of such types is usually very cheap in those languages. If one maps such structural types to Lua tables (and this is the most obvious solution, I guess), then sematically everything is fine. But now every access to a field results in a table lookup, which introduces some overhead. It would be nice to have a faster way of accessing fields of tables.

- Many high-performance applications work intensively with arrays. Currently, Lua uses the same datatype, namely tables, for both arrays and maps. Accessing arrays from Lua is not extremely fast currently. (Lua interpreter does not use even those lua_rawseti/lua_rawgeti functions, IIRC. Only C libs for Lua can do it). It would be nice to have a faster way of operating on arrays in Lua.

BTW, I did some experiments where I create an array of 1000000 elements and then perform a 1000 iterations over it, touching each element and assigning a new value to it or incrementing its current value. I implemented the same algorithm in pure Lua, as a C library for Lua which uses lua_rawseti/lua_rawgeti, and as a C program. The outcome is that pure Lua solution is the slowest, the C library for Lua is roughly 1.5 times faster and C version is about 14.5 times faster than pure Lua solution. 



Questions about arrays:
- Would it make sense to have an additional dedicated type for arrays as it is usually the case in most programming languages? Would it make things any faster, e.g. because it could use a more efficient internal representation? Or would it still be rather slow, because the problem is not the internal representation of arrays but Lua VM itself?

- Are there any well-known tricks besides what I used above in my experiments to process Lua arrays faster? And I don't mean here a purely FFI-based solutions, which most likely could be almost as efficient as C version. I mean Lua tricks.

Questions about tables used as maps:
- The interesting thing about structs/classes in static programming languages is that their structure is known in advance after their definition and it never changes. In Lua terms it would mean that a corresponding table has a predefined set of keys and this set of keys never changes. This is a very useful information which could be used for optimizations, I'd say.
So, my question is: could Lua make use of this fact to make access to such tables faster, e.g. by eliminating key lookups and replacing them by a more efficient indexed access (i.e. we know that field X is at offset 0, field Y at offset 4, etc) similar to what C compiler does or something like that? I think some JavaSctipt JITs and probably LuaJIT do something like this. But could Lua VM do a similar thing when it determines that a structure of the table never changes after a certain point in time?


Thanks,
  Leo

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Tim Hill

On Jul 25, 2013, at 8:00 AM, Leo Romanoff <[hidden email]> wrote:


Rationale:
- Many static languages with a type system allow for definition of structural types (e.g. structs in C, classes in Java, etc). Access to the fields of such types is usually very cheap in those languages. If one maps such structural types to Lua tables (and this is the most obvious solution, I guess), then sematically everything is fine. But now every access to a field results in a table lookup, which introduces some overhead. It would be nice to have a faster way of accessing fields of tables.

- Many high-performance applications work intensively with arrays. Currently, Lua uses the same datatype, namely tables, for both arrays and maps. Accessing arrays from Lua is not extremely fast currently. (Lua interpreter does not use even those lua_rawseti/lua_rawgeti functions, IIRC. Only C libs for Lua can do it). It would be nice to have a faster way of operating on arrays in Lua.

BTW, I did some experiments where I create an array of 1000000 elements and then perform a 1000 iterations over it, touching each element and assigning a new value to it or incrementing its current value. I implemented the same algorithm in pure Lua, as a C library for Lua which uses lua_rawseti/lua_rawgeti, and as a C program. The outcome is that pure Lua solution is the slowest, the C library for Lua is roughly 1.5 times faster and C version is about 14.5 times faster than pure Lua solution. 

I think this is rather missing the point of Lua. As a scripting language, Lua has a different balance between performance and convenience than languages such as C. Pretty much anything you do in C will be faster than Lua (but will need many more lines of code), and if you need the performance of C, well, er .. then do it in C!

Things like duck typing, garbage collection, and generic tables are (mostly) aimed at denser code where the runtime does more work on behalf of the developer, and this extra work will slow the program somewhat. To my mind the beauty of Lua is that these features make the development of complex logic far far easier than in languages that focus on raw compiled execution speed, with a rather minimal loss in performance.

No-one is going to write a production video codec in Lua, it's just not designed for that. However, they MIGHT wrap such a codec (written in C) into a library that can be called from Lua, thus letting C do the heavy lifting but Lua make all the business decisions around the use of the codec.

Have you looked at LuaJIT?

--Tim

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

steve donovan
In reply to this post by Leo Romanoff
On Thu, Jul 25, 2013 at 5:00 PM, Leo Romanoff <[hidden email]> wrote:
replacing them by a more efficient indexed access (i.e. we know that field X is at offset 0, field Y at offset 4, etc) similar to what C compiler does or something like that? I think some JavaSctipt JITs and probably LuaJIT do something like this. But could Lua VM do a similar thing when it determines that a structure of the table never changes after a certain point in time?

That's exactly what LuaJIT can do. LuaJIT is the 'optimizing compiler' for Lua 5.1 and will find any opportunity to generate straightforward machine code.

But for Lua itself, it would complicate matters without changing the semantics.  Not a win, I think.

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Steve Litt
In reply to this post by Leo Romanoff
On Thu, 25 Jul 2013 08:00:00 -0700 (PDT)
Leo Romanoff <[hidden email]> wrote:


> - Many high-performance applications work intensively with arrays.
> Currently, Lua uses the same datatype, namely tables, for both arrays
> and maps. Accessing arrays from Lua is not extremely fast currently.
> (Lua interpreter does not use even those lua_rawseti/lua_rawgeti
> functions, IIRC. Only C libs for Lua can do it). It would be nice to
> have a faster way of operating on arrays in Lua.
>
> BTW, I did some experiments where I create an array of 1000000
> elements and then perform a 1000 iterations over it, touching each
> element and assigning a new value to it or incrementing its current
> value. I implemented the same algorithm in pure Lua, as a C library
> for Lua which uses lua_rawseti/lua_rawgeti, and as a C program. The
> outcome is that pure Lua solution is the slowest, the C library for
> Lua is roughly 1.5 times faster and C version is about 14.5 times
> faster than pure Lua solution. 

Hi Leo,

Comparing any interpreter to C is certain to yield disappointing
results. A better test would have been to compare pure Lua to Perl
arrays, Python lists, and whatever Ruby uses for arrays (I long since
forgot). If *those* are faster than Lua in the algorithm you mention,
then we have something to talk about.

But I don't think they will be. My understanding is that Lua was
engineered from the ground up so that tables are lightning fast,
whether the keys are strings, consecutive integers, or something else.
Even if the Lua "everything's a table" philosophy yields a small speed
disadvantage over the other "scripting languages", the huge authoring
benefits of a single container type would be worth it, at least to me.

SteveT

Steve Litt                *  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Leo Romanoff
In reply to this post by Tim Hill
>________________________________
> Von: Tim Hill <[hidden email]>
>An: Leo Romanoff <[hidden email]>; Lua mailing list <[hidden email]>
>Gesendet: 17:14 Donnerstag, 25.Juli 2013
>Betreff: Re: Question about accessing Lua tables and arrays faster
>
>
>
>
>
>On Jul 25, 2013, at 8:00 AM, Leo Romanoff <[hidden email]> wrote:
>
>
>>Rationale:
>>- Many static languages with a type system allow for definition of structural types (e.g. structs in C, classes in Java, etc). Access to the fields of such types is usually very cheap in those languages. If one maps such structural types to Lua tables (and this is the most obvious solution, I guess), then sematically everything is fine. But now every access to a field results in a table lookup, which introduces some overhead. It would be nice to have a faster way of accessing fields of tables.
>>
>>- Many high-performance applications work intensively with arrays. Currently, Lua uses the same datatype, namely tables, for both arrays and maps. Accessing arrays from Lua is not extremely fast currently. (Lua interpreter does not use even those lua_rawseti/lua_rawgeti functions, IIRC. Only C libs for Lua can do it). It would be nice to have a faster way of operating on arrays in Lua.
>>
>>BTW, I did some experiments where I create an array of 1000000 elements and then perform a 1000 iterations over it, touching each element and assigning a new value to it or incrementing its current value. I implemented the same algorithm in pure Lua, as a C library for Lua which uses lua_rawseti/lua_rawgeti, and as a C program. The outcome is that pure Lua solution is the slowest, the C library for Lua is roughly 1.5 times faster and C version is about 14.5 times faster than pure Lua solution. 
>>
>
>I think this is rather missing the point of Lua. As a scripting language, Lua has a different balance between performance and convenience than languages such as C. Pretty much anything you do in C will be faster than Lua (but will need many more lines of code), and if you need the performance of C, well, er .. then do it in C!
>
>
>Things like duck typing, garbage collection, and generic tables are (mostly) aimed at denser code where the runtime does more work on behalf of the developer, and this extra work will slow the program somewhat. To my mind the beauty of Lua is that these features make the development of complex logic far far easier than in languages that focus on raw compiled execution speed, with a rather minimal loss in performance.
>

Yes, I agree with you on most points. I know very well that typically scripting langauges aim for things you mention and not primarily for ultimate peformance. This is a typical tradeoff. But see my comments below.

>No-one is going to write a production video codec in Lua, it's just not designed for that. However, they MIGHT wrap such a codec (written in C) into a library that can be called from Lua, thus letting C do the heavy lifting but Lua make all the business decisions around the use of the codec.
>
>
>Have you looked at LuaJIT?


Of course I have looked at it. I like Lua as a langauge and Lua VM as its implementation, but LuaJIT is a real gem!!! IMHO, Lua with LuaJIT is very competitive even with native apps compiled from C/C++. To some extent this is a reason why I ask many of my questions.  Let me explain:

- Without LuaJIT, Lua is a nice, small, very elegant and rather efficient scripting language with a very good C integration. But it is not so exceptionally spectacular compared to other scripting languages. After all, many of them can do roughly the same, but may be not so elegantly. Some of those languages are slower, more heavy-weight, some do not have such a good integration with C. But most would have tables, functions, closures, duck typing, garbage collection, etc.

- Now, Lua with LuaJIT can be a game changer! Its performance is exceptional!!! It beats almost any other scripting language and sometimes even natively compiled progams and JVMs due to its tracing JIT technology and all that while preserving all the advantages of Lua that we all know and which you mentioned above.  Since it has such a good performance, it is very tempting to start thinking how to extend the reach of Lua+LuaJIT beyond a typical domain of scripting languages. One can start thinking of putting more of the overall code and business logic in Lua+LuaJIT. You can express your logic much easier, you have more freedom and you don't loose your performance. It is a win/win situation. LuaJIT is the "unique selling point", but if you "buy" LuaJIT, you also "buy" Lua. So, this is good for the language as it increases its adoption.

- But when going more and more into "native" performance domain, you notice that some optimizations optimizations are missing or impossible due to Lua semantics. Since LuaJIT tries to be as compliant as possible to Lua specification, it cannot perform certain kind of optimizations, I guess, because it would break Lua's semantics. Moreover, LuaJIT probably does not want to diverge too far away from Lua, even if it could. Because it would essentially result in a new incompatible language or a new incompatible set of libraries. Therefore, IMHO, any changes that affect the language (data types, public interfaces for C integration, public interfaces for GC integration (should they exist), etc) should be agreed between Lua and LuaJIT. Any changes in these parts should happen on both sides to keep Lua uniform. 

My questions and views on LuaJIT just scratch the surface, I guess. I would be interested to hear an opinion from Mike Pall on this. Are there any features in Lua which make it harder for LuaJIT to get the best performance? Are there any missing features that would allow LuaJIT to become even more efficient?  If there are such things, it would be interesting to hear about them. In particular, if there are some required extensions to Lua that do not hurt Lua/LuaVM, but would make LuaJIT much more efficient, I'd really consider adding them to Lua.  

Somehow, this discussion reminds me of Javascript/asm.js and similar experiements. Turning things the otherway around. Compiling native apps to dynamic languages or their subsets and yet running them with almost native speed. Running dynamic languages with almost native speed. Why not?

-Leo

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Coda Highland
On Thu, Jul 25, 2013 at 8:50 AM, Leo Romanoff <[hidden email]> wrote:
> - But when going more and more into "native" performance domain, you notice that some optimizations optimizations are missing or impossible due to Lua semantics. Since LuaJIT tries to be as compliant as possible to Lua specification, it cannot perform certain kind of optimizations, I guess, because it would break Lua's semantics. Moreover, LuaJIT probably does not want to diverge too far away from Lua, even if it could. Because it would essentially result in a new incompatible language or a new incompatible set of libraries. Therefore, IMHO, any changes that affect the language (data types, public interfaces for C integration, public interfaces for GC integration (should they exist), etc) should be agreed between Lua and LuaJIT. Any changes in these parts should happen on both sides to keep Lua uniform.

LuaJIT already does this, via its FFI -- a language extension Lua
doesn't have (unless you count the LuaFFI module, which is good for
compatibility but not so much for speed). One typical LuaJIT
optimization trick is to define an FFI struct with statically-typed
fields even if you're not going to ever interface it with C code,
because it gives you the predefined offsets and data types you were
thinking about, which improves performance.

Code written this way, however, doesn't work in Lua anymore.

/s/ Adam

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Leo Romanoff




----- Ursprüngliche Message -----

> Von: Coda Highland <[hidden email]>
> An: Leo Romanoff <[hidden email]>; Lua mailing list <[hidden email]>
> CC:
> Gesendet: 19:31 Donnerstag, 25.Juli 2013
> Betreff: Re: Question about accessing Lua tables and arrays faster
>
> On Thu, Jul 25, 2013 at 8:50 AM, Leo Romanoff <[hidden email]> wrote:
>>  - But when going more and more into "native" performance domain,
> you notice that some optimizations optimizations are missing or impossible due
> to Lua semantics. Since LuaJIT tries to be as compliant as possible to Lua
> specification, it cannot perform certain kind of optimizations, I guess, because
> it would break Lua's semantics. Moreover, LuaJIT probably does not want to
> diverge too far away from Lua, even if it could. Because it would essentially
> result in a new incompatible language or a new incompatible set of libraries.
> Therefore, IMHO, any changes that affect the language (data types, public
> interfaces for C integration, public interfaces for GC integration (should they
> exist), etc) should be agreed between Lua and LuaJIT. Any changes in these parts
> should happen on both sides to keep Lua uniform.
>
> LuaJIT already does this, via its FFI -- a language extension Lua
> doesn't have (unless you count the LuaFFI module, which is good for
> compatibility but not so much for speed). One typical LuaJIT
> optimization trick is to define an FFI struct with statically-typed
> fields even if you're not going to ever interface it with C code,
> because it gives you the predefined offsets and data types you were
> thinking about, which improves performance.

Yes. I know.

> Code written this way, however, doesn't work in Lua anymore.

Exactly! It doesn't work in Lua anymore, which is to be avoided whenever possible. And this was my point about the importance of alignment between Lua and LuaJIT. I was wondering if there are  certain things  that could be done/accepted/aligned on both sides (e.g. some kind of annotations, which can be ignored by Lua VM currently, but may help LuaJIT; or may be something else, which is allowed by both Lua and LuaJIT, but where only LuaJIT could currently benefit from it, and so on), so that it is a valid Lua code running on any Lua implementation, but performing particularly well under LuaJIT. It would be nice to write as little LuaJIT specific code as possible, but yet be able to benefit from LuaJIT as much as possible.

-Leo


Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Coda Highland
On Thu, Jul 25, 2013 at 11:31 AM, Leo Romanoff <[hidden email]> wrote:
> Exactly! It doesn't work in Lua anymore, which is to be avoided whenever possible.

And this is where we disagree. Why should it be avoided?

The languages have already diverged. LuaJIT and Lua 5.2 have both
added features that weren't in their common ancestor, Lua 5.1. It's
certainly possible (in fact, not even difficult) to write code that
supports all three targets; most Lua 5.1 code will run unmodified on
both LuaJIT and Lua 5.2.

However, if you're to the point where you're looking for ways to go
past the implementation details of the VM for performance reasons,
you've already given up on compatibility. Supposing the code were to
FUNCTION on Lua 5.2 (for example, if you're not using setfenv and you
don't depend on nil/NULL equivalence and you're okay with loading
LuaFFI), the performance characteristics would differ.

There's nothing here that says that incompatibility is avoided
whenever possible. Keeping the language CORE compatible is valuable,
but there's nothing practical about taking a dogmatic approach to
avoiding extensions.

/s/ Adam

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Leo Romanoff
----- Ursprüngliche Message -----

> Von: Coda Highland <[hidden email]>
> An: Leo Romanoff <[hidden email]>
> CC: Lua mailing list <[hidden email]>
> Gesendet: 22:13 Donnerstag, 25.Juli 2013
> Betreff: Re: Question about accessing Lua tables and arrays faster
>
> On Thu, Jul 25, 2013 at 11:31 AM, Leo Romanoff <[hidden email]> wrote:
>>  Exactly! It doesn't work in Lua anymore, which is to be avoided
> whenever possible.
>
> And this is where we disagree. Why should it be avoided?
>
> The languages have already diverged. LuaJIT and Lua 5.2 have both
> added features that weren't in their common ancestor, Lua 5.1. 

IMHO, current differences seem to be pretty small.

> It's
> certainly possible (in fact, not even difficult) to write code that
> supports all three targets; most Lua 5.1 code will run unmodified on
> both LuaJIT and Lua 5.2.
>
> However, if you're to the point where you're looking for ways to go
> past the implementation details of the VM for performance reasons,
> you've already given up on compatibility. Supposing the code were to
> FUNCTION on Lua 5.2 (for example, if you're not using setfenv and you
> don't depend on nil/NULL equivalence and you're okay with loading
> LuaFFI), the performance characteristics would differ.

If you mean that they would be anyway incompatible in terms of performance, I agree.
But I had more a source code compatibility in mind. 
Having a similar performance would be even better, but it's not realistic without significant Lua VM improvements or making LuaJIT much slower :-)

> There's nothing here that says that incompatibility is avoided
> whenever possible. Keeping the language CORE compatible is valuable,
> but there's nothing practical about taking a dogmatic approach to
> avoiding extensions.


It is not about being dogmatic. It is more about avoiding being implementation specific where it could be done in a compatible way.
Take asm.js as an example. It is a subset of JavaScript. It still runs on ANY JavaScript implementation. But some implementations recognize the hints provided in asm.js based source code and execute it very efficiently. 

May be something similar in spirit could be done for Lua? E.g. some sort of optional annotations (as special comments? or special elements in metatables?) indicating types or some other hints, like "the structure of this table will never change", "this variable or this object is read-only", etc. Such indicators/hints would be purely optional and can be ignored by Lua implementations, but they can be also taken into account and used for optimizations, if a given Lua implementation supports it.

But of course, if nothing else works, you start using implementation specific extensions like LuaFFI and so on. 

-Leo 

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Jay Carlson
In reply to this post by Leo Romanoff
On Jul 25, 2013, at 11:00 AM, Leo Romanoff wrote:

> Questions about arrays:
> - Would it make sense to have an additional dedicated type for arrays as it is usually the case in most programming languages? Would it make things any faster, e.g. because it could use a more efficient internal representation? Or would it still be rather slow, because the problem is not the internal representation of arrays but Lua VM itself?

For arrays of general Lua values, I don't see how you can do better than the table implementation, except maybe by pre-sizing the array part.  For arrays of uniform scalar types, this is exercise 29.3 in _Programming in Lua_.

Jay
Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Leo Romanoff
In reply to this post by Steve Litt
----- Ursprüngliche Message -----

> Von: Steve Litt <[hidden email]>
> An: [hidden email]
> CC: Leo Romanoff <[hidden email]>
> Gesendet: 17:20 Donnerstag, 25.Juli 2013
> Betreff: Re: Question about accessing Lua tables and arrays faster
>
> On Thu, 25 Jul 2013 08:00:00 -0700 (PDT)
> Leo Romanoff <[hidden email]> wrote:
>>  - Many high-performance applications work intensively with arrays.
>>  Currently, Lua uses the same datatype, namely tables, for both arrays
>>  and maps. Accessing arrays from Lua is not extremely fast currently.
>>  (Lua interpreter does not use even those lua_rawseti/lua_rawgeti
>>  functions, IIRC. Only C libs for Lua can do it). It would be nice to
>>  have a faster way of operating on arrays in Lua.
>>
>>  BTW, I did some experiments where I create an array of 1000000
>>  elements and then perform a 1000 iterations over it, touching each
>>  element and assigning a new value to it or incrementing its current
>>  value. I implemented the same algorithm in pure Lua, as a C library
>>  for Lua which uses lua_rawseti/lua_rawgeti, and as a C program. The
>>  outcome is that pure Lua solution is the slowest, the C library for
>>  Lua is roughly 1.5 times faster and C version is about 14.5 times
>>  faster than pure Lua solution. 
>
> Hi Leo,
>
> Comparing any interpreter to C is certain to yield disappointing
> results. A better test would have been to compare pure Lua to Perl
> arrays, Python lists, and whatever Ruby uses for arrays (I long since
> forgot). If *those* are faster than Lua in the algorithm you mention,
> then we have something to talk about.

Just out of curiosity I created versions of the same algorithm for Perl and Python as you suggested.

The outcome is: They are slightly faster by may be 5%-10% than Lua for this benchmark.

Then I also implemented a C library for Lua, which implements vectors of scalars (doubles).
Actually, I used the code from PIL (http://www.lua.org/pil/28.1.html). I expected a great speedup.
But here the results were much more interesting:

- Yes, C-based version consumes less memory. This is a win
- But using it from Lua results in a very slow execution! Namely:
1) The fastest results are produced by a simple non OOP version, without metatables. And it is about 5 times slower than pure Lua solution.
2) If I start using metatables, it becomes much, much slower due to usage luaL_checkudata. There were already problems reports about exactly this problem on this mailing list in 2012. There is also a workaround provided by LHF (http://lua-users.org/lists/lua-l/2011-12/msg00039.html). But even with this optimization it is almost 1.5 - 2 times slower than the version without metatables
3) And the version where __index and __newindex are overloaded in the metatable to use C-functions from the library is the slowest.

I was actually quite surprised that using a C library version turned out to be so slow. But this is most likely due to the fact that the overhead due to invocation of C functions is too high compared to the actual action performed (i.e. setting/getting a single element). 

But if C-based implementations are so slow when used in a "per element" way from Lua, then the only remaining option to get really fast arrays is probably to have a C library that performs iteration over elements on the C side. I.e. something like a "map" function. But here I expect another problem: if the argument function of "map" (i.e. a function to be applied to each element) is a Lua function, then the overhead of invoking it from C will probably also kill performance. And if both iteration and element transformation is implemented in C, then it probably gets faster, but you loose the possibility to use Lua for specifying what should be done with each element and hard-code it in your C library instead, which is not so good IMHO.

-Leo


Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Steve Litt
On Fri, 26 Jul 2013 09:29:42 -0700 (PDT)
Leo Romanoff <[hidden email]> wrote:

> ----- Ursprüngliche Message -----
>
> > Von: Steve Litt <[hidden email]>
> > An: [hidden email]
> > CC: Leo Romanoff <[hidden email]>
> > Gesendet: 17:20 Donnerstag, 25.Juli 2013
> > Betreff: Re: Question about accessing Lua tables and arrays faster
> >
> > On Thu, 25 Jul 2013 08:00:00 -0700 (PDT)
> > Leo Romanoff <[hidden email]> wrote:
> >>  - Many high-performance applications work intensively with arrays.
> >>  Currently, Lua uses the same datatype, namely tables, for both
> >> arrays and maps. Accessing arrays from Lua is not extremely fast
> >> currently. (Lua interpreter does not use even those
> >> lua_rawseti/lua_rawgeti functions, IIRC. Only C libs for Lua can
> >> do it). It would be nice to have a faster way of operating on
> >> arrays in Lua.
> >>
> >>  BTW, I did some experiments where I create an array of 1000000
> >>  elements and then perform a 1000 iterations over it, touching each
> >>  element and assigning a new value to it or incrementing its
> >> current value. I implemented the same algorithm in pure Lua, as a
> >> C library for Lua which uses lua_rawseti/lua_rawgeti, and as a C
> >> program. The outcome is that pure Lua solution is the slowest, the
> >> C library for Lua is roughly 1.5 times faster and C version is
> >> about 14.5 times faster than pure Lua solution. 
> >
> > Hi Leo,
> >
> > Comparing any interpreter to C is certain to yield disappointing
> > results. A better test would have been to compare pure Lua to Perl
> > arrays, Python lists, and whatever Ruby uses for arrays (I long
> > since forgot). If *those* are faster than Lua in the algorithm you
> > mention, then we have something to talk about.
>
> Just out of curiosity I created versions of the same algorithm for
> Perl and Python as you suggested.
>
> The outcome is: They are slightly faster by may be 5%-10% than Lua
> for this benchmark.

Perl doesn't surprise me a bit. Perl has always been a very fast runtime
interpreter. Python's a surprise I didn't expect.

Just to make sure I understand, were your only keys 1 through number of
elements? No 0, no text keys, just 1-whatever?

Also, did you take steps to minimize your metatables? An apples with
apples comparison would be minimal or no metatables. I don't know, is
it possible to set a table's metatable to nil, and if you do so, does
it affect the speed of your 1000x1000000 incrementer algorithm?

Two more things, and the first one is this: I have no doubt that the
others are faster for going over and incrementing each element. But I'm
wondering if, for more ambitious usages of elements, you could put the
element processing in a metatable, and if that would speed things up.
Obviously there's no way to write that algorithm in Perl or Python, so
you can't really A/B test them. But you could A/B test the metatable
processing method against a brute force, process in a subroutine method.

The second thing is this: Thank goodness the difference is only 5 to
10%. There are very few cases where one cares whether it takes 0.11
seconds instead of 0.10, 1.1 seconds instead of 1.0 seconds, or an hour
and six minutes rather than an hour. Add to this the fact that there
are few computer tasks these days where the software is the bottleneck,
at least for more than 5 seconds at a time, and the decision of
interpreters is probably one of which is easiest, not which one had the
runtime speed. After all, if runtime speed were the biggest priority,
Lua, Python and Perl wouldn't even exist.

If you try the "process the element using the metatable" approach,
please let us know how it works.

Thanks

SteveT

Steve Litt                *  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Leo Romanoff
----- Ursprüngliche Message -----

> Von: Steve Litt <[hidden email]>
> An: "[hidden email]" <[hidden email]>
> CC:
> Gesendet: 18:58 Freitag, 26.Juli 2013
> Betreff: Re: Question about accessing Lua tables and arrays faster
>
> On Fri, 26 Jul 2013 09:29:42 -0700 (PDT)
> Leo Romanoff <[hidden email]> wrote:
>
>>  ----- Ursprüngliche Message -----
>>
>>  > Von: Steve Litt <[hidden email]>
>>  > An: [hidden email]
>>  > CC: Leo Romanoff <[hidden email]>
>>  > Gesendet: 17:20 Donnerstag, 25.Juli 2013
>>  > Betreff: Re: Question about accessing Lua tables and arrays faster
>>  >
>>  > On Thu, 25 Jul 2013 08:00:00 -0700 (PDT)
>>  > Leo Romanoff <[hidden email]> wrote:
>>  >>  - Many high-performance applications work intensively with
> arrays.
>>  >>  Currently, Lua uses the same datatype, namely tables, for both
>>  >> arrays and maps. Accessing arrays from Lua is not extremely fast
>>  >> currently. (Lua interpreter does not use even those
>>  >> lua_rawseti/lua_rawgeti functions, IIRC. Only C libs for Lua can
>>  >> do it). It would be nice to have a faster way of operating on
>>  >> arrays in Lua.
>>  >>
>>  >>  BTW, I did some experiments where I create an array of 1000000
>>  >>  elements and then perform a 1000 iterations over it, touching
> each
>>  >>  element and assigning a new value to it or incrementing its
>>  >> current value. I implemented the same algorithm in pure Lua, as a
>>  >> C library for Lua which uses lua_rawseti/lua_rawgeti, and as a C
>>  >> program. The outcome is that pure Lua solution is the slowest, the
>>  >> C library for Lua is roughly 1.5 times faster and C version is
>>  >> about 14.5 times faster than pure Lua solution. 
>>  >
>>  > Hi Leo,
>>  >
>>  > Comparing any interpreter to C is certain to yield disappointing
>>  > results. A better test would have been to compare pure Lua to Perl
>>  > arrays, Python lists, and whatever Ruby uses for arrays (I long
>>  > since forgot). If *those* are faster than Lua in the algorithm you
>>  > mention, then we have something to talk about.
>>
>>  Just out of curiosity I created versions of the same algorithm for
>>  Perl and Python as you suggested.
>>
>>  The outcome is: They are slightly faster by may be 5%-10% than Lua
>>  for this benchmark.
>
> Perl doesn't surprise me a bit. Perl has always been a very fast runtime
> interpreter. Python's a surprise I didn't expect.

> Just to make sure I understand, were your only keys 1 through number of
> elements? No 0, no text keys, just 1-whatever?

Yes. The keys are 1 through the number of elements, i.e. 1 .. 1000000

> Also, did you take steps to minimize your metatables? An apples with
> apples comparison would be minimal or no metatables. 

Well I was simply creating a table in a usual way, without explicitly touching metatables.
I.e.  local t = {}. Then I initialize it with 0 before my loops, and then I perform iterations.

Should I do something in addition to that to minimize the overhead of metatables?

>I don't know, is
> it possible to set a table's metatable to nil, and if you do so, does
> it affect the speed of your 1000x1000000 incrementer algorithm?

I don't know if there is a need to do it, IMHO I use simple tables and they don't have any metadata.

> Two more things, and the first one is this: I have no doubt that the
> others are faster for going over and incrementing each element. But I'm
> wondering if, for more ambitious usages of elements, you could put the
> element processing in a metatable, and if that would speed things up.

Can you elaborate a bit? What do you mean by putting element processing in a metatable?
I'm not sure I understand what you mean. Do you mean that __index method should do the actual processing, e.g. increment a value?

> Obviously there's no way to write that algorithm in Perl or Python, so
> you can't really A/B test them. But you could A/B test the metatable
> processing method against a brute force, process in a subroutine method.
>
> The second thing is this: Thank goodness the difference is only 5 to
> 10%. There are very few cases where one cares whether it takes 0.11
> seconds instead of 0.10, 1.1 seconds instead of 1.0 seconds, or an hour
> and six minutes rather than an hour. Add to this the fact that there
> are few computer tasks these days where the software is the bottleneck,
> at least for more than 5 seconds at a time, and the decision of
> interpreters is probably one of which is easiest, not which one had the
> runtime speed. After all, if runtime speed were the biggest priority,
> Lua, Python and Perl wouldn't even exist.

Absolutely. If we speak about purely interpreted code, we don't care about performance that much.
Performance becomes important when you start applying your language to CPU intensive tasks (image processing, number crunching, emulating another system, etc). 

But in any case, having more efficient arrays would not hurt. 

Currently, when I look at the results coming out of profiler, I can see that most of the time is spent by Lua VM accessing tables. My guess is that each statement like "t[i] = t[i] + 1" requires at least two table lookups. In general case it makes sense, but some sort of peephole optimization could help a lot in such cases. This is probably also true for "i=i+1" statements, which are very common in loops. Overall, some sort of a simple CSE (common subexpression elimination) could be a big win, I'd say. But then again, I'm not sure if Lua's VM with its current set of opcodes allows for it at all, because it would probably require internally the ability to reference memory allocated for variables and tables or something like that (e.g. VM could compute the address p of t[i] and then do something like *p = *p +1 or even *p += 1). Well, this was just an idea...

> If you try the "process the element using the metatable" approach,
> please let us know how it works.

I'd be happy to try out and provide feedback once you explain me what "process the element using the metatable" exactly means ;-)

-Leo

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Steve Litt
On Fri, 26 Jul 2013 11:42:35 -0700 (PDT)
Leo Romanoff <[hidden email]> wrote:

> ----- Ursprüngliche Message -----
> > Von: Steve Litt <[hidden email]>
[clip]

> > Two more things, and the first one is this: I have no doubt that the
> > others are faster for going over and incrementing each element. But
> > I'm wondering if, for more ambitious usages of elements, you could
> > put the element processing in a metatable, and if that would speed
> > things up.
>
> Can you elaborate a bit? What do you mean by putting element
> processing in a metatable? I'm not sure I understand what you mean.
> Do you mean that __index method should do the actual processing, e.g.
> increment a value?

Yes. __index() and __newindex()

[clip]
> > If you try the "process the element using the metatable" approach,
> > please let us know how it works.
>
> I'd be happy to try out and provide feedback once you explain me
> what "process the element using the metatable" exactly means ;-)

Hi Leo,

Just what you mentioned earlier: have the metatable's __index() and
__newindex() do some of the actual computation. So if you're
incrementing each position, have the __index increment it every time
you read. Or every time you read if some flag is set.

By the way, there's a rawget(t, i) and a rawset(t, k, v) that skip
__index() and __newindex() respectively, but PIL says those don't gain
you any performance.


>
> -Leo
>


Thanks,

SteveT

Steve Litt                *  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Leo Romanoff
----- Ursprüngliche Message -----

> Von: Steve Litt <[hidden email]>
> An: [hidden email]
> CC: Leo Romanoff <[hidden email]>
> Gesendet: 21:18 Freitag, 26.Juli 2013
> Betreff: Re: Question about accessing Lua tables and arrays faster
>
> On Fri, 26 Jul 2013 11:42:35 -0700 (PDT)
> Leo Romanoff <[hidden email]> wrote:
>
>>  ----- Ursprüngliche Message -----
>>  > Von: Steve Litt <[hidden email]>
> [clip]
>>  > Two more things, and the first one is this: I have no doubt that the
>>  > others are faster for going over and incrementing each element. But
>>  > I'm wondering if, for more ambitious usages of elements, you could
>>  > put the element processing in a metatable, and if that would speed
>>  > things up.
>>
>>  Can you elaborate a bit? What do you mean by putting element
>>  processing in a metatable? I'm not sure I understand what you mean.
>>  Do you mean that __index method should do the actual processing, e.g.
>>  increment a value?
>
> Yes. __index() and __newindex()


OK

> [clip]
>>  > If you try the "process the element using the metatable"
> approach,
>>  > please let us know how it works.
>>
>>  I'd be happy to try out and provide feedback once you explain me
>>  what "process the element using the metatable" exactly means ;-)
>
> Hi Leo,
>
> Just what you mentioned earlier: have the metatable's __index() and
> __newindex() do some of the actual computation. So if you're
> incrementing each position, have the __index increment it every time
> you read. Or every time you read if some flag is set.

May be I'm getting tired: I still do not quite follow. OK, I overload __index() and/or __newindex() with my own lua functions.
But what should they increment? They need a value to increment, right? How do they access it? Via the rawget() and rawset()? Or do you mean something else?

May be put some example code here to avoid any further misunderstandings?

> By the way, there's a rawget(t, i) and a rawset(t, k, v) that skip
> __index() and __newindex() respectively, but PIL says those don't gain
> you any performance.


-Leo

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Leo Romanoff
In reply to this post by Steve Litt
----- Ursprüngliche Message -----

> Von: Steve Litt <[hidden email]>
> An: [hidden email]
> CC: Leo Romanoff <[hidden email]>
> Gesendet: 21:18 Freitag, 26.Juli 2013
> Betreff: Re: Question about accessing Lua tables and arrays faster

[clip]

> By the way, there's a rawget(t, i) and a rawset(t, k, v) that skip
> __index() and __newindex() respectively, but PIL says those don't gain
> you any performance.

More results in the meantime:

I have created a version of iterations, where I use 

  rawset(t, i, rawget(t, i) + 1) 
instead of 
  t[i] = t[i] + 1 
to skip any __index() and __newindex() calls completely.


And according to my tests rawget/rawset slow down array operations significantely compared to a simple Lua's t[i] access without metatables. Basically, rawset and rawget a C functions exposed via a standrad Lua C library. Therefore each invocation of these functions introduces the almost the same overhead as in case of my own C-based vector implementation (and obtained timings confirm it). Using simple t[i] = t[i] + 1 seems to be way more efficient, because it happens completely on Lua side and does not require and Lua->C function invocations.


-Leo

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Steven Johnson

More results in the meantime:

I have created a version of iterations, where I use 

  rawset(t, i, rawget(t, i) + 1) 
instead of 
  t[i] = t[i] + 1 
to skip any __index() and __newindex() calls completely.


And according to my tests rawget/rawset slow down array operations significantely compared to a simple Lua's t[i] access without metatables. Basically, rawset and rawget a C functions exposed via a standrad Lua C library. Therefore each invocation of these functions introduces the almost the same overhead as in case of my own C-based vector implementation (and obtained timings confirm it). Using simple t[i] = t[i] + 1 seems to be way more efficient, because it happens completely on Lua side and does not require and Lua->C function invocations.


-Leo


Since you mention "well-known tricks" [1], did you do something like

  local rawget, rawset = rawget, rawset

before the loop? If not, what you perceived about table lookup before will show up in a different way (looking those functions up in the globals or environment, each and every time). For little one-off stuff you might never notice, but over the course of a million loops...

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Elias Barrionovo

On Jul 27, 2013 11:42 PM, "Steven Johnson" <[hidden email]> wrote:
>   local rawget, rawset = rawget, rawset

<random>

For a moment I read

local rawget, rawset = rawset, rawget

Man, that'd be some first class trolling. Akin to Python's

True, False = False, True

</random>

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Leo Romanoff
In reply to this post by Steven Johnson
>________________________________
> Von: Steven Johnson <[hidden email]>
>An: Leo Romanoff <[hidden email]>; Lua mailing list <[hidden email]>
>Gesendet: 4:42 Sonntag, 28.Juli 2013
>Betreff: Re: Question about accessing Lua tables and arrays faster
>
>
>
>
>
>More results in the meantime:
>>
>>I have created a version of iterations, where I use 
>>
>>  rawset(t, i, rawget(t, i) + 1) 
>>instead of 
>>
>>  t[i] = t[i] + 1 
>>to skip any __index() and __newindex() calls completely.
>>
>>
>>And according to my tests rawget/rawset slow down array operations significantely compared to a simple Lua's t[i] access without metatables. Basically, rawset and rawget a C functions exposed via a standrad Lua C library. Therefore each invocation of these functions introduces the almost the same overhead as in case of my own C-based vector implementation (and obtained timings confirm it). Using simple t[i] = t[i] + 1 seems to be way more efficient, because it happens completely on Lua side and does not require and Lua->C function invocations.
>>
>>
>>-Leo
>>
>>
>
>Since you mention "well-known tricks" [1], did you do something like
>
>  local rawget, rawset = rawget, rawset
>
>before the loop? If not, what you perceived about table lookup before will show up in a different way (looking those functions up in the globals or environment, each and every time). 
> For little one-off stuff you might never notice, but over the course of a million loops...


Yes. Of course I did that. This trick is very well known.
But it is so slow due the reasons explained above. rawget and rawset are not implemented directly by bytecodes. They are C functions provided by the built-in library. Therefore each call of these functions results in a C library function invocation including a corresponding overhead for preparing arguments, pushing them on stack, invoking, getting results from a function, etc. And this overhead is actually quite significant compared to a the simple operation performed by those functions. Therefore this overhead completely overshadows everything else.

BTW, I also measured LuaJIT in the meantime. I used the same code as a pure Lua version. No special tricks were applied to the code.
Results are:
1) LuaJIT with -O3 is only 1.8-2 times slower than a pure C version compiled with "gcc -O3". This optimized C version was the fastest.
2) LuaJIT with "-joff -O0" (i.e. no JIT, no optimizations, only interpreter) is only 19 times slower than the optimized pure C version. 

Which means that LuaJIT jitted and interpreted versions take a second and third place overall. Way ahead of the next competitor.

-Leo

Reply | Threaded
Open this post in threaded view
|

Re: Question about accessing Lua tables and arrays faster

Jorge Visca
On 28/07/13 09:05, Leo Romanoff wrote:
> Which means that LuaJIT jitted and interpreted versions take a second
> and third place overall. Way ahead of the next competitor. -Leo

The conclusion I reached stems from this.

You have a very neat language. It lacks nothing and wastes nothing. And
you get great performance, specially in relation to the hassle-freeness
and portability of the deployment.

If you need more performance, because you happen to need some esoteric
stuff (for a scripting language), you do not start spoiling the
language, but try the JIT compiler. You get tons of performance at the
cost of having to test for hardware support, checking compiler logs, and
stuff like that.

And if even then it doesn't have the performance characteristics you
need, perhaps it was the wrong language to start with.

12