LuaJIT performance

Qu0ll

I am in the early stages of deciding on a fast scripting language for a new C++ project and obviously Lua is a candidate.  When I say fast I mean fast so we’d probably be using LuaJIT as opposed to interpreted Lua.

 

Does anyone know a performance comparison of JIT’ed Lua versus something like V8 JavaScript?  Lua has the reputation as the fastest scripting language (in fact that’s how I came across it) but does the JIT compiler used in modern JavaScript implementations like V8 greatly narrow the performance gap?  JavaScript is clearly a much more comprehensive and complex language so I would be surprised if it could be executed faster than Lua but it’s the speed that we really need over language features.

 

Thanks,

 

-JCT


Re: LuaJIT performance

David Given
John C. Turnbull wrote:
[...]
> Does anyone know a performance comparison of JIT’ed Lua versus something
> like V8 JavaScript?  Lua has the reputation as the fastest scripting
> language (in fact that’s how I came across it) but does the JIT compiler
> used in modern JavaScript implementations like V8 greatly narrow the
> performance gap?  JavaScript is clearly a much more comprehensive and
> complex language so I would be surprised if it could be executed faster
> than Lua but it’s the speed that we really need over language features.

I did some quick-and-dirty benchmarks as part of Clue. Back then, V8
wasn't really set up for running command-line Javascript applications so
I was never able to integrate it into the benchmark suite (I should
check again), and the benchmarks are astonishingly artificial, but I saw
that while V8 was way faster than any other Javascript interpreter out
there, LuaJIT was still a lot better.

However, I wasn't quite testing like-for-like, so I'm not sure that this
was a meaningful result. I need to go and have another look to see if V8
has a proper command-line driver these days.

--
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────

│ "They laughed at Newton. They laughed at Einstein. Of course, they
│ also laughed at Bozo the Clown." --- Carl Sagan

Re: LuaJIT performance

Peter Harris-3
In reply to this post by Qu0ll
On Fri, Aug 7, 2009 at 8:34 AM, John C. Turnbull wrote:
>
> Does anyone know a performance comparison of JIT’ed Lua versus something
> like V8 JavaScript?  Lua has the reputation as the fastest scripting
> language (in fact that’s how I came across it) but does the JIT compiler
> used in modern JavaScript implementations like V8 greatly narrow the
> performance gap?

This was posted recently:
http://gmarceau.qc.ca/blog/2009/05/speed-size-and-dependability-of.html

It shows V8 being slightly faster than stock Lua, and noticeably
slower than LuaJIT.

As usual, take other people's benchmarks with a block of salt. The
only benchmark that matters is your own app.

Peter Harris

Re: LuaJIT performance

Mike Pall-6
In reply to this post by Qu0ll
John C. Turnbull wrote:
> I am in the early stages of deciding on a fast scripting language for a new
> C++ project and obviously Lua is a candidate.  When I say fast I mean fast
> so we'd probably be using LuaJIT as opposed to interpreted Lua.

You should also consider the size of the scripting engine you're
embedding and how easy it is to bind to it. Lua and LuaJIT are
more than ten times smaller than V8 and IMHO much easier to embed.

> Does anyone know a performance comparison of JIT'ed Lua versus something
> like V8 JavaScript?

Well, we can find out ... so I fetched today's V8 trunk and ran
some standard benchmarks. Unfortunately the V8 standalone shell is
very limited and is unable to run quite a few of them. And there's
no JavaScript translation for some others. :-(

All ratios are normalized relative to the performance of the
standard Lua interpreter. E.g. 5.0 means something is five times
faster than Lua. Higher numbers are better:

             |  Lua |  V8* |  LJ1 |  LJ2 |  GCC
-------------+------+------+------+------+------
mandelbrot   |  1.0 |  0.9 |  5.8 | 12.6 | 15.4
fasta        |  1.0 |  1.1 |  2.8 |  4.0 | 13.3
partialsums  |  1.0 |  1.4 |  3.8 |  4.2 |  2.2
spectralnorm |  1.0 |  2.9 |  3.1 | 19.8 | 18.5
nbody        |  1.0 |  3.0 |  5.0 | 15.3 | 33.0
nsieve       |  1.0 |  4.1 |  2.2 |  4.7 | 27.3
nsievebits   |  1.0 |  5.7 |  5.2 | 31.6 | 56.0
recursive    |  1.0 |  6.8 |  6.3 |  3.0~| 33.1
fannkuch     |  1.0 |  6.8 |  7.3 | 21.4 | 34.6
binarytrees  |  1.0 |  8.1 |  1.6 |  3.0~| 11.0

All measurements made on a Core2 E8400, comparing single-threaded,
non-hand-vectorized benchmark versions only.

Lua = Lua 5.1.4
V8  = V8 trunk 2009-08-07 IA32 (* sorted by this column)
LJ1 = LuaJIT 1.1.5 -O
LJ2 = LuaJIT 2.0 (unreleased, preliminary numbers, ~ = not JIT compiled (yet))
GCC = GCC 4.3.3 -m32 -O2 -fomit-frame-pointer (or -O3 where it's faster)

Summary: Ok, so V8 is catching up. But LuaJIT 1.x still beats it
on 6 out of 10 benchmarks. V8 is mainly faster on object allocation.
But, surprisingly, V8 is slower for nbody, even though its complex
logic for managing object shapes should make this go really fast.
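
The counts in the summary can be re-derived from the table. The following is a minimal sketch, not part of the original post: the ratios are copied from the table above, while the helper name `wins` and the column constants are made up for illustration.

```python
# Speedup ratios copied from the table above (Lua 5.1.4 = 1.0, higher is better).
# Column order: V8, LuaJIT 1.1.5, LuaJIT 2.0 (preliminary), GCC.
RATIOS = {
    "mandelbrot":   (0.9, 5.8, 12.6, 15.4),
    "fasta":        (1.1, 2.8, 4.0, 13.3),
    "partialsums":  (1.4, 3.8, 4.2, 2.2),
    "spectralnorm": (2.9, 3.1, 19.8, 18.5),
    "nbody":        (3.0, 5.0, 15.3, 33.0),
    "nsieve":       (4.1, 2.2, 4.7, 27.3),
    "nsievebits":   (5.7, 5.2, 31.6, 56.0),
    "recursive":    (6.8, 6.3, 3.0, 33.1),
    "fannkuch":     (6.8, 7.3, 21.4, 34.6),
    "binarytrees":  (8.1, 1.6, 3.0, 11.0),
}

# Hypothetical column indices, used only in this sketch.
V8, LJ1, LJ2, GCC = range(4)


def wins(a, b):
    """Return the benchmarks where column a has a strictly higher ratio than column b."""
    return [name for name, row in RATIOS.items() if row[a] > row[b]]


print(len(wins(LJ1, V8)), "of", len(RATIOS), "benchmarks: LuaJIT 1.x ahead of V8")
print("LuaJIT 2.0 ahead of GCC on:", wins(LJ2, GCC))
```

Because every column is already normalized to the stock Lua interpreter, comparing two columns row by row is enough; no timings are needed.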

Not surprisingly, Lua and LuaJIT still have the lead on numeric
benchmarks (unboxed floating point numbers pay off here). And
LuaJIT 2.x will completely change the game (sorry, still no ETA).

But as others have said: please compare the different VMs with
benchmarks that best match *your* performance needs.

--Mike

RE: LuaJIT performance

Qu0ll
Hi Mike,

> You should also consider the size of the scripting engine you're
> embedding and how easy it is to bind to it. Lua and LuaJIT are
> more than ten times smaller than V8 and IMHO much easier to embed.

[JCT] Yes that is certainly a consideration.
 

> > Does anyone know a performance comparison of JIT'ed Lua versus something
> > like V8 JavaScript?
>
> Well, we can find out ... so I fetched today's V8 trunk and ran
> some standard benchmarks. Unfortunately the V8 standalone shell is
> very limited and is unable to run quite a few of them. And there's
> no JavaScript translation for some others. :-(
>
> All ratios are normalized relative to the performance of the
> standard Lua interpreter. E.g. 5.0 means something is five times
> faster than Lua. Higher numbers are better:
>
>              |  Lua |  V8* |  LJ1 |  LJ2 |  GCC
> -------------+------+------+------+------+------
> mandelbrot   |  1.0 |  0.9 |  5.8 | 12.6 | 15.4
> fasta        |  1.0 |  1.1 |  2.8 |  4.0 | 13.3
> partialsums  |  1.0 |  1.4 |  3.8 |  4.2 |  2.2
> spectralnorm |  1.0 |  2.9 |  3.1 | 19.8 | 18.5
> nbody        |  1.0 |  3.0 |  5.0 | 15.3 | 33.0
> nsieve       |  1.0 |  4.1 |  2.2 |  4.7 | 27.3
> nsievebits   |  1.0 |  5.7 |  5.2 | 31.6 | 56.0
> recursive    |  1.0 |  6.8 |  6.3 |  3.0~| 33.1
> fannkuch     |  1.0 |  6.8 |  7.3 | 21.4 | 34.6
> binarytrees  |  1.0 |  8.1 |  1.6 |  3.0~| 11.0

[JCT] Thanks very much for this very helpful comparison.  I am surprised
that LJ2 actually beats GCC on some tests.

> Not surprisingly, Lua and LuaJIT still have the lead on numeric
> benchmarks (unboxed floating point numbers pay off here). And
> LuaJIT 2.x will completely change the game (sorry, still no ETA).

[JCT] LJ2 looks positively awesome on those numbers.
 
> But as others have said: please compare the different VMs with
> benchmarks that best match *your* performance needs.

[JCT] Yes, this is true and I intend to do that.  I just wanted to get some
ballpark feel for the relative speeds and you have provided that info so
thanks again.

John


Re: LuaJIT performance

Alex Davies
In reply to this post by Mike Pall-6
Mike Pall wrote:
> Not surprisingly, Lua and LuaJIT still have the lead on numeric
> benchmarks (unboxed floating point numbers pay off here). And
> LuaJIT 2.x will completely change the game (sorry, still no ETA).

Those times for LuaJIT 2.x are simply mind-boggling.  A JIT compiler beating
GCC in more than one standardised, non-hand-picked test?  It also made me smile
to hear it's still being worked on; it has been quiet for a while now.  Excited
would be an understatement...

- Alex


Re: LuaJIT performance

Michael Bauroth
Does there exist eventually an ARM port? Would be great to hear ...

Best Regards
Michael

Alex Davies wrote:

> Mike Pall wrote:
>> Not surprisingly, Lua and LuaJIT still have the lead on numeric
>> benchmarks (unboxed floating point numbers pay off here). And
>> LuaJIT 2.x will completely change the game (sorry, still no ETA).
>
> Those times for LuaJIT 2.x are simply mind-boggling.  A JIT compiler
> beating GCC in more than one standardised, non-hand-picked test?  It also
> made me smile to hear it's still being worked on; it has been quiet for a
> while now.  Excited would be an understatement...
>
> - Alex


Re: LuaJIT performance

RJP Computing
On Sat, Aug 8, 2009 at 7:34 AM, Michael Bauroth <[hidden email]> wrote:

> Does there exist eventually an ARM port? Would be great to hear ...
>
> Best Regards
> Michael
>
> Alex Davies wrote:
>>
>> Mike Pall wrote:
>>>
>>> Not surprisingly, Lua and LuaJIT still have the lead on numeric
>>> benchmarks (unboxed floating point numbers pay off here). And
>>> LuaJIT 2.x will completely change the game (sorry, still no ETA).
>>
>> Those times for LuaJIT 2.x are simply mind-boggling.  A JIT compiler
>> beating GCC in more than one standardised, non-hand-picked test?  It also
>> made me smile to hear it's still being worked on; it has been quiet for a
>> while now.  Excited would be an understatement...

Is there a 64bit version coming when version 2 is done? That would be
so great!!!
--
Regards,
Ryan

Re: LuaJIT performance

Asko Kauppi


Sent from my iPhone

RJP Computing <[hidden email]> wrote on 8.8.2009 at 17.51:

> On Sat, Aug 8, 2009 at 7:34 AM, Michael Bauroth <[hidden email]> wrote:
>> Does there exist eventually an ARM port? Would be great to hear ...
>>
>> Best Regards
>> Michael
>>
>> Alex Davies wrote:
>>>
>>> Mike Pall wrote:
>>>>
>>>> Not surprisingly, Lua and LuaJIT still have the lead on numeric
>>>> benchmarks (unboxed floating point numbers pay off here). And
>>>> LuaJIT 2.x will completely change the game (sorry, still no ETA).
>>>
>>> Those times for LuaJIT 2.x are simply mind-boggling.  A JIT compiler
>>> beating GCC in more than one standardised, non-hand-picked test?  It also
>>> made me smile to hear it's still being worked on; it has been quiet for a
>>> while now.  Excited would be an understatement...
>
> Is there a 64bit version coming when version 2 is done? That would be
> so great!

My customer would also be interested in an x64 LuaJIT. Maybe we can
come up with some company sponsorship for Mike on this? He sure
deserves some!

-Asko


> --
> Regards,
> Ryan

Re: LuaJIT performance

Alexander Gladysh
>> Is there a 64bit version coming when version 2 is done? That would be
>> so great!

> My customer would also be interested in an x64 LuaJIT. Maybe we can come up
> with some company sponsorship for Mike on this? He sure deserves some!

Actually, there is a chance our company would be able to participate
in such sponsorship as well.

We'd love to use 64bit LuaJIT 2 in our products!

Alexander.

Re: LuaJIT performance

Mike Pall-6
In reply to this post by RJP Computing
RJP Computing wrote:
> Is there a 64bit version coming when version 2 is done?

The goal is to release the x86 version and stabilize it a bit
before starting the x64 port.

Most of the LJ2 VM is already "64 bit ready". E.g. it has an
arch-independent 32 bit pointer abstraction for all GC objects.
This keeps tagged values at 8 bytes on all platforms. But several
major areas need more work: porting DynASM to x64, porting the
interpreter (which is 100% x86 assembler), dealing with the
different x64 calling conventions (WIN64 is different than the
rest of the world) and a couple more open issues.
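
To illustrate why a 32 bit reference keeps a tagged value at 8 bytes, here is a rough sketch using Python's struct module. The layouts and the tag constant are assumptions made up for this example; they are not LuaJIT's actual value representation.

```python
import struct

# Hypothetical slot layouts (little-endian, no padding), for illustration only.
NUMBER_SLOT = struct.Struct("<d")    # a double-precision number: 8 bytes
GCREF_SLOT = struct.Struct("<II")    # 32-bit GC reference + 32-bit type tag: 8 bytes
NATIVE_PTR_SLOT = struct.Struct("<QI")  # 64-bit native pointer + 32-bit tag: 12 bytes

TAG_TABLE = 0xFFFFFFF4  # made-up tag value, not taken from LuaJIT


def pack_gcref(ref32, tag):
    """Pack a 32-bit object reference and its type tag into one 8-byte slot."""
    return GCREF_SLOT.pack(ref32, tag)


def pack_number(x):
    """Pack a plain number into a slot of the same size."""
    return NUMBER_SLOT.pack(x)


print(len(pack_number(3.14)), len(pack_gcref(0x00402000, TAG_TABLE)))
print(NATIVE_PTR_SLOT.size)
```

With a 32 bit reference both kinds of slot stay the same size on every platform, whereas storing a native 64 bit pointer next to the tag would already need 12 bytes before alignment.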

Michael Bauroth wrote:
> Does there exist eventually an ARM port?

Given the market share and the estimated demand, that's most
likely the next port after the x64 port. But it's much more
complicated, since there is no uniform ARM platform. The choice of
the number type for Lua is the main difficulty.

Using double-precision floating-point numbers is one option. But
it needs really fast FP arithmetic (x86/x64 provides that).
Unfortunately most older ARM devices have no FPU at all and NEON
doesn't do double precision FP. And about VFP ... well, some
vendors like to hide the fact that most of their gadgets only
contain something called "VFPlite". The high latencies and the low
throughput make softfp suddenly look like an attractive option.

Another option is to use 32 bit integers only. Certainly easier to
implement, but I'm not so sure everyone would be happy with it.

I've also considered using 32.31 fixed-point numbers. Yes, it's a
bit of an awkward choice. But you'd get fractional numbers at the
speed of integer arithmetic. Again, I'm not sure about the needs
of developers who'd like to have LJ2 ported to ARM.
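
As a rough sketch of what 32.31 fixed-point arithmetic could look like, the following assumes a signed 64 bit value with 31 fractional bits. That reading of "32.31", and all function names, are assumptions for illustration; nothing here comes from LuaJIT itself.

```python
FRAC_BITS = 31            # assumed: 31 fractional bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0
MASK64 = (1 << 64) - 1


def wrap64(x):
    """Wrap a Python int to signed 64-bit two's complement, like a 64-bit register."""
    x &= MASK64
    return x - (1 << 64) if x >= (1 << 63) else x


def to_fixed(x):
    """Convert a float to the assumed 32.31 fixed-point format."""
    return wrap64(round(x * ONE))


def to_float(f):
    """Convert a fixed-point value back to a float (for display only)."""
    return f / ONE


def fx_add(a, b):
    """Addition is a plain integer add."""
    return wrap64(a + b)


def fx_mul(a, b):
    """Multiplication needs a wider intermediate product, then a shift back."""
    return wrap64((a * b) >> FRAC_BITS)


print(to_float(fx_add(to_fixed(0.25), to_fixed(0.5))))
print(to_float(fx_mul(to_fixed(1.5), to_fixed(2.0))))
```

Addition and subtraction map directly onto integer instructions. Multiplication and division need a double-width intermediate, which is part of what makes the format awkward on a 32 bit CPU.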

[Note that support for multiple number types per platform is not
an option. A JIT compiler needs to emit very different instruction
sequences for each number type. Switching number types is not as
easy as changing a couple of C macros.]

--Mike

Re: LuaJIT performance

Mike Pall-6
In reply to this post by Alexander Gladysh
Alexander Gladysh wrote:

> >> Is there a 64bit version coming when version 2 is done? That would be
> >> so great!
>
> > My customer would also be interested in an x64 LuaJIT. Maybe we can come up
> > with some company sponsorship for Mike on this? He sure deserves some!
>
> Actually, there is a chance our company would be able to participate
> in such sponsorship as well.
>
> We'd love to use 64bit LuaJIT 2 in our products!

Sure, I can put up a proposal for a sponsorship program for the x64
port (*after* the x86 release of course). Since I'm an independent
consultant, you'll get a proper bill/invoice/tax receipt or
whatever it's called in your country. Companies should be able to
deduct the expenses.

But before going to the effort, I'd like a rough estimate of
what your respective companies might be willing to spend on this.
You can email me privately about this and any amounts are of
course not binding (yet). Depending on the outcome I may go for it.

--Mike

Re: LuaJIT performance

Alexander Gladysh
In reply to this post by Mike Pall-6
> Michael Bauroth wrote:
>> Does there exist eventually an ARM port?

> Given the market share and the estimated demand, that's most
> likely the next port after the x64 port. But it's much more
> complicated, since there is no uniform ARM platform. The choice of
> the number type for Lua is the main difficulty.

<...>

I, personally, am interested in LJ2 for iPhone (which is ARM-based).
As I intend to reuse existing (x86) game logic code on it, I would
need floating point support.

Alexander.

Re: LuaJIT performance

Alexander Gladysh
In reply to this post by Mike Pall-6
>> We'd love to use 64bit LuaJIT 2 in our products!

> Sure, I can put up a proposal for a sponsorship program for the x64
> port (*after* the x86 release of course). Since I'm an independent
> consultant, you'll get a proper bill/invoice/tax receipt or
> whatever it's called in your country. Companies should be able to
> deduct the expenses.

> But before going to the effort, I'd like a rough estimate of
> what your respective companies might be willing to spend on this.
> You can email me privately about this and any amounts are of
> course not binding (yet). Depending on the outcome I may go for it.

Cool! But it looks like we'd need to sponsor x86 release first! :-)

You've said there is no ETA yet for x86, but, perhaps, you may share
some information on the amount of work left to do?

Alexander.

Re: LuaJIT performance

Timm S. Mueller
In reply to this post by Mike Pall-6
On Sun, 9 Aug 2009 17:03:22 +0200
Mike Pall <[hidden email]> wrote:

> Another option is to use 32 bit integers only. Certainly easier to
> implement, but I'm not so sure everyone would be happy with it.

Just for the record: a port of LuaJIT to ARM would be very welcome, and
I'd be perfectly happy with a 32 bit numerical datatype on this
architecture.

- Timm

--
Timm S. Mueller <[hidden email]>
Schulze & Mueller GbR, Gryphiusstr. 2, 10245 Berlin,
Gesellschafter: Franciska Schulze, Timm S. Mueller,
Tel. +49 30 93624410, http://www.schulze-mueller.de/

Re: LuaJIT performance

Ivan-Assen Ivanov
In reply to this post by Mike Pall-6
> Another option is to use 32 bit integers only. Certainly easier to
> implement, but I'm not so sure everyone would be happy with it.

If you go this way eventually, please backport to x86. We use stock Lua
with 32-bit integers on x86 and PowerPC (Xbox 360) for very peculiar reasons
(synchronicity of calculations across the network) and are generally
happy with it.
I was unpleasantly surprised when I learned there's no support for
that in LuaJIT 1.x.

Best regards,
Assen

Re: LuaJIT performance

Mike Pall-6
In reply to this post by Alexander Gladysh
Alexander Gladysh wrote:
> I, personally, am interested in LJ2 for iPhone (which is ARM-based).
> As I intend to reuse existing (x86) game logic code on it, I would
> need floating point support.

Ok, but you may be in for a nasty surprise: the 3GS has an ARM
Cortex-A8 CPU which only has VFPlite. This is actually a step back
from the previous models which had an ARM 1176JZ(F)-S with a full
VFP unit. And since the vector mode of VFP is officially deprecated,
you're in for more surprises in the future.

Not to squash your hopes, but I suggest you try to measure whether
the iPhone FP performance can keep up with your requirements.
Maybe try some simple double-precision FP benchmarks in C (don't
compile as Thumb code or you get softfp).

Timm S. Mueller wrote:
> Just for the record: a port of LuaJIT to ARM would be very welcome, and
> I'd be perfectly happy with a 32 bit numerical datatype on this
> architecture.

Umm, so one probably needs at least two different VMs for ARM (FP
vs. int-only). Then combine this with the options for ARM vs.
Thumb vs. Thumb2 code and with ARMv4-ARMv7 support and soon we'll
have an exponential number of targets to support ... *sigh*
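
To put a number on the combinations listed above, here is a naive count. It assumes every option can be combined with every other and reads "ARMv4-ARMv7" as four architecture versions; in practice many of these combinations do not exist in hardware, so treat the result as an upper bound.

```python
from itertools import product

# Options as named in the paragraph above; the labels are illustrative only.
NUMBER_TYPES = ["double", "int32"]
INSTRUCTION_SETS = ["ARM", "Thumb", "Thumb2"]
ARCH_VERSIONS = ["ARMv4", "ARMv5", "ARMv6", "ARMv7"]

# Naive cross product: every number type x instruction set x architecture version.
targets = list(product(NUMBER_TYPES, INSTRUCTION_SETS, ARCH_VERSIONS))

print(len(targets))  # 2 * 3 * 4 = 24
print(targets[0])
```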

Thank you to both of you for the quick feedback!

--Mike

Re: LuaJIT performance

Mike Pall-6
In reply to this post by Alexander Gladysh
Alexander Gladysh wrote:
> You've said there is no ETA yet for x86, but, perhaps, you may share
> some information on the amount of work left to do?

Well, I'm already cutting corners everywhere wrt. features for the
first alpha. But issues with correctness and completeness keep me
busy (the coordination between the JIT code and the GC is currently
a minefield). And the code needs to be cleaned up a lot before it's
ready for public consumption. Then I'll need to work on the
packaging, the docs, the web site reorganization and so on ...

Thankfully I've recently removed the last major stumbling block
(better trace linking) and the benchmark results demonstrate that
going for a trace compiler was a sound design decision after all.

But I have to say it was an expensive decision: I've considerably
underestimated the amount of research and trial-and-error which
was needed to convert a research toy into a production compiler.
There are some important implementation details which the few
papers about trace compilers completely fail to mention ... :-|

--Mike

Re: LuaJIT performance

Timm S. Mueller
In reply to this post by Mike Pall-6
On Mon, 10 Aug 2009 02:36:43 +0200
Mike Pall <[hidden email]> wrote:

> > Just for the record: a port of LuaJIT to ARM would be very welcome, and
> > I'd be perfectly happy with a 32 bit numerical datatype on this
> > architecture.
>
> Umm, so one probably needs at least two different VMs for ARM (FP
> vs. int-only). Then combine this with the options for ARM vs.
> Thumb vs. Thumb2 code and with ARMv4-ARMv7 support and soon we'll
> have an exponential number of targets to support ... *sigh*

Please make a sensible decision; the sky won't fall if I don't
get LuaJIT/ARM for free.

My deployment needs are in the no-FPU, no-2nd-level-cache, ~200MHz
range. That's where throughput for user interfaces is scarce and
desperately needed. But Lua is up to the task. I have designed my
libraries to work with integers (using fixed-point arithmetic in places),
so that I can use them in these contexts, and ARM-7 is an architecture
I am frequently concerned with.

- Timm

--
Timm S. Mueller <[hidden email]>
Schulze & Mueller GbR, Gryphiusstr. 2, 10245 Berlin,
Gesellschafter: Franciska Schulze, Timm S. Mueller,
Tel. +49 30 93624410, http://www.schulze-mueller.de/

Re: LuaJIT performance

Rob Kendrick
In reply to this post by Mike Pall-6
On Mon, 10 Aug 2009 02:36:43 +0200
Mike Pall <[hidden email]> wrote:

> Umm, so one probably needs at least two different VMs for ARM (FP
> vs. int-only). Then combine this with the options for ARM vs.
> Thumb vs. Thumb2 code and with ARMv4-ARMv7 support and soon we'll
> have an exponential number of targets to support ... *sigh*

Ignore Thumb; I'm not sure anybody would want to run LuaJIT on a
Thumb-only device (like a Cortex M3).  ARM's hardware FP has always
left something to be desired, so a combined approach like Asko's
integer patch might be a solution.  And for the most part (excluding
floating point), it should be quite easy to produce code that will run
on ARMv3, ARMv4 and ARMv7.
 
B.