ANN: LuaJIT 1.1.0

Mike Pall-3-2
Hi,

LuaJIT is a Just-In-Time (JIT) Compiler for Lua 5.1.
LuaJIT is light-weight, efficient and extensible.

LuaJIT 1.1.0 is based on Lua 5.1 (final). The performance has
been improved in many areas: more specialization and inlining for
operators and library functions, adaptive deoptimization, better
type hinting, optional SSE2 code generation and many other small
optimizations.

It supports many popular x86 based operating systems: Linux, *BSD,
Mac OS X on Intel, Solaris x86 and Windows (MSVC or MinGW).

Please visit the project home page for more info:
  http://luajit.luaforge.net/

You can find the full changelog and performance comparisons here:
  http://luajit.luaforge.net/luajit_changes.html
  http://luajit.luaforge.net/luajit_performance.html

Here is a direct link to the download page:
  http://luajit.luaforge.net/download.html

Bye,
     Mike

Re: ANN: LuaJIT 1.1.0

Alex Queiroz
Hallo,

On 3/13/06, Mike Pall <[hidden email]> wrote:
> Hi,
>
> LuaJIT is a Just-In-Time (JIT) Compiler for Lua 5.1.
> LuaJIT is light-weight, efficient and extensible.
>

     Thank you very much for LuaJIT. Some benchmarks: using plain Lua
I can achieve 122 FPS visualising a 512x512 CLOD terrain. Using LuaJIT
I can achieve 256 FPS with the same terrain.
     Visualising a 3200x3200 terrain runs at 31 FPS with plain Lua and
at 35 FPS with LuaJIT. But with large terrains the CLOD algorithm
dominates, which is entirely in C.

--
-alex
http://www.ventonegro.org/

Re: ANN: LuaJIT 1.1.0

Adam D. Moss
In reply to this post by Mike Pall-3-2
Mike Pall wrote:
> LuaJIT is a Just-In-Time (JIT) Compiler for Lua 5.1.
> LuaJIT is light-weight, efficient and extensible.

It's a great piece of work (and a really interesting
architecture, small footprint, nice docs).  In some narrow
cases it even makes Lua rather competitive versus optimized
GCC code.

LuaJIT's maths speed is excellent and probably where it
comes closest to C's performance.  Loops are good too.
Table access is a lot slower than C array access. To some extent
that's the price to be paid for safe and dynamic arrays, but it's
a shame, since many interesting applications of Lua will naturally
use table access with abandon.  I haven't compared string handling
or call speed.

I'm comparing LuaJIT to C because for me, LuaJIT is a lot
more interesting for its ability to displace future C code
than for its ability to run existing Lua code faster.  I
think that LuaJIT (this version in particular) starts to
open Lua up to some domains in which Lua was not performant
enough previously.

Regards,
--Adam


Re: ANN: LuaJIT 1.1.0

Paul Chiusano
Mike,

What are your future plans for LuaJIT, and how fast do you think a
just-in-time compiler for Lua could be? Also, I'm curious: what are
the real sources of slowness for a dynamically-typed language like Lua
-  is it mostly instruction decoding, or is it having to resolve
things at run time (like figuring out what function to call for the
expression 'a + b'), the lack of inlined functions (I mean pure Lua
functions), function call overhead, or what? What do you think the
performance limits are for just-in-time compilation in Lua?

-Paul

Re: teste Lula

Henrique Manela
Maria Lucia Agostini (LULA) wrote:
> Mike,
>
> What are your future plans for LuaJIT, and how fast do you think a
> just-in-time compiler for Lua could be? Also, I'm curious: what are
> the real sources of slowness for a dynamically-typed language like Lua
> -  is it mostly instruction decoding, or is it having to resolve
> things at run time (like figuring out wha
>  


Re: ANN: LuaJIT 1.1.0

Mike Pall-3-2
In reply to this post by Paul Chiusano
Hi,

Paul Chiusano wrote:
> What are your future plans for LuaJIT,

This depends on the feedback I'll receive from LuaJIT users.

* "Make it produce faster code" is one rather obvious goal. But
I have to know which area to target first.

E.g. Adam made some comparisons between Lua code and equivalent C
code. He sent me a few code snippets which show exactly what's
slow and what needs to be tuned. This is very helpful and I
encourage other users of LuaJIT to do the same. Please note that
I cannot analyze complete applications -- small, to-the-point
code snippets (without complex dependencies) are best.

* Another goal is better portability (to non-x86 CPUs). I think
embedded CPUs would benefit most. I had a cheap Linux-based
DSL/VoIP router here for a few days (switched my parents' home
over to VoIP). This cute little thing (size of a sandwich loaf)
runs Linux on a 200 MHz MIPS32 CPU with 8 or 16 MB RAM. It's
adequate when used with compiled C code, but interpreted Lua runs
really slowly. The tiny cache and the lack of out-of-order
execution are killers for interpreters.

IMHO Lua is the only scripting alternative due to severe size
constraints (2 or 4 MB flash is really tight). MIPS32 code would
also run on the PS2 or PSP, which will still play a role in the
game market for a while. ARM is an interesting target for other
embedded devices and PDAs (XScale).

This box and other embedded systems would benefit greatly from
LuaJIT. I'm self-employed and would rather work on LuaJIT than
other (less interesting) projects. So this is the plea:

  I'm actively looking for sponsors who want to see LuaJIT ported
  to their favourite CPU. If you are a big company or have the
  necessary funds to pay a developer for several months, please
  contact me by mail. I will keep all negotiations confidential.
  The result of the port has to be available as open source of
  course.

[Another option is the GPL + commercial license route (like MySQL),
but I'm not sure this would work out.]

> and how fast do you think a just-in-time compiler for Lua could be?

Only the sky is the limit. No, seriously, it's more a matter of
how much work one is able to put into the compiler. GCC and other
top performing compilers have seen many years of coordinated
development effort. And there are lots of research papers on how
to optimize C or Java code. But the good papers on optimizing
dynamic languages are few and far between.

Right now LuaJIT is at the point where all the low-hanging fruit
has been picked. Any further performance gains will only be
incremental and will take comparatively more work.

The real limit is how much free (or paid) time I can spend
working on LuaJIT. I just don't know at this point in time.
And I have some other Lua projects on the back-burner, too.

> Also, I'm curious: what are
> the real sources of slowness for a dynamically-typed language like Lua
> -  is it mostly instruction decoding,

This is only relevant for the interpreter.

> or is it having to resolve things at run time (like figuring
> out what function to call for the expression 'a + b'),

This is quite easy in Lua because most opcodes have only one
dominant receiver class. Even the interpreter inlines the number
case for arithmetic opcodes.

The LuaJIT optimizer is pretty good at detecting monomorphism.
The new adaptive deoptimization support in LuaJIT 1.1.0 makes
backing down in case of undetected polymorphism relatively cheap.
Aggressive optimizations can be done without compromising Lua
semantics. I think I've covered all of the commonly used
monomorphic cases for opcodes now.
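
A minimal C sketch of the "dominant receiver class" idea: check the common case (both operands are numbers) first and only fall back to a generic path when that guard fails. The names `Value`, `value_add` and `slow_add` are made up for this illustration and are not taken from the Lua or LuaJIT sources; the sketch shows only the guard-then-fallback shape and does not model adaptive deoptimization.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tagged value: a type tag plus a payload.
   This is not the actual Lua value layout. */
typedef enum { T_NIL, T_NUMBER, T_STRING } Tag;

typedef struct {
    Tag tag;
    union {
        double n;        /* valid when tag == T_NUMBER */
        const char *s;   /* valid when tag == T_STRING */
    } u;
} Value;

static Value make_number(double n) { Value v; v.tag = T_NUMBER; v.u.n = n; return v; }
static Value make_string(const char *s) { Value v; v.tag = T_STRING; v.u.s = s; return v; }

/* Converts a value to a number; strings are accepted only if the whole
   string parses as a number. Returns 1 on success, 0 otherwise. */
static int to_number(const Value *v, double *out) {
    char *end;
    if (v->tag == T_NUMBER) { *out = v->u.n; return 1; }
    if (v->tag != T_STRING) return 0;
    *out = strtod(v->u.s, &end);
    return end != v->u.s && *end == '\0';
}

/* Slow generic path. Here it only coerces numeric strings; a real
   implementation would also have to consult metamethods. */
static int slow_add(const Value *a, const Value *b, Value *out) {
    double x, y;
    if (!to_number(a, &x) || !to_number(b, &y)) return 0;
    *out = make_number(x + y);
    return 1;
}

/* Returns 1 on success, 0 if the operands cannot be added. */
int value_add(const Value *a, const Value *b, Value *out) {
    assert(a && b && out);
    if (a->tag == T_NUMBER && b->tag == T_NUMBER) {  /* fast path: two tag checks */
        *out = make_number(a->u.n + b->u.n);
        return 1;
    }
    return slow_add(a, b, out);                      /* rare cases */
}
```

When nearly every call hits the first branch, the cost of dynamic typing for `a + b` is two comparisons on top of the addition itself.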

> the lack of inlined functions (I mean pure Lua functions),

This depends on the coding style. I'm not sure about the overall
effect in most Lua apps. It's probably not so dominant for the
Lua interpreter because other overhead shadows it.

OTOH in typical OO-intensive Smalltalk or Self programs one
really needs to do function inlining to reach acceptable speeds.

It's on my TODO list for LuaJIT, but I think other optimizations
would pay off more and should be done first.

Inlining many standard library functions (C functions) in LuaJIT
1.1.0 paid off a lot. But this is partly due to the reduced call
overhead, partly due to specialization and partly because of
direct access to internal structures.

> function call overhead,

This is pretty low for an interpreter (if compared to other
interpreters). But it's relatively high when you compare LuaJIT
to other compilers.

The main reason is that LuaJIT still uses the Lua frame and stack
structures. This makes it easy to switch between interpreted and
compiled code. And most of the debug support can be reused, too.

Reducing the function call overhead any further is hard without
major conceptual changes. Inlining short Lua functions may be
easier (and is potentially faster).

> What do you think the performance limits are for just-in-time
> compilation in Lua?

* Lua has only a single number type. This simplifies many things
and even using a double doesn't make much of a difference for the
interpreter. But now that many other things have been optimized,
it shows in LuaJIT. Array indexing is slow (compared to C)
because it needs too many type conversions (double <-> int) and
bounds checks.

Narrowing numbers to integers with help from the optimizer is one
way to go. Dual number support (int + double) would have benefits
for embedded CPUs (lacking FPU hardware). But it's tricky to get
this fast for the interpreter and even more so for compiled code.
I guess pure integer support is too limiting for most embedded
projects (but would be really fast). [I need feedback on this
topic from people who use Lua on embedded devices.]

* Lua has only a single generic container type (tables). Again this
simplifies many things and has little impact on the interpreter.
But it puts a limit on what can be optimized in a JIT compiler
with only local knowledge. Struct accesses (obj.foo, obj.bar)
always need a hash lookup (unlike in languages with static
typing). The full metamethod semantics come at a price, too.

* Caching globals and method lookups is difficult. A seemingly
trivial statement like y = math.sqrt(x) needs two hash table
lookups and several type checks and contract verifications to
come to the point where the FP square root instruction (fsqrt)
can be safely inlined. This overhead cannot be avoided without
compromising language semantics (maybe the semantics need to be
augmented). Manually caching often used functions is common
practice in Lua (local sqrt = math.sqrt). But this doesn't work
out so well for obj:method() calls.

* Type checks and other contract verifications are cheap on
modern x86 CPUs. Thanks to out-of-order execution they run in the
integer unit in parallel with the FP-intensive main code. But the
overhead would be noticeable on embedded CPUs. Many redundant
checks could be removed or hoisted out of loops. Arithmetic
operations could be combined.

* Garbage collection and heap allocation put Lua at a speed
disadvantage to languages with manual memory management. The impact
is smaller in Lua than in other dynamic languages because of typed-value
storage and immutable shared strings. Adding a custom memory
allocator to the Lua core could be beneficial. Complex solutions
like escape analysis are not on my radar for LuaJIT (yet).
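
The array-indexing cost from the first bullet in this list can be sketched in C. With a single number type the key arrives as a double, so an index operation has to convert it to an integer, verify that the conversion was exact, and check bounds before it may touch the array; a plain C array access does none of that. `Array`, `array_get` and `RAW_INDEX` are hypothetical names for this sketch, not LuaJIT code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const double *items;  /* element storage */
    size_t len;           /* number of valid elements */
} Array;

/* Plain C indexing: no checks; the caller must guarantee i < len. */
#define RAW_INDEX(arr, i) ((arr)->items[(i)])

/* Index with a double key, 1-based like Lua tables.
   Returns 1 and stores the element in *out on a hit. Returns 0 when the
   key is not an integer in 1..len (a real table would then go on to
   try its hash part). */
int array_get(const Array *arr, double key, double *out) {
    assert(arr && out);
    /* Range check first so the cast below is well defined; this
       comparison also rejects NaN. */
    if (!(key >= 1.0 && key <= (double)arr->len)) return 0;
    size_t i = (size_t)key;           /* double -> integer conversion */
    if ((double)i != key) return 0;   /* integer -> double: was the key integral? */
    *out = arr->items[i - 1];
    return 1;
}
```

Narrowing numbers to integers in the optimizer, as suggested above, would amount to proving ahead of time that some of these checks always succeed so they can be dropped.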

Bye,
     Mike

Re: ANN: LuaJIT 1.1.0

SevenThunders
In reply to this post by Mike Pall-3-2
Very nice work.  I downloaded it and integrated it into my modified Lua shell in about 5 minutes.  The speed improvement was startling and obvious.  So far no bugs observed.

I've been working on my own compiler back end.  I'm primarily interested in optimization techniques and performance.  I'm somewhat neutral on the front end language as long as it's interactive and has garbage collection.  I'll have to dig into your code some to see what you've been doing.

Speed

Achromat
Hi !

I have Lua 5.1 embedded in my application and have changed
double to float in luaconf.h.

My question: is there any chance to speed Lua up by changing or
patching anything?

Thanks for any suggestions. (Sorry for my simple English.)



Re: ANN: LuaJIT 1.1.0

Kein-Hong Man
In reply to this post by Mike Pall-3-2
Mike Pall wrote:

> [snip]
> The main reason is that LuaJIT still uses the Lua frame and stack
> structures. This makes it easy to switch between interpreted and
> compiled code. And most of the debug support can be reused, too.
>
> Reducing the function call overhead any further is hard without
> major conceptual changes. Inlining short Lua functions may be
> easier (and is potentially faster).
>
>> What do you think the performance limits are for just-in-time
>> compilation in Lua?
> [snip]

I think up to a certain point, there is only so much one can do to
speed things up without sacrificing something. If a JIT is to
function exactly like interpreted Lua, one cannot exactly produce
very fast code approaching the speed of C or bare metal code.

It's a tradeoff -- for top speed, we'd have to start cutting some
functionality off. What would achieve top speed? We'd need a
lite-Lua profile that is largely procedural and where most data
types can be made static. Things like metamethod checking will
need to be dropped unless it is explicitly needed. There cannot be
strict semantics; integers must be used where appropriate so that
one can forget about conversion.

Where applications are concerned however, a lite-Lua profile would
not be appropriate where one wants to JIT the entire Lua source
code, but it would greatly accelerate a subset of functions. So it
assumes the application is mostly fast enough on interpreted Lua,
but there are a few processing intensive functions that badly need
accelerating.

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia


Re: ANN: LuaJIT 1.1.0

Gavin Wraith
In reply to this post by Mike Pall-3-2
In message <[hidden email]> you wrote:


> > What are your future plans for LuaJIT,
>
> This depends on the feedback I'll receive from LuaJIT users.
 
> * Another goal is better portability (to non x86 CPUs).
> ................
> ARM is an interesting target for other
> embedded devices and PDAs (XScale).

And for RISC OS - the OS which was designed around the ARM by
the group who originally created the ARM architecture. There are
still a few thousand users. I would love to see an integer-only
ARM LuaJIT.

The type-theoretic questions that LuaJIT raises are interesting.
What languages are there, in the neighbourhood of Lua (in that "space"
of languages that has not yet been given formal definition), that
have more static typing - enough to make possible faster compiled
code - but yet retain the character and appeal of Lua?

--
Gavin Wraith ([hidden email])
Home page: http://www.wra1th.plus.com/

Re: ANN: LuaJIT 1.1.0

Framework Studios: Hugo
In reply to this post by Mike Pall-3-2
> This depends on the feedback I'll receive from LuaJIT users.

I tried to use LuaJIT in our game engine today. Unfortunately, it seems
there is no support for float instead of double:

#ifndef LUA_NUMBER_DOUBLE
#error "No support for other number types on x86 (yet)"
#endif

At this point, this is keeping me from using LuaJIT since I prefer to use
float-s over double-s. It would be great to see float support in LuaJIT!

    Thanks,
            Hugo


Re: ANN: LuaJIT 1.1.0

Alex Queiroz
Hallo,

On 3/15/06, Framework Studios: Hugo <[hidden email]> wrote:
>
> At this point, this is keeping me from using LuaJIT since I prefer to use
> float-s over double-s. It would be great to see float support in LuaJIT!
>

Why is that? Have you seen this:
http://lua-users.org/wiki/FloatingPoint ? Or are you using SSE
instructions?

--
-alex
http://www.ventonegro.org/

Re: ANN: LuaJIT 1.1.0

Framework Studios: Hugo
Yes, I am aware of the difference between float and double.

It's because we're using DirectX... it sets the FPU to single precision mode
so either we have to use float-s or Lua starts acting really weird after
init of DirectX.
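
A rough way to see why doubles misbehave once the x87 FPU is in single-precision mode: arithmetic results are rounded to a 24-bit mantissa, so integers above 2^24 (16777216) can no longer be represented exactly. The C sketch below only simulates that rounding by passing each result through `float`; it does not touch the real FPU control word, and the function names are made up for this illustration.

```c
#include <assert.h>

/* Simulated "single-precision mode" addition: compute in double, then
   round the result to float's 24-bit mantissa. An approximation of
   reduced x87 precision control, not a reproduction of it. */
double add_single_precision(double a, double b) {
    volatile float rounded = (float)(a + b);
    return (double)rounded;
}

/* Repeatedly adds 1.0 starting from `start` and returns the first value
   at which the addition no longer changes the result, or -1.0 if that
   does not happen within `max_steps` additions. */
double first_stuck_value(double start, int max_steps) {
    assert(max_steps > 0);
    double x = start;
    for (int i = 0; i < max_steps; i++) {
        double next = add_single_precision(x, 1.0);
        if (next == x) return x;   /* the increment was rounded away */
        x = next;
    }
    return -1.0;
}
```

Starting from 16777210.0, the counter gets stuck at 16777216.0, whereas the same loop with ordinary double arithmetic keeps counting. A script that relies on doubles holding large integers exactly would go wrong in a similar way.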

Note the DirectX documentation claims DirectX to be slower when preserving
the FPU flag; about "D3DCREATE_FPU_PRESERVE" it says:

"Indicates that the application needs either double-precision floating-point
unit (FPU) or FPU exceptions enabled. Microsoft® Direct3D® sets the FPU
state each time it is called.
By default, the pipeline uses single precision. Be sure to use this flag to
get double precision. Setting the flag will reduce Direct3D performance."

The reason to use LuaJIT of course would be to gain speed, not lose it
by forcing double precision onto DirectX.

    Bye,

            Hugo


Re: ANN: LuaJIT 1.1.0

Andras Balogh-4
> Note the DirectX documentation claims DirectX to be slower when  
> preserving the FPU flag;

I'm just curious, does anyone have actual numbers to back this up?


andras

Re: ANN: LuaJIT 1.1.0

Mike Pall-3-2
In reply to this post by Framework Studios: Hugo
Hi,

Framework Studios: Hugo wrote:
> Note the DirectX documentation claims DirectX to be slower when preserving
> the FPU flag; about "D3DCREATE_FPU_PRESERVE" it says:

AFAIK this only applies to really old CPUs. Does your game even
run on a >5 year old PC without a 3D-capable GPU?

I suggest you try to turn on the flag and check whether it makes
any difference.

Bye,
     Mike

Re: ANN: LuaJIT 1.1.0

Roberto Ierusalimschy
In reply to this post by Framework Studios: Hugo
On Wed, Mar 15, 2006 at 04:02:16PM +0100, Framework Studios: Hugo wrote:
> Yes, I am aware of the difference between float and double.
>
> It's because we're using DirectX... it sets the FPU to single precision
> mode so either we have to use float-s or Lua starts acting really weird
> after init of DirectX.
>
> Note the DirectX documentation claims DirectX to be slower when preserving
> the FPU flag; about "D3DCREATE_FPU_PRESERVE" it says:

You don't need to switch Lua to floats to avoid this problem. If you
really want to keep DirectX as it is, you can recompile Lua to use the
default definition for lua_number2integer:

luaconf.h:544
- #if defined(LUA_NUMBER_DOUBLE) && !defined(LUA_ANSI) && !defined(__SSE2__) && \
-     (defined(__i386) || defined (_M_IX86) || defined(__i386__))
+ #if 0
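
For context, the conditional being disabled here selects a fast double-to-integer conversion that adds a large constant so that the integer ends up in the low bits of the mantissa. That only works when the addition is carried out with full 53-bit precision, which would explain the clash with DirectX's single-precision setting. The standalone C sketch below shows the idea and simulates the failure by rounding the sum through `float`. The function names are invented for this example, and it assumes IEEE 754 doubles and round-to-nearest mode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 1.5 * 2^52: adding this to a small double leaves the rounded integer
   value in the low 32 bits of the mantissa. */
static const double MAGIC = 6755399441055744.0;

/* Extracts the low 32 bits of a double's bit pattern as a signed integer. */
static int32_t low_bits(double sum) {
    uint64_t bits;
    uint32_t low;
    int32_t result;
    memcpy(&bits, &sum, sizeof bits);
    low = (uint32_t)(bits & 0xFFFFFFFFu);
    memcpy(&result, &low, sizeof result);
    return result;
}

/* Fast conversion with full double precision. Rounds to nearest instead
   of truncating like a plain (int) cast. Only meant for values that fit
   in 32 bits. */
int32_t magic_double_to_int(double d) {
    assert(d > -2147483648.0 && d < 2147483647.0);
    volatile double sum = d + MAGIC;
    return low_bits(sum);
}

/* Same conversion, but with the sum rounded to a 24-bit mantissa to
   mimic an FPU left in single-precision mode. */
int32_t magic_double_to_int_single(double d) {
    volatile float rounded = (float)(d + MAGIC);
    return low_bits((double)rounded);
}
```

With full precision `magic_double_to_int(3.0)` gives 3, while the simulated single-precision variant loses the low bits entirely and gives 0. Falling back to the plain cast, as the `#if 0` does, avoids depending on the FPU precision setting.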

-- Roberto

Re: ANN: LuaJIT 1.1.0

SevenThunders
In reply to this post by Mike Pall-3-2
Although I don't know much about DirectX, it is the SSE2 instructions that see a large boost using 32-bit floats, since they can do twice as many multiplies per clock cycle.  Moreover, if the modern DirectX drivers are using the CPU I would be surprised if they are not using the more modern SSE and SSE2 instructions over the old x86 FPU.  Thus one would never have to employ the nasty switch to single precision on the FPU (which probably sucks up a lot of clock cycles in its own right).

Perhaps the question is what version of DirectX are you using?  Actually a Google search produces this link:
http://blogs.msdn.com/tmiller/archive/2004/06/01/145596.aspx

Tell your DirectX to leave the deprecated FPU alone!

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Framework Studios: Hugo
Hi,

I'm using the DirectX9c SDK from Feb. 2006 (the latest). Imho maybe we can
assume the people behind DirectX have a good reason to do something as
potentially 'dangerous' to other libraries as setting the FPU to single
precision.

A google search can also produce pages about the loss of speed when using
the 'preserve FPU' flag, like
http://discuss.microsoft.com/SCRIPTS/WA-MSD.EXE?A2=ind0504b&L=directxdev&D=1&P=4524 
(quote: "For games, you typically don't want the performance hit of having
the FP unit working in double-precision.").

With SSE of course it can be avoided, but what about DX drivers on a CPU
without SSE? Btw, can 32-bit float-s really benefit from SSE2 over 64-bit
double-s? Or maybe double-s are faster than float-s on 64-bit CPUs?

So for now I'm sticking with having DirectX set the (perhaps not so very
deprecated :) FPU to single precision and preferring float over double in our
game engine until DirectX itself converts to using double-s.

Anyway, today I got LuaJIT to work, big thanks to Roberto for this tip!

luaconf.h:544
- #if defined(LUA_NUMBER_DOUBLE) && !defined(LUA_ANSI) && !defined(__SSE2__) && \
-     (defined(__i386) || defined (_M_IX86) || defined(__i386__))
+ #if 0

        cheers,
                Hugo


Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Andras Balogh-4
I still don't buy this. Give me numbers! Should be easy, right? Set
the flag and read FPS counter! I'd test it myself, but I'm using
OpenGL, and thus never had to change the FPU control word..


Andras

Thursday, March 16, 2006, 1:35:08 AM, you wrote:

> A google search can also produce pages about the loss of speed when using
> the 'preserve FPU' flag, like
> http://discuss.microsoft.com/SCRIPTS/WA-MSD.EXE?A2=ind0504b&L=directxdev&D=1&P=4524
> (quote: "For games, you typically don't want the performance hit of having
> the FP unit working in double-precision.").



Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Framework Studios: Hugo
Well, one number I found on the decrease of Direct3D's speed with and
without the FPU preserve flag:

http://discuss.microsoft.com/SCRIPTS/WA-MSD.EXE?A2=ind0504b&L=directxdev&D=1&P=3121

with: 560 fps
without: 580 fps
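
Expressed as frame time, these two figures are close together. A small C helper converts the quoted frame rates to milliseconds per frame so the difference can be read off directly (the names `frame_time_ms` and `slowdown_percent` are just for this illustration):

```c
#include <assert.h>

/* Milliseconds needed to render one frame at the given frame rate. */
double frame_time_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Relative slowdown of `slow_fps` compared to `fast_fps`, in percent of
   the faster frame rate. */
double slowdown_percent(double fast_fps, double slow_fps) {
    assert(fast_fps > 0.0);
    return (fast_fps - slow_fps) / fast_fps * 100.0;
}
```

`frame_time_ms(560.0)` is about 1.79 ms and `frame_time_ms(580.0)` about 1.72 ms, a difference of roughly 0.06 ms per frame, or about 3.4 percent.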

However I think it is a bit beside the point to 'prove' this with numbers
since DirectX more or less already chose single precision for us (for a good
reason, I trust). Also it seems logical for a 3D API to be faster when using
float-s instead of double-s because twice the data can be pushed to the GPU
with the same bandwidth / stored in VRAM. Isn't this the same for OpenGL?

Looking at the performance of double vs float on modern CPU-s should be
interesting though. Are double-s faster, slower or the same compared to
float-s on 32-bit and 64-bit CPU architectures? What about the CPU-s people
are actually using on average at the moment? (To sell games we need to look
at what is average on the market, not only at what is top-notch :)

        Cheers,
                Hugo
