LuaJIT is a Just-In-Time (JIT) Compiler for Lua. It's compatible with
standard Lua 5.1 and can significantly boost the performance of your
Lua programs. LuaJIT is open source software, released under the MIT/X
license.
This is the LuaJIT roadmap for 2011, bringing you up to date on the
current and future developments around LuaJIT. I'm happy to answer
your questions here on the Lua mailing list, on reddit or by mail.
* Current Status
* FFI Library
* Dual-number VM
* Sponsored ARM port
* Outlook on LuaJIT 2.1
* Release Schedule
Current Status
LuaJIT 2.0.0-beta5 has proven to be quite stable. Thus I've held back
on releasing new betas in the past five months and worked on various
new features and improvements.
Barring unforeseen difficulties, LuaJIT 2.0.0-beta6 will be released
in the next 1 or 2 weeks. It would be helpful to get early feedback
from testers before the release. Thank you in advance! The LuaJIT git
repository is available from: http://luajit.org/download.html
Here are the main changes between beta5 and beta6:
- The sponsored port of the LuaJIT interpreter to the PowerPC/e500v2
cores is now complete. The hand-optimized assembler code of the
interpreter has been rewritten for the PPC/e500 dialect. It takes
advantage of several architectural features, e.g. vectorized loads
and compares are used to speed up dynamic type checks.
The speedup over the Lua interpreter is a factor of 2x - 4x and in
some cases up to 6x. This is similar to the speedups seen on x86/x64
when comparing the pure interpreters (select the interpreters on
http://luajit.org/performance.html ). Further gains are only
possible with a port of the JIT compiler.
Please note that the e500v2 has a different FPU than most other
PowerPC CPUs. This port will *NOT* run on other PPC-based machines
(e.g. game consoles)! A port of the JIT compiler and/or a port to
other PowerPC CPUs may follow later.
As a side-effect of the port, overall portability of the code base
and cross-compilation support has been further improved.
- The long-awaited LuaJIT FFI library has been merged into the code
base. Please see the next section for details.
- Various minor features from Lua 5.2 have been added:
  - Hex escapes and '\*'-escape in string literals.
  - string.format("%q", str) is fully reversible.
  - "%g" character class in patterns.
  - Tighter check on table.sort callback compliance.
  - os.exit(status|true|false [,close]).
  - __pairs/__ipairs metamethods (needs -DLUAJIT_ENABLE_LUA52COMPAT).
Note that LuaJIT 2.0 already supported other features since its
first release, that later went into Lua 5.2. E.g. bit operations or
a fully resumable VM (yield across pcall).
Most other changes in Lua 5.2 cannot be merged into the LuaJIT code
base, because they break compatibility with the Lua 5.1 API/ABI.
This is not acceptable to the majority of my user-base. Given that
Lua 5.2 provides few tangible benefits, adoption will likely be
rather slow. So LuaJIT will stay compatible with the Lua 5.1 API/ABI
in the near future.
- Changes to the core parts of the VM:
  - Specialized bytecode for pairs()/next(). Speedup: 3.5x.
  - The parser recognizes 64 bit integer literals (1LL, 1ULL) and
    complex literals (1.5i) for use by the FFI library.
  - The bytecode can embed these literals, too.
- The JIT compiler has seen some smaller improvements:
  - Calls to vararg functions are compiled.
  - select() is compiled.
  - Alias analysis has been improved, esp. for loads from allocations.
  - Various compiler heuristics have been tuned.
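A few of the Lua 5.2 additions listed above can be tried directly. The following is a minimal sketch, assuming a LuaJIT build that already includes these changes; the sample strings and variable names are made up for illustration.

```lua
-- Hex escapes: "\x41" denotes the byte 0x41, i.e. "A".
local abc = "\x41\x42\x43"

-- "%g" matches any printable character except space.
local words = {}
for w in ("foo  bar\tbaz"):gmatch("%g+") do
  words[#words + 1] = w
end

-- string.format("%q", str) produces a literal the parser reads back unchanged.
local original = 'line 1\nline 2 with "quotes" and a \\ backslash'
local literal = string.format("%q", original)
local restored = (loadstring or load)("return " .. literal)()

print(abc, #words, restored == original)
```

The round trip through loadstring is only there to show that the quoted form is reversible; it is not meant as a serialization scheme.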
As you can see from the above list, LuaJIT 2.0.0-beta6 is a 'feature
release'. It'll likely need quite a few fixes and will be followed by
beta7 in Q1/2011, which is focused on stability.
FFI Library
The FFI library allows calling external C functions and the use of C
data structures from pure Lua code.
The FFI library largely obviates the need to write tedious manual
Lua/C bindings in C. It doesn't require learning a separate binding
language -- it parses plain C declarations, which can be cut-n-pasted
from C header files or reference manuals (*). It's up to the task of
binding large libraries without the need for dealing with fragile
binding generators.
The FFI library is tightly integrated into LuaJIT (it's not available
as a separate module). The code generated by the JIT-compiler for
accesses to C data structures from Lua code is on par with the code a
C compiler would generate. Calls to C functions can be inlined in
JIT-compiled code, unlike calls to functions bound via the classic
Lua/C API.
(*) In case anyone wonders: Yes, this means the FFI library includes
a full-blown C parser (actually C99 + GCC/MSVC extensions). It's
currently missing a C pre-processor. Some C++ features are supported,
too. But complete C++ support is not coming anytime soon. :-)
Preliminary documentation for the FFI library is available in the git
repository. The Lua mailing list recently had some related threads,
too. Here are just a few examples to whet your appetite:
Using standard POSIX library functions, which are not provided by Lua:
local ffi = require("ffi")
ffi.cdef[[
int mkdir(const char *pathname, unsigned int mode);
int rmdir(const char *pathname);
]]
ffi.C.mkdir("/tmp/test", 0x1ff)  -- example path (an assumption); 0x1ff is octal 0777
Popping up a message box on Windows:
local ffi = require("ffi")
ffi.cdef[[
int MessageBoxA(void *w, const char *txt, const char *cap, int type);
]]
ffi.C.MessageBoxA(nil, "Hello world!", "Test", 0)
Wrapping an external library (libz):
local ffi = require("ffi")
ffi.cdef[[
int uncompress(uint8_t *dest, unsigned long *destLen,
               const uint8_t *source, unsigned long sourceLen);
]]
local zlib = ffi.load("z")
local function uncompress_string(comp, origsize)
  local buf = ffi.new("uint8_t[?]", origsize)
  -- destLen is an in/out parameter, so pass a one-element array
  local buflen = ffi.new("unsigned long[1]", origsize)
  assert(zlib.uncompress(buf, buflen, comp, #comp) == 0)
  return ffi.string(buf, tonumber(buflen[0]))
end
The FFI library allows you to create and access C data structures from
pure Lua code. Of course the main use for this is for interfacing with
C functions. But they can be used stand-alone, too.
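As a minimal sketch of stand-alone use, the following fills and sums an array of C structs without calling any C function. The point_t type and its fields are hypothetical names chosen for this illustration.

```lua
local ffi = require("ffi")

-- Hypothetical struct, declared with plain C syntax.
ffi.cdef[[
typedef struct { double x, y; } point_t;
]]

local n = 1000
-- A variable-length array of n structs; C arrays are indexed from 0.
local pts = ffi.new("point_t[?]", n)

for i = 0, n - 1 do
  pts[i].x = i
  pts[i].y = 2 * i
end

local sum = 0
for i = 0, n - 1 do
  sum = sum + pts[i].x + pts[i].y
end

print(sum)
```

The array is one contiguous block of memory, which is what makes this layout attractive for numeric code compared to a Lua table of Lua tables.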
E.g. I've converted SciMark for Lua to use the low-level FFI data
structures with a sizeable gain in performance. The results for GCC,
JVM and LuaJIT+FFI are only a few percent apart. More details can be
found here: http://lua-users.org/lists/lua-l/2010-12/msg00924.html
Full support for all C data types implies that LuaJIT now supports
64 bit integers and complex numbers, as well as the corresponding
number literals (1LL, 1ULL, 1.5i).
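A short sketch of these literals in use, assuming a LuaJIT build with the FFI enabled and 64 bit arithmetic available on the platform; the variable names are illustrative only.

```lua
local ffi = require("ffi")

-- 2^53 + 1 cannot be represented exactly as a double, but fits an int64_t.
local a = 9007199254740993LL
local b = a + 1LL
print(tostring(b), ffi.istype("int64_t", b))

-- The largest unsigned 64 bit value.
local u = 0xffffffffffffffffULL
print(tostring(u))

-- Imaginary literal: a complex number with a real part of 0.
local c = 1.5i
print(c.re, c.im)
```

Note that a source file containing these literals is not valid input for the standard Lua 5.1 parser.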
Please note that some parts of the FFI are still incomplete. Some
issues, like support for 64 bit arithmetic for all backends, will be
fixed before beta6. But others, like complex arithmetic, will have to
wait. Also, the JIT compiler currently doesn't compile every corner
case of FFI operations. In those cases it bails out and transparently
falls back to the interpreter; you can check this with the -jv command
line option.
The need to allocate heap objects for carrying C data types may cause
some inefficiencies. Most of these will be resolved with the addition
of generalized allocation sinking and store sinking optimizations to
the JIT compiler. However this feature will not make it into beta6.
The FFI library has been carefully designed to be extensible. E.g. the
FFI library will probably gain support for native vector operations or
for parsing a subset of C++. Development will continue in parallel to
other parts of LuaJIT. New features will be prioritized based on
user-demand and sponsoring.
Dual-number VM
The Lua language is specified to have a single number type. Currently
LuaJIT only supports 64 bit IEEE-754 compliant FP numbers ('double').
This works just fine for x86/x64 platforms with their excellent
floating-point performance. A unified number representation has many
advantages and the JIT compiler can get away with narrowing only some
select operations to integer arithmetic.
However this approach is unlikely to yield acceptable performance on
lower-end CPUs for mobile or non-desktop/non-server platforms. Most of
these CPUs either support only software floating-point arithmetic or
have slow hardware FPUs.
As a prerequisite for the ARM port (see the next section), dual-number
capability will be added to the LuaJIT VM, the LuaJIT interpreter and
the JIT compiler.
Numbers will be internally kept as 32 bit integers, wherever possible,
and transparently widened to floating-point numbers. This change is
invisible at the Lua source code level. It's expected that carefully
written applications for low-end platforms will be able to avoid
floating-point computations with only few changes to the source code.
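As one illustration of what avoiding floating-point computations at the source level might look like, the sketch below replaces a division by a bit shift from LuaJIT's built-in bit library. The function name mean4 is hypothetical, and whether a given VM build actually keeps these values in integer form is an assumption; the computed values do not depend on it.

```lua
local bit = require("bit")

-- Truncated average of four non-negative integer samples.
-- (a + b + c + d) / 4 would be a floating-point division; the shift
-- yields an integer result without any FP operation in the expression.
local function mean4(a, b, c, d)
  return bit.rshift(a + b + c + d, 2)
end

print(mean4(10, 20, 30, 40))
print(mean4(1, 2, 3, 5))
```

Unlike the division, the shift truncates, so this only fits code that wants an integer result in the first place.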
Adding dual-number support to the LuaJIT VM is a major change. For
stability reasons, this feature needs to be prototyped first for the
existing x86/x64 port of LuaJIT (even though it's not that useful for
this platform). Work on the actual ARM port of LuaJIT can only start
after the dual-number support is complete.
Sponsored ARM port
I'm happy to announce that QUALCOMM Inc. is sponsoring an ARM port
of LuaJIT 2.0. My personal thanks go to Marc Nijdam, who arranged
this sponsorship.
The initial targets for the ARM port are low-to-middle-end ARM-based
devices. The port will require a CPU conforming to the ARMv5
architecture (ARM9E cores or better) with software floating-point
(no FPU needed) and the classic ARM instruction set.
The initial port ought to run on upwards-compatible hardware, but
possibly with suboptimal performance. An ARM port of LuaJIT which
makes use of the VFP unit (hardware FPU) or other instruction set
extensions may follow at a later point in time.
LuaJIT/ARM will compile out-of-the-box for a GCC-based toolchain
targeting Linux/ARM-based systems. Other operating systems will
be supported through an enhanced porting layer which abstracts
away OS-specific functionality. This is mainly about memory
management and the specific needs of a JIT compiler. The goal is
to allow easier embedding of LuaJIT in custom OS environments.
The port will be done in three phases:
Phase #1: Dual-number support for LuaJIT, prototyped for x86/x64.
This is a basic requirement for the softfp ARM port. Please see
the previous section for details.
Phase #2: ARMv5/softfp port of the LuaJIT interpreter.
Phase #3: ARMv5/softfp port of the LuaJIT JIT compiler.
You can follow the progress in the LuaJIT git repository as usual.
The ARM port will take several months, so there may be interim beta
releases which already include part of the functionality.
Outlook on LuaJIT 2.1
LuaJIT 2.0 has been in beta for more than a year now. Not that this
is unheard of in the industry. :-) The main reason is not a lack of
stability -- in fact the beta releases are successfully used in
But the "beta" label gives me the ability to freely add features and
to just go ahead with bigger redesigns of the code base. There are
still a couple of features I want to include in LuaJIT 2.0, but of
course I have to make a cut somewhere.
My current plan is to freeze the LuaJIT 2.0 branch somewhere in 2011
and get a release candidate out. The 2.0 branch will turn into the
stable branch and will receive only bug fixes.
Shortly after that, development on LuaJIT 2.1 will start. All of the
minor changes that didn't make it into 2.0 will go on the TODO list
for 2.1 of course. I'll update you on the details, when the actual
The one major change that will likely happen first is a new garbage
collector for LuaJIT 2.1. I've already experimented with this on 2.0,
but it turned out to cause too much instability for the code base.
The standard Lua 5.1/LuaJIT 2.0 garbage collector is just not up to
the task of handling big heaps. And both its allocation speed and the
collector throughput leave something to be desired. So I'm planning to
switch to an integrated allocator and garbage collector. It's going to
be an incremental, generational, non-copying GC.
Naturally the main user-visible effect will be performance gains in
allocation-heavy workloads. Some of the related changes, like morphing
metatables into specialized data types on-the-fly or segregated
finalizer handling will allow giving tables or other objects __gc
metamethods.
Release Schedule
1-2 weeks   - Release of LuaJIT 2.0.0-beta6 (features)
Q1 2011     - Release of LuaJIT 2.0.0-beta7 (stability)
Q1-Q2 2011  - ARM port of LuaJIT 2.0
Q1-Q3 2011  - Some more beta releases for LuaJIT 2.0
Q3 2011     - Release candidate of LuaJIT 2.0
Q3-Q4 2011  - Release of LuaJIT 2.0.0 final
Q4 2011     - Work on LuaJIT 2.1 starts
Please note this is a tentative schedule, for your orientation only!
I cannot give you any guarantee whatsoever for the correctness of the
> On 21/01/2011 20.06, Mike Pall wrote:
>> LuaJIT Roadmap 2011
> Many thanks for the detailed roadmap! I'm not currently a LuaJIT user
> because I've no optimization needs for x86, but I look forward with interest
> to the ARM port... and of course to the FFI.
On Fri, Jan 21, 2011 at 14:06, Mike Pall <[hidden email]> wrote:
> LuaJIT Roadmap 2011
Mike, I am very grateful to you for developing LuaJIT. It is a
brilliant and very useful piece of software.
Recently there was a discussion on this list about memory limits of
LuaJIT2 on the x86_64 platform.
Currently the memory available to a Lua state in LuaJIT2 is limited to
1GB. Will this limit go away in the future? Will the new integrated
allocator/GC help here?
I am using (abusing) Lua for data analysis and quite often deal with
Lua tables with millions of entries (one can increase MAXBITS to 30
w/o harm). Even though LuaJIT is vastly superior for this task, I have
to use the conventional Lua interpreter, for it imposes no such memory
limit.
Leo Razoumov wrote:
> Recently there was a discussion on this list about memory limits of
> LuaJIT2 on the x86_64 platform.
> Currently the memory available to a Lua state in LuaJIT2 is limited to
> 1GB. Will this limit go away in the future? Will the new integrated
> allocator/GC help here?
It will be 2GB (or maybe 4GB) with the new GC in LuaJIT 2.1.
> I am using (abusing) Lua for data analysis and quite often deal with
> Lua tables with millions of entries (one can increase MAXBITS to 30
> w/o harm). Even though LuaJIT is vastly superior for this task, I have
> to use the conventional Lua interpreter, for it imposes no such memory
> limit.
You could use the FFI. The FFI handles full 64 bit pointers and
you can manually manage huge memory areas, up to the limits of
your OS, by calling malloc()/free() via the FFI.
Since the FFI allows creating arbitrary C structs at runtime, you
can still have a lot of flexibility in your data structures.
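A minimal sketch of this approach, assuming a platform whose C library exports malloc() and free(). The sample_t struct, its fields, and the element count are hypothetical and only serve the illustration.

```lua
local ffi = require("ffi")

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
typedef struct { double value; int32_t count; } sample_t;  /* hypothetical */
]]

local n = 1000000
-- This memory comes from the C heap, not from the Lua garbage collector.
local raw = ffi.C.malloc(n * ffi.sizeof("sample_t"))
assert(raw ~= nil, "out of memory")
local samples = ffi.cast("sample_t *", raw)

for i = 0, n - 1 do
  samples[i].value = i * 0.5
  samples[i].count = i
end

local total = 0
for i = 0, n - 1 do
  total = total + samples[i].count
end
print(total)

-- Manual memory management: nothing frees this block automatically.
ffi.C.free(raw)
```

The same pattern scales to much larger blocks; the element count here is kept small so the example runs quickly.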
> FFI Library
> The FFI library allows calling external C functions and the use of C
> data structures from pure Lua code.
> The FFI library largely obviates the need to write tedious manual
> Lua/C bindings in C. It doesn't require learning a separate binding
> language -- it parses plain C declarations, which can be cut-n-pasted
> from C header files or reference manuals (*). It's up to the task of
> binding large libraries without the need for dealing with fragile
> binding generators.
This is really great!
Some time ago I ported a simple SDL example to Lua using alien:
Today I made an FFI version for comparison. It means:
- using gcc -E instead of manually defining function signatures
- no more defining constants in Lua code
- indexing structs with hash names instead of byte offsets
- no other major changes in the Lua code
It looks like it is far easier than using alien. It is a new era for Lua.
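local ffi=require "ffi"
ffi.cdef( io.open('ffi_SDL.h', 'r'):read('*a'))
-- The lines marked "reconstructed" were lost from this listing and are
-- filled in as assumptions, based on the SDL 1.2 API.
local SDL = ffi.load('SDL')                   -- reconstructed
local SCREEN_WIDTH, SCREEN_HEIGHT = 640, 480  -- reconstructed, values assumed
-- wrapper functions
function SDL_LoadBMP(file)                    -- reconstructed
  return SDL.SDL_LoadBMP_RW(SDL.SDL_RWFromFile(file, "rb"), 1)
end
function SDL_BlitSurface(src, srcrect, dst, dstrect)
  return SDL.SDL_UpperBlit(src, srcrect, dst, dstrect)
end
-- Initialize SDL
SDL.SDL_Init(0x20)                            -- reconstructed, 0x20 is SDL_INIT_VIDEO
-- set the title bar
SDL.SDL_WM_SetCaption("SDL Test","SDL Test")
-- create window
local screen = SDL.SDL_SetVideoMode(SCREEN_WIDTH, SCREEN_HEIGHT, 0, 0)
-- load bitmap to temp surface
local temp = SDL_LoadBMP('sdl_logo.bmp') -- get it from
-- convert bitmap to display format
local bg = SDL.SDL_DisplayFormat(temp)
-- free the temp surface
SDL.SDL_FreeSurface(temp)                     -- reconstructed
local event = ffi.new("SDL_Event")            -- reconstructed
local gameover = false;
-- message pump
while not gameover do
  -- look for an event
  if SDL.SDL_PollEvent(event)==1 then
    -- an event was found
    local etype = event.type                  -- reconstructed
    if etype==SDL.SDL_QUIT then
      -- close button clicked
      gameover = true                         -- reconstructed
    end
    if etype==SDL.SDL_KEYDOWN then
      -- handle the keyboard
      local sym = event.key.keysym.sym        -- reconstructed
      if sym==SDL.SDLK_q or sym==SDL.SDLK_ESCAPE then
        gameover = true                       -- reconstructed
      end
    end
  end
  SDL_BlitSurface(bg, nil, screen, nil)
  SDL.SDL_UpdateRect(screen, 0, 0, 0, 0)      -- reconstructed
end
SDL.SDL_Quit()                                -- reconstructed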
On 01/21/2011 08:06 PM, Mike Pall wrote:
> LuaJIT Roadmap 2011
Wow, this is such cool stuff :)
I've got a question about the intended behaviour of VLA's, though.
What's the canonical way to resize them? I don't get the "Variable
Length" part of "Variable Length Array". Do we use ffi.C.realloc and
friends, or is it smarter than that?
The following segfaults on my box:
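local ffi = require("ffi")
ffi.cdef[[ int sprintf(char *str, const char *format, ...); ]]
local buf = ffi.new("char[?]", 16)
ffi.C.sprintf(buf, "%s", string.rep("x", 17)) -- buffer overflow?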
> Leo Razoumov wrote:
>> Recently there was a discussion on this list about memory limits of
>> LuaJIT2 on the x86_64 platform.
>> Currently the memory available to a Lua state in LuaJIT2 is limited to
>> 1GB. Will this limit go away in the future? Will the new integrated
>> allocator/GC help here?
> It will be 2GB (or maybe 4GB) with the new GC in LuaJIT 2.1.
>> I am using (abusing) Lua for data analysis and quite often deal with
>> Lua tables with millions of entries (one can increase MAXBITS to 30
>> w/o harm). Even though LuaJIT is vastly superior for this task, I have
>> to use the conventional Lua interpreter, for it imposes no such memory
>> limit.
> You could use the FFI. The FFI handles full 64 bit pointers and
> you can manually manage huge memory areas, up to the limits of
> your OS, by calling malloc()/free() via the FFI.
> Since the FFI allows creating arbitrary C structs at runtime, you
> can still have a lot of flexibility in your data structures.
The whole idea is to use the Lua table as a universal data structure for
all the data-sets. Exploratory data analysis needs to be quick and
quasi-interactive. Creating data-set specific structures in C would
place an unnecessary burden on a human analyst. That's why
data-analysis has so many domain specific languages like S, R, matlab,
etc. Lua has a good fighting chance to be a general purpose language
fit for the job and LuaJIT2's stellar runtime performance is a major
advantage if not for that memory limit (like 640KB is enough for
On 21/01/11 19:06, Mike Pall wrote:
> As a prerequisite for the ARM port (see the next section), dual-number
> capability will be added to the LuaJIT VM, the LuaJIT interpreter and
> the JIT compiler.
This is excellent news, and much kudos to Qualcomm for sponsoring it! As
well as being really useful elsewhere --- this will make LuaJIT vastly
more usable on low-end devices.
On two unrelated notes:
- can someone point me at instructions on how to get LuaJIT to tell me
what machine code it's producing for a given function? jit.util.* is
undocumented. (I want to figure out which Lua idioms generate specific
instruction sequences; I'm curious about the possibility of using
LuaJIT/FFI to replace some custom machine code generation code we've got.)
On Sat, Jan 22, 2011 at 12:58 PM, David Given <[hidden email]> wrote:
> - can someone point me at instructions on how to get LuaJIT to tell me
> what machine code it's producing for a given function?
The command-line option -jbc gives bytecodes of parsed functions.
-jdump=bimrs gives bytecode (b), IR (i), and machine code (m) of
traces, along with registers (r) and snapshots (s) (
http://luajit.org/running.html ). Machine code is generated for
traces not functions. See also my LuaJit page on the wiki and xref
Richard Hundt wrote:
> I've got a question about the intended behaviour of VLA's, though.
> What's the canonical way to resize them? I don't get the "Variable
> Length" part of "Variable Length Array". Do we use ffi.C.realloc and
> friends, or is it smarter than that?
The length of a VLA is variable at creation time. But it's not
variable afterwards. It's not meant as a resizable buffer.
The rationale for VLA's is to have a single ctype which can be
reused to create arrays of varying sizes. So you can do this:
local charbuf = ffi.typeof("char[?]")
local buf20 = charbuf(20)
local buf100 = charbuf(100)
If you really want a dynamically resizable array, the canonical
way is to do it like in C: manage a pointer plus a length, use
explicit resizes and replace the pointer with the result of
realloc and manually free the memory at the end (the pointer
itself is GC'ed).
You can wrap this in a proxy table and check the index against the
length in __newindex, if you want automatic resizes. Or you could
just use Lua tables. :-)
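The pointer-plus-length approach described above can be sketched as follows. This is a minimal illustration, assuming the C library's malloc(), realloc() and free() are reachable through ffi.C; the helper names new_array, push and release are hypothetical.

```lua
local ffi = require("ffi")

ffi.cdef[[
void *malloc(size_t size);
void *realloc(void *ptr, size_t size);
void free(void *ptr);
]]

local int_ptr = ffi.typeof("int *")
local int_size = ffi.sizeof("int")

-- Hypothetical helper: a Lua table holding the pointer, length and capacity.
local function new_array(capacity)
  local p = ffi.C.malloc(capacity * int_size)
  assert(p ~= nil, "out of memory")
  return { ptr = ffi.cast(int_ptr, p), len = 0, cap = capacity }
end

-- Append a value, doubling the capacity with realloc() when full.
local function push(arr, value)
  if arr.len == arr.cap then
    local newcap = arr.cap * 2
    local p = ffi.C.realloc(arr.ptr, newcap * int_size)
    assert(p ~= nil, "out of memory")
    arr.ptr, arr.cap = ffi.cast(int_ptr, p), newcap
  end
  arr.ptr[arr.len] = value
  arr.len = arr.len + 1
end

-- The memory must be released by hand; the GC only collects the pointer object.
local function release(arr)
  ffi.C.free(arr.ptr)
  arr.ptr, arr.len, arr.cap = nil, 0, 0
end

local arr = new_array(4)
for i = 1, 10 do push(arr, i * i) end
local last, cap = arr.ptr[arr.len - 1], arr.cap
print(arr.len, cap, last)
release(arr)
```

Accesses through arr.ptr are not bounds-checked; a proxy with __index/__newindex checks, as mentioned above, would be the place to add that.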
The FFI only provides low-level structures -- you build the
> The following segfaults on my box:
> local ffi = require("ffi")
> ffi.cdef[[ int sprintf(char *str, const char *format, ...); ]]
> local buf = ffi.new("char[?]", 16)
> ffi.C.sprintf(buf, "%s", string.rep("x", 17)) -- buffer overflow?
Most certainly. You're writing 17+1 bytes into a 16 byte buffer.
Umm, did you expect it to somehow magically resize? Only sprintf
knows how many characters are needed. It just gets a pointer to a
buffer and blindly writes to it. Sure, there's snprintf and even
asprintf, but neither of them resizes anything.
Note that the buffer returned from asprintf is created with malloc
and needs to be manually released with free(). There's no GC for
that, of course. In general, ffi.new()/GC and malloc()/free() do
not mix -- these are two completely different mechanisms.
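For comparison, here is a minimal sketch of the bounded variant using snprintf, assuming a POSIX-style C library that exports it under that name. The buffer size is arbitrary. The output is truncated to fit, and the return value tells you how much room would have been needed.

```lua
local ffi = require("ffi")

ffi.cdef[[
int snprintf(char *str, size_t size, const char *format, ...);
]]

local size = 16
local buf = ffi.new("char[?]", size)

-- Writes at most size-1 characters plus the terminating NUL.
local needed = ffi.C.snprintf(buf, size, "%s", string.rep("x", 17))
local text = ffi.string(buf)

print(needed, #text)
if needed >= size then
  -- Truncated: allocate a bigger buffer explicitly and call snprintf again.
  print("output was truncated")
end
```

The buffer itself still has a fixed size; any growth has to be done by the caller.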
This is great! I have to check out FFI soon and rewrite my vector math lib.
But there is one thing that I was hoping to see on the road map and that is support for dumping and loading bytecode. Is this still a planned feature? I doubt we could ship a commercial game with (full) source code even if it's encrypted...
Petri Häkkinen wrote:
> This is great! I have to check out FFI soon and rewrite my vector math lib.
Note that allocations aren't optimized away, yet. Performance may
be disappointing if you create lots of temporary vectors.
> But there is one thing that I was hoping to see on the road map and that is
> support for dumping and loading bytecode. Is this still a planned feature?
I've previously said that it's currently not a priority for me,
since it offers few benefits for open source developers. The
sponsorship page shows alternative means to change my development
> I doubt we could ship a commercial game with (full) source code even if it's
> encrypted...
Why? Reverse-engineering bytecode is at about the same difficulty
level as reverse-engineering obfuscated and encrypted source code.
Ask around in the Java community.
Also, if you fear that your competition wants to peek at your
source code: if they really wanted that, then there are much
cheaper ways to do that (how much does your janitor earn?). But
believe me: their source code is as messy as yours and they have
enough brittle frameworks of their own.
Some commercial games even ship with source code, imagine that! :-)
>> But there is one thing that I was hoping to see on the road map and that is
>> support for dumping and loading bytecode. Is this still a planned feature?
> I've previously said that it's currently not a priority for me,
> since it offers few benefits for open source developers.
I think there are some tools which generate bytecode, like MetaLua.
Even if you transform bytecode back to equivalent (but not much more
readable) Lua source code, you still can't set line number
On Sun, 2011-01-23 at 17:33 +0100, Florian Weimer wrote:
> * Mike Pall:
> >> But there is one thing that I was hoping to see on the road map and that is
> >> support for dumping and loading bytecode. Is this still a planned feature?
> > I've previously said that it's currently not a priority for me,
> > since it offers few benefits for open source developers.
> I think there are some tools which generate bytecode, like MetaLua.
> Even if you transform bytecode back to equivalent (but not much more
> readable) Lua source code, you still can't set line number
That's true, another problem with MetaLua is that often you can't
convert the AST back to Lua, because it uses `Label and `Goto, which are
possible in standard Lua bytecode, but not in Lua source.
I'm not sure how LuaJIT "feels" about this, but I guess not too good
until there is a way to load bytecode...
>> I think there are some tools which generate bytecode, like MetaLua.
>> Even if you transform bytecode back to equivalent (but not much more
>> readable) Lua source code, you still can't set line number
> That's true, another problem with MetaLua is that often you can't
> convert the AST back to Lua, because it uses `Label and `Goto, which are
> possible in standard Lua bytecode, but not in Lua source.
This is not a fundamental issue. After all, most compilers contain
logic which reconstructs loops from unstructured gotos. Most of the
time, code generated by MetaLua will have a structured representation
because it originally came from Lua code.
On 23/01/11 17:52, Florian Weimer wrote:
> This is not a fundamental issue. After all, most compilers contain
> logic which reconstructs loops from unstructured gotos.
I do not believe that this is true in the general case (although I have
not yet discovered a proof stating as such). A while back I was
looking for algorithms for reconstructing structured loops from
arbitrary basic block graphs for Clue, and every one I found was
surrounded by caveats stating that they didn't work in some circumstances.
(Which is one reason why I get irritated by languages that don't have a
working goto --- it makes the life of madmen like me who like playing
with code translation vastly harder.)
┌─── ｄｇ＠ｃｏｗｌａｒｋ．ｃｏｍ ───── http://www.cowlark.com ─────
│ "I have a mind like a steel trap. It's rusty and full of dead mice."
│ --- Anonymous, on rasfc
On Sun, Jan 23, 2011 at 11:38 PM, David Given <[hidden email]> wrote:
> (Which is one reason why I get irritated by languages that don't have a
> working goto --- it makes the life of madmen like me who like playing
> with code translation vastly harder.)
Imagine trying to get 'goto' into a modern dynamic language this late
in the computer century ....
You could say that C was designed to be a good target for code
generation - #line and goto and so forth.
One could add @line directives to Lua with a token filter, but I also
doubt that the token filter patch will ever make mainstream, for
similar reasons to 'goto'.
Certainly, Lua bytecode is not a stable or universal target...