LuaJIT performance

LuaJIT performance

Paul Chiusano
I've been playing around with LuaJIT and the performance is quite
good. For most of my applications, I am seeing the running time cut by
30% to 50%! I did notice a few cases where code runs slower with jit
compilation, and in each case it was code which creates a lot of
functions dynamically (to use as iterators). Here's the smallest
example I could come up with; it runs about four times slower with the
JIT than with stock Lua:

local function count(n)
   return coroutine.wrap(function()
    for i=1,n do coroutine.yield(i) end
  end)
end
local v = {}
for i=1,1e5 do
  for n in count(math.random(1,5)) do v[#v+1]=n end
end

Are ALL functions jit-compiled the very first time they are called?
Would it make any sense to have some higher threshold, as in, after a
function has been called K times, it is jit compiled? Or some other
more clever workaround so that the compiler doesn't spend too much
time compiling and optimizing what are basically throw-away functions?

-Paul

Re: LuaJIT performance

Adam D. Moss
Paul Chiusano wrote:
> I've been playing around with LuaJIT and the performance is quite
> good. For most of my applications, I am seeing the running time cut by
> 30% to 50%! I did notice a few cases where code runs slower with jit
> compilation, and in each case it was code which creates a lot of
> functions dynamically (to use as iterators).

You can use the LuaJIT API to provide a hint to LuaJIT that
it should not compile a function.

Leaving a function non-compiled until its Nth call sounds
like a fun heuristic; I think that's one of the distinctions
between a traditional JIT engine and a hotspot-style JIT,
the latter trying to be aware of the cost/gains of performing a
particular level of JITting+optimisation on a particular
code area according to how 'hot' that code is.  Really can't
say whether it's worth the bother though, for LuaJIT's
lightweight remit.

--Adam
--
Adam D. Moss   -   [hidden email]

Re: LuaJIT performance

Mike Pall
In reply to this post by Paul Chiusano
Hi,

Paul Chiusano wrote:
> I did notice a few cases where code runs slower with jit
> compilation, and in each case it was code which creates a lot of
> functions dynamically (to use as iterators).

Creation of closures is as cheap as it is in plain Lua. Only the
underlying function _prototype_ is compiled and not each closure.
A closure is just a short block of memory holding pointers to the
upvalues (which may differ) and a pointer to the prototype (which
stays the same).

Rule of thumb: any Lua source code line you are seeing is
compiled at most once (unless wrapped with loadstring()).
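
A minimal sketch of the distinction drawn above, with names invented for illustration (make_counter, double). It assumes Lua 5.1 or LuaJIT, where loadstring() exists; the first line falls back to load() on later Lua versions. Creating c1 and c2 only allocates two closures over the same source line, while every loadstring() call compiles its chunk from scratch, so it is best kept out of loops.

```lua
-- Lua 5.1 and LuaJIT provide loadstring(); later versions fold it into load().
local loadstring = loadstring or load

-- One source line, so one prototype. Each call only builds a new closure
-- with its own upvalue n.
local function make_counter()
  local n = 0
  return function() n = n + 1; return n end
end

local c1, c2 = make_counter(), make_counter()
c1(); c1()
print(c1(), c2())  -- the two counters advance independently

-- Each loadstring() call compiles its chunk again, so do it once
-- outside the loop and reuse the resulting function.
local double = loadstring("return function(x) return x * 2 end")()
local total = 0
for i = 1, 3 do total = total + double(i) end
print(total)
```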

The use of coroutines doesn't make a difference (but see below).

> Here's the smallest
> example I could come up with; it runs about four times slower with the
> JIT than with stock Lua:
>
> local function count(n)
>    return coroutine.wrap(function()
>     for i=1,n do coroutine.yield(i) end
>   end)
> end
> local v = {}
> for i=1,1e5 do
>   for n in count(math.random(1,5)) do v[#v+1]=n end
> end

But this example has only three function prototypes (main, count()
and the wrapped anonymous function). And only these three are ever
compiled. You can easily find out with -j trace:

$ luajit -j trace -O test.lua
[LuaJIT: OK   21   1206  test.lua:0]
[LuaJIT: OK    7    296  test.lua:1]
[LuaJIT: OK   10    390  test.lua:2]

What you are really seeing in this example is the effect of
repeated creation and garbage collection of coroutines. Since
every coroutine in LuaJIT needs an associated C stack and you are
not doing much work in these coroutines, you are effectively
benchmarking the higher overhead of the memory allocator.

You can verify that this is the case by adding
  coroutine.cstacksize(1) -- Select minimum default C stack size.
as the first line. It runs much faster (but has side effects).

It's not recommended to create and destroy coroutines at such a
high rate, neither in Lua nor, especially, in LuaJIT. Recycling
coroutines is easy:

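local yield, random = coroutine.yield, math.random
-- One long-lived coroutine serves every iteration request.
local co = coroutine.wrap(function(n)
    while n do
      for i=1,n do yield(i) end  -- produce 1..n
      n = yield()                -- yield nil to end the loop, then wait for the next n
    end
  end)
-- Return co as the iterator function and n as the loop state;
-- the generic for passes n as the first argument on every call.
local function count(n)
  return co, n
end
local v = {}
for i=1,1e5 do
  for n in count(random(1,5)) do v[#v+1]=n end
end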

With plain Lua, this program runs 2.2 times faster than the
original. LuaJIT adds another 30% boost (well, there is not much
to optimize here).
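
To check that recycling does not change the results, the two variants can be run side by side on a fixed list of sizes. This is only a sketch: count_fresh, count_recycled, collect and the sizes table are names made up for the comparison, and the timing at the end is a rough os.clock() measurement whose numbers will vary by machine and interpreter.

```lua
local yield = coroutine.yield

-- Variant 1: a fresh coroutine per iterator, as in the original post.
local function count_fresh(n)
  return coroutine.wrap(function()
    for i = 1, n do yield(i) end
  end)
end

-- Variant 2: one recycled coroutine, as in the reply above.
local co = coroutine.wrap(function(n)
  while n do
    for i = 1, n do yield(i) end
    n = yield()
  end
end)
local function count_recycled(n)
  return co, n
end

-- Hypothetical helper: run one variant over a list of sizes and
-- collect everything it yields.
local function collect(count, sizes)
  local v = {}
  for _, size in ipairs(sizes) do
    for n in count(size) do v[#v + 1] = n end
  end
  return v
end

local sizes = {3, 1, 5, 2}
local fresh = table.concat(collect(count_fresh, sizes), ",")
local recycled = table.concat(collect(count_recycled, sizes), ",")
print(fresh == recycled)

-- Rough timing of each variant (no particular numbers are implied).
for _, variant in ipairs({count_fresh, count_recycled}) do
  local t0 = os.clock()
  for _ = 1, 1e4 do collect(variant, sizes) end
  print(os.clock() - t0)
end
```

One caveat with the recycled variant: it relies on every loop running to completion. If a loop exits early with break, the shared coroutine stays suspended in the middle of a sequence, and the next count() call resumes that old sequence instead of starting a new one.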

> Are ALL functions jit-compiled the very first time they are called?

Yes. Except for those you marked as not-to-be-compiled with
jit.off(f). In particular the main function of all Lua modules
loaded via require() is not compiled because it's guaranteed to
be executed only once.

[But this is only useful for functions that are _truly_ compiled
from scratch (e.g. with loadstring()) and not to be confused with
closure creation or coroutine creation. See above.]
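
A minimal sketch of marking a function as not-to-be-compiled. The function run_once is hypothetical; the only LuaJIT call used is jit.off(f) as named above. The guard on the global jit table is an addition so that the same file also runs under plain Lua, where that table does not exist.

```lua
-- Hypothetical throw-away function, for illustration only.
local function run_once(x)
  return x * 2
end

-- The global `jit` table exists only under LuaJIT; plain Lua skips this line.
if jit then jit.off(run_once) end

-- The function behaves the same either way; only compilation is skipped.
print(run_once(21))
```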

> Would it make any sense to have some higher threshold, as in, after a
> function has been called K times, it is jit compiled? Or some other
> more clever workaround so that the compiler doesn't spend too much
> time compiling and optimizing what are basically throw-away functions?

The total compilation time for the above three functions is
around 200 microseconds on my old PIII. This is very little
compared to traditional compilers. And the overhead is needed
only once (I hope I cleared up the misconception above).

Compilation thresholds need runtime instrumentation in the
interpreter, an extensive set of heuristics, migration of live
function state and other ugly things. I guess this only makes
sense for slow compilers.

I'd rather work on making the compiler even faster. :-)

Bye,
     Mike