[ANN] Lua-AOT 5.4

Hugo Musso Gualandi
For a paper that I recently worked on, I created a modified
version of Lua 5.4 that can compile Lua functions into C. On some
number-crunching benchmarks, it achieved a speedup of 2x compared
to regular Lua 5.4.

It is still full of bugs, and I won't keep it updated with future Lua
versions, so it probably wouldn't be a good idea to use it for anything
serious. That said, I think that it might be scientifically interesting
for some people, so I'm sharing it here :)

https://github.com/hugomg/lua-aot-5.4

The basic idea of how it works is that each bytecode instruction is
converted to a block of C code, which is nearly identical to the C code
used by the regular Lua interpreter. The main difference is that Lua
jumps become C gotos, and that it doesn't decode or dispatch the VM
instructions at run-time, because those become compile-time constants.
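As a rough illustration of this idea, here is a toy sketch. It is not code from Lua-AOT or lvm.c. It uses a tiny register VM with hypothetical opcodes `OP_LOADI`, `OP_ADD`, `OP_DEC`, `OP_JNZ` and `OP_RETURN`. `interpret` fetches, decodes and dispatches each instruction at run time. `compiled` shows what an ahead-of-time translator in this style might emit for the same bytecode: one C block per instruction, with operands baked in as constants and the backward jump turned into a `goto`. Lua's real instruction encoding is different.

```c
/* Toy register VM (hypothetical opcodes, not Lua's instruction set). */
typedef enum { OP_LOADI, OP_ADD, OP_DEC, OP_JNZ, OP_RETURN } OpCode;

typedef struct {
    OpCode op;
    int a, b, c; /* register indices, an immediate value, or a jump target */
} Instr;

/* r0 = 0; loop: r0 = r0 + r1; r1 = r1 - 1; if r1 != 0 goto loop; return r0.
   r1 starts as the argument n, so this sums n + (n-1) + ... + 1 (needs n >= 1). */
static const Instr sum_program[] = {
    {OP_LOADI, 0, 0, 0},   /* pc 0: r0 = 0 */
    {OP_ADD, 0, 0, 1},     /* pc 1: r0 = r0 + r1 */
    {OP_DEC, 1, 0, 0},     /* pc 2: r1 = r1 - 1 */
    {OP_JNZ, 1, 1, 0},     /* pc 3: if r1 != 0 goto pc 1 */
    {OP_RETURN, 0, 0, 0},  /* pc 4: return r0 */
};

/* Interpreter: decodes and dispatches every instruction at run time. */
static long interpret(const Instr *code, long n) {
    long r[2] = {0, n};
    int pc = 0;
    for (;;) {
        Instr i = code[pc++];                /* fetch and decode */
        switch (i.op) {                      /* dispatch */
        case OP_LOADI:  r[i.a] = i.b; break;
        case OP_ADD:    r[i.a] = r[i.b] + r[i.c]; break;
        case OP_DEC:    r[i.a] -= 1; break;
        case OP_JNZ:    if (r[i.a] != 0) pc = i.b; break;
        case OP_RETURN: return r[i.a];
        }
    }
}

/* "Compiled" sum_program: the same per-instruction blocks, but opcodes and
   operands are now compile-time constants and the VM jump is a C goto. */
static long compiled(long n) {
    long r[2] = {0, n};
    r[0] = 0;                    /* pc 0: LOADI */
pc1:
    r[0] = r[0] + r[1];          /* pc 1: ADD */
    r[1] -= 1;                   /* pc 2: DEC */
    if (r[1] != 0) goto pc1;     /* pc 3: JNZ */
    return r[0];                 /* pc 4: RETURN */
}
```

Both `interpret(sum_program, 100)` and `compiled(100)` return 5050. The second version does the same arithmetic, but the loop no longer contains a `switch` or a `code[pc]` lookup.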

For me, the most interesting takeaway is that it can give a rough
measurement of how much of the running time of a regular Lua program is
due to interpreter overhead. That is, how much time is spent decoding
and dispatching VM instructions.

-- Hugo


Re: [ANN] Lua-AOT 5.4

Luiz Henrique de Figueiredo
> For me, the most interesting takeaway is that it can give a rough
> measurement of how much of the running time of a regular Lua program is
> due to interpreter overhead. That is, how much time is spent decoding
> and dispatching VM instructions.

And how much is it? I mean, the overhead?

Re: [ANN] Lua-AOT 5.4

Hugo Musso Gualandi
> And how much is it? I mean, the overhead?

Using benchmarks from the Computer Language Benchmarks Game,
my measured speedup for Lua-AOT varied from 1.33x to 2.5x. That is, if
regular Lua 5.4 ran in 1 second, the Lua-AOT version took between 0.40
and 0.75 seconds.
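A speedup S means the AOT version takes 1/S of the original time, so the figures above work out as follows. This is plain arithmetic on the quoted numbers. Read naively, 1 - 1/S is the fraction of the run time that disappeared, subject to the caveats below.

```latex
\begin{gathered}
t_{\mathrm{AOT}} = \frac{t_{\mathrm{Lua}}}{S}: \quad
\frac{1\,\mathrm{s}}{2.5} = 0.40\,\mathrm{s}, \quad
\frac{1\,\mathrm{s}}{1.33} \approx 0.75\,\mathrm{s} \\
1 - \frac{1}{S}: \quad
1 - \frac{1}{2.5} = 0.60, \quad
1 - \frac{1}{1.33} \approx 0.25
\end{gathered}
```

On these benchmarks, that amounts to roughly 25% to 60% of the run time removed.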

The caveat is that these numbers would probably differ for other
benchmarks, and would also likely differ on a different computer
with another kind of processor.

Additionally, it isn't exactly a measurement of interpreter overhead.
It would be more accurate to describe it as what can be achieved by an
ahead-of-time compiler that only picked the low-hanging fruit.

-- Hugo


Re: [ANN] Lua-AOT 5.4

Hisham
In reply to this post by Hugo Musso Gualandi
On Thu, 30 Jan 2020 at 16:04, Hugo Musso Gualandi wrote:

>
> For a paper that I recently worked on, I created a modified
> version of Lua 5.4 that can compile Lua functions into C. On some
> number-crunching benchmarks, it achieved a speedup of 2x compared
> to regular Lua 5.4.
>
> It is still full of bugs, and I won't keep it updated with future Lua
> versions, so it probably wouldn't be a good idea to use it for anything
> serious. That said, I think that it might be scientifically interesting
> for some people, so I'm sharing it here :)
>
> https://github.com/hugomg/lua-aot-5.4

Nice! I did a similar thing for Lua 5.0 (or rather 5.1 beta) back in 2006:

http://lua-users.org/lists/lua-l/2006-07/msg00144.html

However, I stuck to the public Lua C APIs, so the resulting code ended up
slower than the stock interpreter. The main takeaway then was that the
Lua API back then didn't cover everything that you could do in Lua code
(for some things I had to emit Lua text and eval to preserve the semantics).
Roberto eventually fixed those omissions in later versions of Lua (and I'd
like to believe my work was useful for spotting those omissions, though I
never had confirmation that it was the case!)

David Manura's lua2c was also a very similar project, based on Lua source
instead of bytecode, and was maintained all the way to Lua 5.2. Like mine,
it uses the public APIs:

https://github.com/davidm/lua2c/

Both projects hit limitations with coroutines, upvalues and tail calls. Do any
of these three limitations apply to lua-aot?

As for compatibility: did you try to run the Lua test suite with lua-aot?
If so, how much of it can it handle?

-- Hisham

Re: [ANN] Lua-AOT 5.4

Hugo Musso Gualandi



> However, I stuck to the public Lua C APIs, so the resulting code ended up
> slower than the stock interpreter.

Lua-AOT uses the internal Lua APIs, and therefore only works with a modified version of the Lua VM that exposes these APIs. It implements each VM instruction using code that is essentially copy-pasted from lvm.c.

One consequence of this is that at runtime the compiled functions in Lua-AOT are also represented as "Lua closures" instead of "C closures", as would be done in something that uses the regular C API.

One related work that I can think of right now would be Gabriel Ligneul's bytecode-to-LLVM compiler
https://github.com/gligneul/FastLua

Another would be the Lua Vermelha JIT https://github.com/Leonardo2718/lua-vermelha

The main difference in Lua-AOT's case is that it is very simple, since it basically just copy-pastes the regular interpreter code. This makes it less suitable as a production tool, but also makes it easy to experiment with the implementation.
 
> Both projects hit limitations with coroutines, upvalues and tail calls.
> Do any of these three limitations apply to lua-aot?

Since Lua-AOT is abusing the internal APIs and uses the same code as the regular Lua interpreter, these features are not a problem.

Tail calls and upvalues should work right now. Coroutines are not implemented yet, but it should be possible to make them work without much effort.

> As for compatibility: did you try to run the Lua test suite with lua-aot?
> If so, how much of it can it handle?

I haven't run that test suite yet. But now that you mention it, it would probably be a good idea to do so!

-- Hugo

Re: [ANN] Lua-AOT 5.4

Dibyendu Majumdar


On Thu, 6 Feb 2020, 16:15 Hugo Musso Gualandi wrote:

Ravi's JIT backend takes essentially the same approach, except that there are additional byte codes that are type-specialised. However, in 5.3 there were issues caused by differences in how functions are called in different circumstances. Maybe 5.4 is more uniform. Also, coroutines are never JITed. In Ravi, when code is JITed, tail calls become regular calls.

JITing regular Lua code gives very little benefit in Ravi ... But maybe 5.4 byte codes are more specialised in some cases so you are seeing better performance than I have observed.
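The following minimal sketch shows why type-specialised byte codes matter for a compiler like this. It uses a hypothetical tagged `Value` type, which is not Ravi's or Lua's actual `TValue` layout, and the functions stand in for opcodes without being Ravi's real ones. A generic add has to check the operand tags every time it runs, even after being compiled to C. An add specialised to integers can only be emitted when the types are known in advance, and it reduces to a single machine addition.

```c
#include <stdint.h>

/* Hypothetical tagged value; real VMs use a different layout. */
typedef enum { T_INT, T_FLOAT } Tag;

typedef struct {
    Tag tag;
    union { int64_t i; double f; } u;
} Value;

/* Generic ADD: without type information, the tags are inspected at run time
   on every execution, whether interpreted or compiled to C. */
static Value add_generic(Value a, Value b) {
    Value r;
    if (a.tag == T_INT && b.tag == T_INT) {
        r.tag = T_INT;
        /* add as unsigned so overflow wraps around, as Lua integers do */
        r.u.i = (int64_t)((uint64_t)a.u.i + (uint64_t)b.u.i);
    } else {
        double x = (a.tag == T_INT) ? (double)a.u.i : a.u.f;
        double y = (b.tag == T_INT) ? (double)b.u.i : b.u.f;
        r.tag = T_FLOAT;
        r.u.f = x + y;
    }
    return r;
}

/* Specialised integer ADD: both operand types are known ahead of time,
   so no tag checks or conversions remain. */
static int64_t add_int_int(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a + (uint64_t)b);
}
```

Without type annotations, a compiler that only removes dispatch overhead is still left with the branches in `add_generic`. This may help explain why compiling untyped Lua code gains less than compiling typed code.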

Re: [ANN] Lua-AOT 5.4

Dibyendu Majumdar


On Thu, 6 Feb 2020, 16:25 Dibyendu Majumdar wrote:

I should add that one optimization in Ravi is that the loop index variable in fornum loops uses a stack variable.
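Here is a minimal sketch of this kind of optimization. It assumes that "stack variable" means a local variable in the generated code rather than a slot in the Lua stack, and the functions are hypothetical, not Ravi's generated code. In the first version the loop index lives in a VM slot. In the second it is an ordinary C local, which the C compiler is free to keep in a machine register.

```c
#include <stdint.h>

/* Loop index kept in a VM slot (slots[1]); the accumulator is in slots[0].
   In real JIT output these slots belong to the Lua stack, and calls back into
   the VM between iterations generally force them to stay in memory. In this
   toy nothing forces that; it only shows where the index lives. */
static int64_t sum_index_in_slot(int64_t *slots, int64_t n) {
    slots[0] = 0;
    for (slots[1] = 1; slots[1] <= n; slots[1]++)
        slots[0] += slots[1];
    return slots[0];
}

/* Loop index kept in a C local variable; only the accumulator still lives
   in a VM slot, so the index can sit in a machine register. */
static int64_t sum_index_in_local(int64_t *slots, int64_t n) {
    slots[0] = 0;
    for (int64_t i = 1; i <= n; i++)
        slots[0] += i;
    return slots[0];
}
```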

Re: [ANN] Lua-AOT 5.4

Hugo Musso Gualandi
In reply to this post by Dibyendu Majumdar



> Ravi's JIT backend is essentially the same except there are additional
> byte codes that are type specialised.

Definitely! I should have mentioned Ravi in that list too. I guess I forgot because my mind put Ravi together with Pallene in the bucket of "languages with type annotations".

> JITing regular Lua code gives very little benefit in Ravi ... But maybe 5.4
> byte codes are more specialised in some cases so you are seeing better
> performance than I have observed.

The initial version of Lua-AOT was for 5.3. Off the top of my head, the results were similar, but the speedup for the 5.4 version was a bit smaller because 5.4 starts from a better performance baseline.

That said, I ran only a limited set of experiments, far fewer than you ran with Ravi. The speedups I listed were for benchmarks from the "benchmarks game" set, which are the sort of microbenchmark that even a simple compiler can do well on.

Overall, I think my results were compatible with what you found. Compiling with type information (as done in Ravi and Pallene) goes much farther than compiling without type information.

-- Hugo