Lua performance

Lua performance

Dibyendu Majumdar
Hi,

It seems that performance of Lua is improving steadily and Ravi
benefits from this as well.

Here are some recent test results.

Benchmark: matrix multiplication

Lua 5.3.4:  25.5 seconds

Lua (github):  18.3 seconds

ravi (computed goto): 16.6 seconds

ravi (computed goto & disabled lua hook):   15 seconds

ravi (computed goto & type annotations): 11.1 seconds

ravi (computed goto & type annotations and disabled lua hook): 10.6 seconds

luajit (v2.1 github, -j off): 9.4 seconds


All are interpreter timings on 64-bit Mac OSX 10.11.6.

I believe that LuaJIT's interpreter VM has the equivalent of computed gotos
and a disabled Lua hook by default - please correct me if I am mistaken.

I will share results from some other benchmarks - the trend is similar to above.

My impression is that, on Mac OSX at least, computed gotos are worth
having as an option.

Regards
Dibyendu


Re: Lua performance

Italo Maia
What would a lua hook be?


Re: Lua performance

Dibyendu Majumdar
On 16 December 2017 at 01:05, Italo Maia <[hidden email]> wrote:
> What would a lua hook be?
>

Apologies, I should have been clearer about this. Before each bytecode
instruction is executed, Lua checks whether it needs to invoke a hook
(a debug callback installed with lua_sethook). This check has some
overhead, especially when computed gotos are enabled: as you can
imagine, there is an 'if ...' check at every jump.
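
As a rough illustration of the check described above, here is a minimal sketch of a toy interpreter loop that tests a hook flag before every dispatch. It is not Lua's actual lvm.c code: the VM struct, call_hook, run_with_hook_check and the three opcodes are invented for this example, and the dispatch relies on the GCC/Clang labels-as-values extension (computed goto).

```c
#include <assert.h>

/* Hypothetical toy VM for illustration only; not Lua's real data structures. */
enum { OP_INC, OP_DEC, OP_HALT };

typedef struct VM {
    int hook_enabled;   /* nonzero when a debug hook is installed */
    long hook_calls;    /* how many times the hook was invoked */
    long acc;           /* single accumulator register */
} VM;

static void call_hook(VM *vm) { vm->hook_calls++; }

/* Check the hook, then jump straight to the next opcode's handler.
 * Opcodes are assumed valid; a toy like this does no bounds checking. */
#define DISPATCH()                            \
    do {                                      \
        if (vm->hook_enabled) call_hook(vm);  \
        goto *labels[*pc++];                  \
    } while (0)

long run_with_hook_check(VM *vm, const unsigned char *code)
{
    /* Table of handler addresses, indexed by opcode (GCC/Clang extension). */
    static void *const labels[] = { &&do_inc, &&do_dec, &&do_halt };
    const unsigned char *pc = code;

    assert(vm && code);
    DISPATCH();
do_inc:
    vm->acc++;
    DISPATCH();
do_dec:
    vm->acc--;
    DISPATCH();
do_halt:
    return vm->acc;
}
```

Because the DISPATCH macro is expanded at the end of every handler, the 'if' is duplicated once per opcode, which is the overhead being discussed.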

Regards
Dibyendu


Re: Lua performance

Italo Maia
Oh, thanks for the explanation.


Re: Lua performance

Hugo Musso Gualandi
In reply to this post by Dibyendu Majumdar
> All are interpreter timings on 64-bit Mac OSX 10.11.6. My impression is
> that on Mac OSX at least computed gotos are worth having as an option.

A while ago I also tested the effect that computed gotos had on the Lua
interpreter, and it depended a lot on the microarchitecture of the CPU. Do
you know which CPU model you ran the tests on?

When I tested on a machine with a Sandy Bridge CPU, the default Lua
interpreter had a high branch-misprediction rate, which improved after
switching to computed gotos. However, when I tested on a more recent
Haswell machine, the indirect branch predictor was able to predict
the branching behavior of the interpreter loop just fine, and the computed
gotos were not needed.
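
A minimal sketch of the two dispatch styles being compared, assuming a made-up three-opcode bytecode (OP_ADD1, OP_DOUBLE, OP_HALT) rather than Lua's real instruction set. With a switch inside a loop, compilers typically emit one indirect jump shared by all opcodes (at least for larger opcode sets); with computed gotos (a GCC/Clang extension) each handler ends in its own indirect jump, so the predictor sees a separate branch per handler. Whether that makes a measurable difference depends on the CPU, as described above.

```c
#include <assert.h>

/* Hypothetical opcodes for illustration only. */
enum { OP_ADD1, OP_DOUBLE, OP_HALT };

/* Switch dispatch: every opcode goes through the same dispatch point. */
long run_switch(const unsigned char *pc)
{
    long acc = 0;

    assert(pc);
    for (;;) {
        switch (*pc++) {
        case OP_ADD1:   acc += 1; break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        return -1;   /* unknown opcode */
        }
    }
}

/* Computed-goto dispatch: each handler jumps directly to the next one.
 * Opcodes are assumed valid; there is no bounds check on the table. */
long run_goto(const unsigned char *pc)
{
    static void *const labels[] = { &&add1, &&dbl, &&halt };
    long acc = 0;

    assert(pc);
    goto *labels[*pc++];
add1:
    acc += 1;
    goto *labels[*pc++];
dbl:
    acc *= 2;
    goto *labels[*pc++];
halt:
    return acc;
}
```

Both functions compute the same result for the same bytecode; only the shape of the generated branches differs.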

-- Hugo



Re: Lua performance

Dibyendu Majumdar
On 16 December 2017 at 06:46,  <[hidden email]> wrote:
>> All are interpreter timings on 64-bit Mac OSX 10.11.6. My impression is
>> that on Mac OSX at least computed gotos are worth having as an option.
>
> A while ago I also tested the effect that computed gotos had on the Lua
> interpreter, and it depended a lot on the microarchitecture of the CPU. Do
> you know which CPU model you ran the tests on?
>

I believe it is:
https://ark.intel.com/products/80807/Intel-Core-i7-4790K-Processor-8M-Cache-up-to-4_40-GHz

sysctl -n machdep.cpu.brand_string

Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz

So I am not on the most recent architecture.

> When I tested on a machine with a Sandy Bridge CPU, the default Lua
> interpreter had a high branch-misprediction rate, which improved after
> switching to computed gotos. However, when I tested on a more recent
> Haswell machine, the indirect branch predictor was able to predict
> the branching behavior of the interpreter loop just fine, and the computed
> gotos were not needed.
>

How do you measure branch-misprediction rate?

Thanks and Regards
Dibyendu


Re: Lua performance

Frank Kastenholz-2
In reply to this post by Dibyendu Majumdar
As a comparison, how long does the benchmark take to run if it's coded in C?

Work I did 2-3 years ago showed a 5x to 10x improvement when going from Lua to C. If that holds relative to the 25.5 seconds for Lua 5.3.4, a C implementation would take between 2.5 and 5 seconds, comparing very favorably with LuaJIT and Ravi.
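
For anyone who wants to try this comparison, here is a minimal sketch of a naive matrix multiplication in C with a simple timer. It is not a port of the benchmark script used in this thread: the function names, the fill pattern and the choice of n are assumptions made for illustration, and the thread does not state the matrix size that was used.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* c = a * b for n x n matrices stored in row-major order. */
void matmul(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Returns the CPU seconds spent on one n x n multiplication, or -1.0 if
 * allocation fails. The last element of the result is written to *last_out
 * so that the compiler cannot discard the computation. */
double time_matmul(size_t n, double *last_out)
{
    assert(n > 0 && last_out != NULL);

    double *a = malloc(n * n * sizeof *a);
    double *b = malloc(n * n * sizeof *b);
    double *c = malloc(n * n * sizeof *c);
    double secs = -1.0;

    if (a && b && c) {
        /* Arbitrary fill pattern; any non-trivial values would do. */
        for (size_t i = 0; i < n * n; i++) {
            a[i] = (double)(i % 7);
            b[i] = (double)(i % 5);
        }
        clock_t start = clock();
        matmul(n, a, b, c);
        secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        *last_out = c[n * n - 1];
    }
    free(a);
    free(b);
    free(c);
    return secs;
}
```

Calling time_matmul with the same n as the Lua script and printing the result would give a figure to set against the interpreter timings; compiler flags such as -O2 will of course change the outcome considerably.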

Frank


Re: Lua performance

Pierre Chapuis
On Sat, Dec 16, 2017, at 14:00, Frank Kastenholz wrote:
> As a comparison, what is the time for the benchmark to run if it's coded
> in C?
>
> Work I did 2-3 years ago showed a 5x to 10x improvement when
> going from Lua to C. If that holds, a C implementation would take
> between 2.5 and 5 seconds, comparing very favorably with LuaJIT and Ravi

Note that in this benchmark LuaJIT was run with `-j off`, i.e. with JIT
disabled, in pure interpreter mode...

--
Pierre Chapuis


Re: Lua performance

Dibyendu Majumdar
In reply to this post by Frank Kastenholz-2
On 16 December 2017 at 13:00, Frank Kastenholz <[hidden email]> wrote:
> As a comparison, what is the time for the benchmark to run if it's coded in C?
>
> Work I did 2-3 years ago showed a 5x to 10x improvement when going from Lua to C. If that holds, a C implementation would take between 2.5 and 5 seconds, comparing very favorably with LuaJIT and Ravi
>
>> luajit (v2.1 github, -j off): 9.4 seconds
>>

I was comparing the interpreter performance - of course things get
faster when JIT is enabled. Example:

luajit matmul1.lua

time taken 0.882477

Regards
Dibyendu


Re: Lua performance

Hugo Musso Gualandi
In reply to this post by Dibyendu Majumdar

> Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz

Hmm... The 4 in 4790K means this is a Haswell CPU. According to my prediction, there shouldn't have been an improvement with computed gotos...

> How do you measure branch-misprediction rate?

This information comes from hardware performance counters. I used `perf stat` to view them, but that is a Linux tool that is not available on macOS. Hopefully someone more knowledgeable about macOS can help us here.

-- Hugo

Fwd: Lua performance

Dibyendu Majumdar
In reply to this post by Dibyendu Majumdar
I did a quick test of Lua 5.4 work1.

The matrix multiplication benchmark from my original post (16 December 2017) ran in 12.5 seconds.

Congratulations!

Regards
Dibyendu


Re: Lua performance

云风 Cloud Wu
In reply to this post by Dibyendu Majumdar

> On 16 December 2017, at 9:11 AM, Dibyendu Majumdar <[hidden email]> wrote:
>
>> On 16 December 2017 at 01:05, Italo Maia <[hidden email]> wrote:
>> What would a lua hook be?
>>
>
> Apologies I should have been clearer about this. Before a bytecode is
> executed Lua checks if it needs to invoke a hook. This check has some
> overhead especially when computed gotos are enabled, as you can
> imagine at every jump there is an 'if .. ' check.

So, maybe we can write two versions of luaV_execute? With and without the hook?
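
One way to picture this suggestion is to generate the interpreter loop twice from the same source and pick a variant on entry. The sketch below is hypothetical: it uses a toy VM rather than luaV_execute itself, and the names DEFINE_EXECUTE, execute_hooked, execute_plain and execute are made up for the example.

```c
#include <assert.h>

/* Hypothetical toy VM for illustration only. */
enum { OP_INC, OP_DEC, OP_HALT };

typedef struct VM {
    int hook_enabled;   /* nonzero when a debug hook is installed */
    long hook_calls;    /* how many times the hook was invoked */
    long acc;
} VM;

static void call_hook(VM *vm) { vm->hook_calls++; }

/* Stamp out one interpreter loop; HOOK_CHECK is pasted before each fetch. */
#define DEFINE_EXECUTE(NAME, HOOK_CHECK)                   \
    static long NAME(VM *vm, const unsigned char *pc)      \
    {                                                      \
        for (;;) {                                         \
            HOOK_CHECK;                                    \
            switch (*pc++) {                               \
            case OP_INC:  vm->acc++; break;                \
            case OP_DEC:  vm->acc--; break;                \
            case OP_HALT: return vm->acc;                  \
            default:      return -1; /* unknown opcode */  \
            }                                              \
        }                                                  \
    }

DEFINE_EXECUTE(execute_hooked, call_hook(vm))   /* variant with the hook */
DEFINE_EXECUTE(execute_plain, (void)0)          /* variant without any check */

/* Choose the variant once, on entry, instead of checking per instruction. */
long execute(VM *vm, const unsigned char *code)
{
    assert(vm && code);
    return vm->hook_enabled ? execute_hooked(vm, code)
                            : execute_plain(vm, code);
}
```

A limitation of this sketch is that a hook installed while execute_plain is running would go unnoticed until the next call to execute; a real implementation would need some way to switch variants mid-run.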




Re: Lua performance

Dibyendu Majumdar
On 17 March 2018 at 06:51, 云风 <[hidden email]> wrote:

>
>> On 16 December 2017, at 9:11 AM, Dibyendu Majumdar <[hidden email]> wrote:
>>
>>> On 16 December 2017 at 01:05, Italo Maia <[hidden email]> wrote:
>>> What would a lua hook be?
>>>
>>
>> Apologies I should have been clearer about this. Before a bytecode is
>> executed Lua checks if it needs to invoke a hook. This check has some
>> overhead especially when computed gotos are enabled, as you can
>> imagine at every jump there is an 'if .. ' check.
>
> So, maybe we can write two versions of luaV_execute? With and without the hook?
>

Yes, you can. In LuaJIT, I believe each opcode gets decorated by a
wrapper that forwards to the base opcode; this is possible by
modifying the jump table at runtime. In C you cannot do that.

Note that Roberto has tried to minimise the impact of the hook by
using a local variable (trap) to avoid checking the hook state
directly. Nonetheless, this check gets replicated in every computed
goto branch, which is not only wasteful but also, I would guess,
negatively affects optimisation.
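
The idea of caching the hook state in a local can be sketched as follows. This is only an illustration on a toy VM with invented names (VM, do_call, execute_with_trap); it is not the code in Lua's lvm.c. The loop tests the local variable `trap` instead of reloading the shared flag, and refreshes it only after an operation that could have changed the hook.

```c
#include <assert.h>

/* Hypothetical toy VM for illustration only. */
enum { OP_INC, OP_CALL, OP_HALT };

typedef struct VM {
    int hook_enabled;   /* shared state; may change during a call */
    long hook_calls;    /* how many times the hook was invoked */
    long acc;
} VM;

static void call_hook(VM *vm) { vm->hook_calls++; }

/* Stand-in for leaving the interpreter loop; here it installs a hook. */
static void do_call(VM *vm) { vm->hook_enabled = 1; }

long execute_with_trap(VM *vm, const unsigned char *pc)
{
    assert(vm && pc);

    int trap = vm->hook_enabled;   /* local copy, cheap to test */

    for (;;) {
        if (trap)                  /* still one test per instruction */
            call_hook(vm);
        switch (*pc++) {
        case OP_INC:
            vm->acc++;
            break;
        case OP_CALL:
            do_call(vm);
            trap = vm->hook_enabled;   /* refresh after code that may change it */
            break;
        case OP_HALT:
            return vm->acc;
        default:
            return -1;                 /* unknown opcode */
        }
    }
}
```

The test on `trap` is cheaper than reloading shared state, but it still appears before every instruction, which matches the point made above about the check being replicated.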

Regards
Dibyendu


Re: Lua performance

Viacheslav Usov
In reply to this post by 云风 Cloud Wu
On Sat, Mar 17, 2018 at 7:51 AM, 云风 <[hidden email]> wrote:

>  So, maybe we can write two versions of luaV_execute? With and without the hook?

If you are concerned about the impact of hooks on performance, you should definitely do that and measure the results. Then we can talk :)

Cheers,
V.