Feedback on (awesome) performance of lua 5.4 (work1)

Rodrigo Azevedo
This message is an update on the performance of the code below, designed to be a minimal benchmark of some Lua-ish features that also performs an ordinary real-world convolution operation.
 
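################# BEGIN  #################
-- teste_gc.lua
-- Uncomment one of these to force a GC mode (Lua 5.4 work1 defaults to generational):
--collectgarbage("incremental")
--collectgarbage("generational")
N = 2.0e7

-- C: a large table of N integers that stays alive for the whole run
C = {}
for i=1,N do C[i] = i end

local max,min = math.max, math.min
-- conv: full discrete convolution of the sequences u and v,
-- returned as multiple values (m+n-1 of them)
local conv = function(u,v)
  local m,n = #u,#v
  local w = {}
  for k=1,m+n-1,1 do
    local sum = 0.0
    for j = max(1,k+1-n),min(k,m) do sum = sum + u[j]*v[k-j+1] end
    w[k] = sum
  end
  return table.unpack(w)
end

-- heap size in KB before the loop
print(string.format("%.1f",collectgarbage("count")))
-- 2*N iterations, each allocating short-lived tables (a, b, w and res)
for i=1,2*N do
    local a,b = {1,2,3,4},{5,6}
    local res = {conv(a,b)}
end
-- heap size in KB after the loop
print(string.format("%.1f",collectgarbage("count")))

################# END #################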

Methodology:

Ubuntu's default lua5.2 and lua5.3 packages. Lua 5.4 (work1) compiled *without* -D_LUA53COMPAT and -D_NILINTABLE.

Default GC parameters. I'm not trying to optimize anything, only checking the default options of each Lua version.
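The same two measurements (CPU time and heap size) can also be taken from inside Lua. Below is a minimal sketch, where `run_bench` is a hypothetical helper name: `os.clock()` reports CPU seconds and `collectgarbage("count")` reports the heap size in KB. The GC-mode switch is wrapped in `pcall` as a precaution, since not every Lua version accepts the "incremental"/"generational" options.

```lua
-- Hypothetical measurement helper (a sketch, not part of teste_gc.lua).
-- Reports CPU seconds (os.clock) and heap size in KB (collectgarbage("count")).
local function run_bench(label, fn, gcmode)
  if gcmode then
    -- Guarded because some Lua versions reject these options.
    local ok = pcall(collectgarbage, gcmode)
    if not ok then
      print(label .. ": GC mode '" .. gcmode .. "' not supported here")
    end
  end
  collectgarbage("collect")              -- start from a clean heap
  local kb0 = collectgarbage("count")
  local t0 = os.clock()
  fn()
  local cpu = os.clock() - t0
  local kb1 = collectgarbage("count")
  print(string.format("%s: cpu=%.2fs  heap before=%.1fKB  after=%.1fKB",
                      label, cpu, kb0, kb1))
  return cpu, kb0, kb1
end

-- Usage with a small workload (much smaller than the 2*N loop above):
local cpu, before, after = run_bench("small", function()
  for i = 1, 1e5 do local t = {i, i + 1} end
end)
```

Passing "incremental" or "generational" as the third argument selects the mode before timing. Otherwise the interpreter's default is used, as in the runs below.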

Results (each run prints the heap size in KB, from collectgarbage("count"), before and after the loop):

----------------------

time lua5.2 teste_gc.lua
524315.5
1225547.7 -- NOT GOOD

real    1m33.749s
user    1m33.313s
sys    0m0.412s

------------------------

time lua5.3 teste_gc.lua
524315.5
1225487.5 -- STILL NOT GOOD

real    1m32.680s
user    1m32.051s -- NO CHANGE
sys    0m0.588s

-----------------------

INCREMENTAL GC

time ./lua5.4 teste_gc.lua
524309.6
1028321.7 -- BETTER THAN PREVIOUS, BUT NOT YET GOOD

real    1m13.322s
user    1m13.083s -- THANKFUL SURPRISE
sys    0m0.240s

----------------------

GENERATIONAL GC

time ./lua5.4 teste_gc.lua
524309.6
596331.1 -- VERY GOOD, AWESOME

real    1m9.472s
user    1m9.208s -- ALSO AWESOME
sys    0m0.172s

-----------------------------

Discussion:

The (default) generational garbage collector of Lua 5.4 performs much better than all previous versions in both memory usage AND time! That's awesome.
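For a side-by-side view, you can compute the heap growth during the loop (second printed value minus the first, in KB) and the user-CPU speedup relative to lua5.3 from the figures reported above. This is only arithmetic on those four runs, not a new measurement.

```lua
-- Figures copied from the runs above:
-- {label, KB before, KB after, user CPU seconds}
local runs = {
  {"lua5.2",              524315.5, 1225547.7, 93.313},
  {"lua5.3",              524315.5, 1225487.5, 92.051},
  {"lua5.4 incremental",  524309.6, 1028321.7, 73.083},
  {"lua5.4 generational", 524309.6,  596331.1, 69.208},
}
local base = runs[2]  -- compare every run against lua5.3
local growth = {}
for _, r in ipairs(runs) do
  local label, before, after, user = r[1], r[2], r[3], r[4]
  growth[label] = after - before          -- KB added during the conv loop
  print(string.format("%-20s growth=%9.1f KB  speedup vs 5.3 = %.2fx",
        label, growth[label], base[4] / user))
end
```

By these numbers, growth drops from about 701,000 KB (5.2 and 5.3) to about 504,000 KB (5.4 incremental) and about 72,000 KB (5.4 generational). The CPU speedups over 5.3 are roughly 1.26x and 1.33x.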

Conclusion:

Thank you very much Lua team!

Question: Why does Lua 5.4 with the incremental GC perform better (in CPU time) than Lua 5.2 or 5.3? My guess is the new VM instructions (what are the new ones?). Am I right? Or is it some optimization of the incremental GC? Anything else?

(I tried the 5.4 luac, but it seems to be "not yet ready".)

--
Rodrigo Azevedo Moreira da Silva

Re: Feedback on (awesome) performance of lua 5.4 (work1)

Albert Chan
On Mar 17, 2018, at 10:53 AM, Rodrigo Azevedo <[hidden email]> wrote:

> Question: Why the incremental Lua 5.4 performs better (CPU TIME) than Lua 5.2 or 5.3? My guess is the new VM instructions (what are the new ones?), am I right? or some optimization of the incremental gc? anything else?
>

Is it because of the removal of the implicit string-to-number conversion code?



Re: Feedback on (awesome) performance of lua 5.4 (work1)

Luiz Henrique de Figueiredo
> Is it because of removal of implicit string to number conversion code ?

I don't see any such conversions in the posted code.


Re: Feedback on (awesome) performance of lua 5.4 (work1)

Albert Chan
On Mar 17, 2018, at 11:48 AM, Luiz Henrique de Figueiredo <[hidden email]> wrote:

>> Is it because of removal of implicit string to number conversion code ?
>
> I don't see any such conversions in the posted code.
>

Even when no conversion actually takes place, there is a cost to
supporting implicit conversion (Lua 5.3).

My guess is that the overhead of Lua 5.3's implicit string-to-number
conversion (even when not needed) may outweigh the actual matrix calculations.
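For reference, here is a tiny example of the implicit coercion under discussion: an arithmetic operand that is a numeric string gets converted to a number. This sketch only shows that the feature exists. It does not measure the overhead hypothesized above for code that never uses it, and it says nothing about how 5.4 work1 implements it.

```lua
-- Implicit string-to-number coercion in arithmetic.
local s = "10"
local r1 = s + 1            -- the string operand is coerced to a number
local r2 = 10 + 1           -- plain numeric addition, no coercion involved
local r3 = tonumber(s) + 1  -- explicit conversion instead of relying on coercion
print(r1, r2, r3)
```

The benchmark in teste_gc.lua only ever adds and multiplies numbers, so no coercion actually happens there.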



Re: Feedback on (awesome) performance of lua 5.4 (work1)

Dibyendu Majumdar
In reply to this post by Rodrigo Azevedo
On 17 March 2018 at 14:53, Rodrigo Azevedo <[hidden email]> wrote:

>
> Question: Why the incremental Lua 5.4 performs better (CPU TIME) than Lua
> 5.2 or 5.3? My guess is the new VM instructions (what are the new ones?), am
> I right? or some optimization of the incremental gc? anything else?
>

Lua 5.4 has several optimizations for bytecodes that are used in
numeric programs: load, indexing, arithmetic and numeric-for opcodes,
among others, are specialized for situations where the parser can
detect numeric constants and infer types. For instance, there are now
GETI/SETI opcodes for reading/updating table values where the index
key is an integer. The VM also has optimizations such as avoiding
unneeded operations or minimizing their impact. The net result is
improved performance, which is very welcome.
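To illustrate the distinction described above, here is a small sketch. It contrasts table accesses whose integer key is a constant in the source with an access whose key is held in a variable. Which opcode each line actually compiles to is an assumption based on the description above. Listing the bytecode with `luac -l`, where a working 5.4 luac is available, would confirm it.

```lua
-- Table accesses with constant integer keys vs. a variable key.
-- Assumption: the constant-key cases are the ones that may use GETI/SETI in 5.4.
local t = {10, 20, 30}
local a = t[1]        -- constant integer key: candidate for GETI
t[3] = a + t[2]       -- constant integer keys: candidates for SETI / GETI
local k = 2
local b = t[k]        -- key held in a variable: generic table get
print(a, t[3], b)
```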

I am not sure whether your test reflects GC performance, though.
Why do you think the type of GC is relevant?

Regards
Dibyendu


Re: Feedback on (awesome) performance of lua 5.4 (work1)

Dibyendu Majumdar
In reply to this post by Rodrigo Azevedo
On 17 March 2018 at 14:53, Rodrigo Azevedo <[hidden email]> wrote:

> This message is an update about the performance of the following code below,
> designed to be a minimum benchmark of some Lua-ish features that also
> performs an ordinary real-world convolution operation.
>
> ################# BEGIN  #################
> -- teste_gc.lua
> --collectgarbage("incremental")
> --collectgarbage("generational")
> N = 2.0e7
>
> C = {}
> for i=1,N do C[i] = i end
>
> local max,min = math.max, math.min
> local conv = function(u,v)
>   local m,n = #u,#v
>   local w = {}
>   for k=1,m+n-1,1 do
>     local sum = 0.0
>     for j = max(1,k+1-n),min(k,m) do sum = sum + u[j]*v[k-j+1] end
>     w[k] = sum
>   end
>   return table.unpack(w)
> end
>

I recommend also testing this without the max() and min() function
calls inside the loop, as the overhead of those function calls may
skew the results. Inline them and see what difference you get between
the various versions.
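A minimal sketch of that suggestion: the same convolution with the `math.max`/`math.min` calls replaced by inline comparisons (`conv_inline` is a hypothetical name). Note that a numeric `for` evaluates its bounds only once. So in the original, `max` and `min` run twice per output element, not on every inner step.

```lua
-- conv with math.max/math.min replaced by inline comparisons.
-- Intended to return the same values as the original conv.
local conv_inline = function(u, v)
  local m, n = #u, #v
  local w = {}
  for k = 1, m + n - 1 do
    local lo = k + 1 - n          -- inline max(1, k+1-n)
    if lo < 1 then lo = 1 end
    local hi = k                  -- inline min(k, m)
    if hi > m then hi = m end
    local sum = 0.0
    for j = lo, hi do sum = sum + u[j] * v[k - j + 1] end
    w[k] = sum
  end
  return table.unpack(w)
end

print(conv_inline({1, 2, 3, 4}, {5, 6}))
```

Swapping this in for `conv` in teste_gc.lua leaves the allocation pattern unchanged, so any timing difference between versions should come from the removed calls.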


Re: Feedback on (awesome) performance of lua 5.4 (work1)

Tony Papadimitriou-2
In reply to this post by Albert Chan
On the other hand, why does this "a=os.time() for i=1,1000000000 do end
print(os.time()-a)" run much slower on 5.4 than on 5.3?

Lua 5.3.4  Copyright (C) 1994-2017 Lua.org, PUC-Rio
>a=os.time() for i=1,1000000000 do end print(os.time()-a)
9

Lua 5.4.0 (work1)  Copyright (C) 1994-2018 Lua.org, PUC-Rio
> a=os.time() for i=1,1000000000 do end print(os.time()-a)
14



Re: Feedback on (awesome) performance of lua 5.4 (work1)

Rodrigo Azevedo
2018-03-17 15:06 GMT-03:00 Tony Papadimitriou <[hidden email]>:
> On the other hand, why does this "a=os.time() for i=1,1000000000 do end print(os.time()-a)" run much slower on 5.4 compared to 5.3?
>
> Lua 5.3.4  Copyright (C) 1994-2017 Lua.org, PUC-Rio
> a=os.time() for i=1,1000000000 do end print(os.time()-a)
> 9
>
> Lua 5.4.0 (work1)  Copyright (C) 1994-2018 Lua.org, PUC-Rio
> a=os.time() for i=1,1000000000 do end print(os.time()-a)
> 14


I can't reproduce it!

On my machine:

##### BEGIN #####
-- teste_time.lua
a=os.time() for i=1,1000000000 do end print(os.time()-a)
##### END ######

ROUND 1

-----------------------

time lua5.2 teste_time.lua
4

real    0m3.675s
user    0m3.675s
sys    0m0.000s

------------------------

time lua5.3 teste_time.lua
4

real    0m3.630s
user    0m3.612s
sys    0m0.004s

------------------------

time ./lua5.4 teste_time.lua
3

real    0m3.022s
user    0m2.996s
sys    0m0.016s

ROUND 2

----------------------------------------


Lua 5.4.0 (work1)  Copyright (C) 1994-2018 Lua.org, PUC-Rio
> a=os.time() for i=1,1000000000 do end print(os.time()-a)
3
> os.exit()

Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> a=os.time() for i=1,1000000000 do end print(os.time()-a)
3
> os.exit()

-----------------------------------

Conclusion: lua5.4 is faster, even on a pointless loop.
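Because `os.time()` counts whole seconds, the one-liner can only report coarse results. Here it printed 4 for a run that `time` measured at 3.675 s. Below is a sketch of the same empty loop timed with `os.clock()`, which reports CPU seconds with sub-second resolution. `time_empty_loop` is a hypothetical name.

```lua
-- Empty-loop timing with os.clock() (CPU seconds, sub-second resolution)
-- instead of os.time() (whole seconds).
local function time_empty_loop(n)
  local t0 = os.clock()
  for i = 1, n do end
  return os.clock() - t0
end

-- Same iteration count as teste_time.lua (takes a few seconds):
-- print(string.format("%.3f", time_empty_loop(1000000000)))
print(string.format("%.3f", time_empty_loop(1e7)))
```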

--
Rodrigo Azevedo Moreira da Silva

Re: Feedback on (awesome) performance of lua 5.4 (work1)

Tony Papadimitriou-2
You’re right.  My bad!
(I was mistakenly using a 5.3.4 build compiled with a different compiler.)
 
When compiled with the same compiler, the results are:
 
Lua 5.3.4  Copyright (C) 1994-2017 Lua.org, PUC-Rio
>a=os.time() for i=1,1000000000 do end print(os.time()-a)
21
 
 
c:\temp>lua54
Lua 5.4.0 (work1)  Copyright (C) 1994-2018 Lua.org, PUC-Rio
> a=os.time() for i=1,1000000000 do end print(os.time()-a)
14
 
 
Sorry for the noise!
 
Sent: Saturday, March 17, 2018 11:24 PM
Subject: Re: Feedback on (awesome) performance of lua 5.4 (work1)
 

Re: Feedback on (awesome) performance of lua 5.4 (work1)

François Perrad
In reply to this post by Dibyendu Majumdar


2018-03-17 17:56 GMT+01:00 Dibyendu Majumdar <[hidden email]>:
> On 17 March 2018 at 14:53, Rodrigo Azevedo <[hidden email]> wrote:
>
>> Question: Why the incremental Lua 5.4 performs better (CPU TIME) than Lua
>> 5.2 or 5.3? My guess is the new VM instructions (what are the new ones?), am
>> I right? or some optimization of the incremental gc? anything else?
>
> Lua 5.4 has several optimizations for bytecodes that is used in
> numeric programs - load, indexing, arithmetic ops codes, for num
> opcodes etc. are specialized for situations where the parser can
> detect numeric constants and infer types. For instance now there are
> GETI/SETI opcodes for extracting/updating values table values where
> the index key is an integer. The VM also has optimizations such as
> avoiding operations not needed or minimizing the impact of those. The
> net result is improved performance which is very welcome.



I can confirm that the new opcodes speed up Lua:

    $ lua-5.3.4/src/lua harness.lua NBody 7 250000
    Starting NBody benchmark ...
    NBody: iterations=1 runtime: 1207250us
    NBody: iterations=1 runtime: 1194877us
    NBody: iterations=1 runtime: 1188912us
    NBody: iterations=1 runtime: 1205081us
    NBody: iterations=1 runtime: 1203040us
    NBody: iterations=1 runtime: 1216759us
    NBody: iterations=1 runtime: 1210723us
    NBody: iterations=7 average: 1203806us total: 8426642us

    Total Runtime: 8426642us
    $ lua-5.4.0-work1/src/lua harness.lua NBody 7 250000
    Starting NBody benchmark ...
    NBody: iterations=1 runtime: 1024333us
    NBody: iterations=1 runtime: 1013502us
    NBody: iterations=1 runtime: 1014108us
    NBody: iterations=1 runtime: 1013805us
    NBody: iterations=1 runtime: 1012738us
    NBody: iterations=1 runtime: 1013662us
    NBody: iterations=1 runtime: 1019850us
    NBody: iterations=7 average: 1016000us total: 7111998us

    Total Runtime: 7111998us

    $ lua-5.3.4/src/lua -lluasom harness.lua CD 7 1000
    Starting CD benchmark ...
    CD: iterations=1 runtime: 8708780us
    CD: iterations=1 runtime: 9199042us
    CD: iterations=1 runtime: 9239055us
    CD: iterations=1 runtime: 9219889us
    CD: iterations=1 runtime: 9175078us
    CD: iterations=1 runtime: 9289253us
    CD: iterations=1 runtime: 9343105us
    CD: iterations=7 average: 9167743us total: 64174202us

    Total Runtime: 64174202us
    $ lua-5.4.0-work1/src/lua -lluasom harness.lua CD 7 1000
    Starting CD benchmark ...
    CD: iterations=1 runtime: 6767993us
    CD: iterations=1 runtime: 6934665us
    CD: iterations=1 runtime: 6657718us
    CD: iterations=1 runtime: 6816613us
    CD: iterations=1 runtime: 7016126us
    CD: iterations=1 runtime: 6617097us
    CD: iterations=1 runtime: 6846714us
    CD: iterations=7 average: 6808132us total: 47656926us

    Total Runtime: 47656926us

18% faster for NBody
34% faster for CD
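The two percentages are consistent with the ratio of total runtimes, old/new minus 1: 8426642/7111998 is about 1.185 and 64174202/47656926 is about 1.347. The arithmetic is below. It also shows the equivalent reduction in runtime, which comes to about 16% for NBody and 26% for CD.

```lua
-- Total runtimes (microseconds) copied from the harness output above:
-- old = lua-5.3.4, new = lua-5.4.0-work1
local totals = {
  NBody = {old = 8426642,  new = 7111998},
  CD    = {old = 64174202, new = 47656926},
}
local speedup = {}
for name, t in pairs(totals) do
  speedup[name] = t.old / t.new
  print(string.format("%-5s old/new = %.3f (%.1f%% faster), runtime reduced by %.1f%%",
        name, speedup[name], (speedup[name] - 1) * 100, (1 - t.new / t.old) * 100))
end
```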

These scripts (along with some others) are available at https://github.com/fperrad/are-we-fast-yet/tree/alt_lua/benchmarks/Lua
I ran them on GNU/Linux Debian 9 x86_64.
I built vanilla Lua (+ luasocket) with gcc 6.3.0.

François