[ANN] SHA2 in pure Lua optimized for every Lua version

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[ANN] SHA2 in pure Lua optimized for every Lua version

Egor Skriptunoff-2
Hi!
 
I wrote a module with SHA2 functions in pure Lua.
The code tries to use specific benefits of your system to make calculations as fast as possible.
For every Lua version the module contains separate implementation branch optimized for this version.
 
The project is here:
https://github.com/Egor-Skriptunoff/pure_lua_SHA2
 
Bitwise calculations in Lua have painful history of improvements,
every such improvement brings new incompatibility between Lua versions  :-)
 
This is the timeline of SHA256 performance improvements from Lua 5.1 to Lua 5.4:
 
Lua 5.1 -> Lua 5.2
Introducing of "bit32" library improved SHA256 performance in about 5 times.
 
Lua 5.2 -> Lua 5.3
Replacing "bit32" library with native bitwise operators improved SHA256 performance in about 3 times.
 
Lua 5.3 -> Lua 5.4
Optimization of Lua VM improved SHA256 performance in about 1.5 times.
 
 
Benchmark details are on this picture:
https://i.imgur.com/ndhtKuf.png
 
The vertical axis is SHA2 calculation speed in bytes per second.
The scale is logarithmic: one pixel upwards means 1.009 times faster.
For example, 77 pixels upwards means 2 times faster (1.009^77 == 2)
 
Each item is connected to minimum and maximum speed values obtained across different compilers Lua was built.
 
"Unreachable Ideal" corresponds to the speed of "sha256sum"/"sha512sum" utilities, measured with the following commands:
   time dd bs=1M count=1K if=/dev/zero | sha256sum -
   time dd bs=1M count=1K if=/dev/zero | sha512sum -
 
 
Some interesting observations from "sha2.lua" benchmark:
 
1)
SHA512 on 64-bit systems is faster than SHA256.
So, replacing SHA256 with SHA512/256 in your projects would increase both performance and safety.
 
2)
If we compare x86 Lua 5.4 having 32-bit integers (LUA_INT_TYPE=LUA_INT_INT) with "normal" x86 Lua 5.4 having 64-bit integers:
- When calculating SHA256, the former is faster.
- When calculating SHA512, the latter is faster.
Explanation:
Implementing "int64" datatype on 32-bit architecture is costly,
and the profit of introducing "int64" is negative when application doesn't need 64-bit integers
(application is compelled to use expensive "int64" datatype to work with cheap 32-bit values).
 
3)
How SHA2 performance depends on the compiler Lua was built:
Performance was measured on Windows (Visual Studio and MinGW) and Debian.
- For x86 architecture, Visual Studio usually gives better result.
- For x64 architecture, gcc is usually better.
The difference might be very big.
Example: comparing MinGW vs. Visual Studio:
- Lua 5.1 x86: MinGW is 20% faster than Visual Studio;
- Lua 5.3 x86: Visual Studio is 3 times faster than MinGW.
 
4)
LuaJIT's black magic is very unpredictable, especially on x86.
For example, any minor change in the source .lua file (such as relocation of function's definition) might change LuaJIT performance up to 4 times.
 
5)
How performance depends on CPU:
Vanilla Lua on Intel i7 CPU is twice faster as on Intel i5 CPU with the same CPU frequency.
Probably, i7 was designed to greatly improve execution speed of typical C-compiled code (or this is just because i7 L3 cache is twice larger?).
LuaJIT has the same performance on both i5 and i7 (LuaJIT generates optimized machine code directly and doesn't get benefits from this i7 improvement).
 
6)
LuaJIT 2.1 is way better than LuaJIT 2.0:
- On 32-bit calculations LuaJIT 2.1 appears to be up to 1.5 times faster.
- On 64-bit calculations "LuaJIT 2.1 with FFI" is much faster than "LuaJIT 2.0 with FFI" due to significant improvement in "bit" library: in LuaJIT 2.1 bitwise functions can accept "int64_t" arguments, while in LuaJIT 2.0 only 32-bit numbers are allowed.
 
7)
How fast is Fengari:
Fengari in FireFox is 10-15 times slower than Lua 5.3 x86 built with LUA_INT_TYPE=LUA_INT_INT
(Fengari implements "int32" integers instead of "int64")
 
8)
How fast is Ravi:
I downloaded "ravi-distro-0.4-windows-64bit-ravi".
I was interested in testing performance of SHA2 under Ravi JIT.
I didn't use any Ravi extensions, my intention was to use Ravi as "JIT-powered Lua 5.3" for executing usual (non-Ravi) Lua scripts.
In interpreter mode Ravi runs 5-8% faster than Lua 5.3, but significantly slower than Lua 5.4.
I tried to turn JIT on with "ravi.auto(true)", but Ravi crashed
with warning "constant 4294967296 is so big it is long long"
and error "for initial value must be a number"
(although there was no "for" loop at the line mentioned in the error message).
The error could be reproduced by running
ravi.exe -e"ravi.auto(true)" sha2.lua
I'm waiting for next Ravi release to see how fast SHA2 is under Ravi JIT.
 
-- Egor

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] SHA2 in pure Lua optimized for every Lua version

Peter Melnichenko-2
On Sun, Oct 7, 2018 at 2:02 AM Egor Skriptunoff
<[hidden email]> wrote:
>
> Hi!
>
> I wrote a module with SHA2 functions in pure Lua.
> The code tries to use specific benefits of your system to make calculations as fast as possible.
> For every Lua version the module contains separate implementation branch optimized for this version.
> <snip>
> -- Egor
>

Hello Egor!

This is very impressive! I've just released an optimized version of
SHA1 in pure Lua
based on work by Enrique García Cota and I will definitely have some
optimization to lift
(with appropriate credits) from this.

For Lua 5.1 I've added an optimized way to build look-up tables for
8bit operators.
Previous version of SHA1 used a lot of time to build these when the
module was required.
In the case of your module start up is already fast (0.05 seconds on
my machine vs
0.02 with the version from SHA1) so it's not that important, but
perhaps this way
to build the look-up table is at least interesting. It can be found here:

https://github.com/mpeterv/sha1/blob/v0.6.0/src/sha1/pure_lua_ops.lua#L15-L82

-- Best regards,
-- Peter Melnichenko

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] SHA2 in pure Lua optimized for every Lua version

Egor Skriptunoff-2
On Sun, Oct 7, 2018 at 7:57 PM Peter Melnichenko wrote:
I've just released an optimized version of
SHA1 in pure Lua

Your project is very similar to mine.  :-)
 
For Lua 5.1 I've added an optimized way to build look-up tables for
8bit operators.
Previous version of SHA1 used a lot of time to build these when the
module was required.
In the case of your module start up is already fast (0.05 seconds on
my machine vs
0.02 with the version from SHA1) so it's not that important, but
perhaps this way
to build the look-up table is at least interesting.

Yes, the way you're constructing the look-up table is really interesting (and is faster than my look-up table constructor).
Thanks for pointing it out.
I've updated my repo to make the module load faster on Lua 5.1.