SipHash for Lua 5.4.0 patch

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

SipHash for Lua 5.4.0 patch

Sam Trenholme
As promised earlier today, here is a SipHash patch for Lua 5.4.0, which
replaces its default hash compression code with the fast and secure
SipHash algorithm.  This patch will cause Lua 5.4.0 to generate two
harmless warnings while compiling lstring.c.

Here is my benchmark: No difference in speed.

SipHash compile Stock Lua 5.4.0
real    0m23.038s real    0m22.653s
user    0m22.414s user    0m22.034s
sys     0m0.619s sys     0m0.617s

real    0m23.491s real    0m24.177s
user    0m22.859s user    0m23.289s
sys     0m0.595s sys     0m0.598s

real    0m23.375s real    0m23.262s
user    0m22.803s user    0m22.693s
sys     0m0.569s sys     0m0.566s

To replicate this benchmark:

git clone https://github.com/Samboy/covid-19-html/
cd covid-19-html
git clone https://github.com/nytimes/covid-19-data/
cp covid-19-data/us-counties.csv data.csv
time lua examine-growth.lua benchmark

Note that this patch does not make an attempt to generate a random key
for SipHash.  The way to do so is implementation dependent, but
*usually* can be done with /dev/random (I posted a Lua script earlier
today which uses /dev/random to make a SipHash.h file with a random
key).

The patch is public domain.  Both this patch and a similar patch for Lua
5.1.5 are available here:

https://github.com/samboy/LUAstuff/

There’s also some Lua code there for doing stuff like sorted table
iterations, splitting lines with regular expressions or comma separated
data, copying files, etc. -- this is my collection of “batteries” for
Lua so I can use it to solve my day to day computing problems.

-- Sam



lua-5.4.0-SipHash.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: SipHash for Lua 5.4.0 patch

Sam Trenholme
> No difference in speed.

Note that there is about a 20% slowdown when we run the same benchmark
as an i386 binary compiled with GCC 3.2.3 from 2002. SipHash is
incredibly fast on a modern 64-bit processor because it can run its core
algorithm using only four 64-bit registers. I would suspect that there
would be less slowdown on a 32-bit ARM or other RISC architecture
because they do not have the register starvation i386 has.
Reply | Threaded
Open this post in threaded view
|

Re: SipHash for Lua 5.4.0 patch

Gé Weijers
On Sat, Aug 8, 2020 at 10:16 PM Sam Trenholme <[hidden email]> wrote:
>
> > No difference in speed.
>
> Note that there is about a 20% slowdown when we run the same benchmark
> as an i386 binary compiled with GCC 3.2.3 from 2002. SipHash is
> incredibly fast on a modern 64-bit processor because it can run its core
> algorithm using only four 64-bit registers. I would suspect that there
> would be less slowdown on a 32-bit ARM or other RISC architecture
> because they do not have the register starvation i386 has.

On i386-type processors less than 20 years old you could use SSE2
instructions to improve the performance. It won't be as fast as on a
64-bit architecture with plenty of registers, but it would still speed
things up.
Public domain code can be found here: https://github.com/floodyberry/siphash


Reply | Threaded
Open this post in threaded view
|

Re: SipHash for Lua 5.4.0 patch

Sam Trenholme
>> > No difference in speed [Using 64-bit SipHash 2-4 with x86_64]
>>
>> Note that there is about a 20% slowdown when we run the same benchmark
>> as an i386 binary compiled with GCC 3.2.3 from 2002.
>
> On i386-type processors less than 20 years old you could use SSE2
> instructions
>
> Public domain code can be found here:
> https://github.com/floodyberry/siphash

That’s one way to speed things up if using x86 code.  Another would be
to use HalfSipHash 1-3 instead of the SipHash 2-4 I use in the current
code.

Neither SipHash 1-3 nor HalfSipHash are readily documented at
https://131002.net/siphash/ so I will explain them.  HalfSipHash is a
variant of SipHash that uses 32-bit words instead of 64-bit words.  It’s
described, along with reference test vectors, over at:
https://github.com/veorq/SipHash

SipHash 1-3 is a simple variant which does one round of the SipHash
“compression” function while hashing the input (instead of two rounds),
and three rounds of the SipHash “compression” function after the input
ends.  In the patch I provided, there’s a line which looks like this:

     for(round = 0; round < 2; round++) {

Replace the line to look like this:

     for(round = 0; round < 1; round++) {

Or, better yet, just get rid of that particular for loop in SipHash 1-3.

There’s a another line in my patch which looks like this:

     for(round = 0; round < 4; round++) {

In SipHash 1-3, make it:

     for(round = 0; round < 3; round++) {

Some more discussion of SipHash 1-3:

https://bugs.ruby-lang.org/issues/13017
https://github.com/rust-lang/rust/issues/29754#issue-116174313
https://github.com/rust-lang/rust/issues/29754#issuecomment-156073946

Personally, I think i386-based systems are rare enough here in 2020 that
we do not need to worry about the 20% overall performance impact of
using SipHash instead of Lua’s default hasher, so that’s the approach I
take.  SipHash 2-4 is fast enough on x86_64 that I can’t see any
real-world performance hit by using it.

The biggest issue with adding SipHash to strictly ANSI C compliant code
is that it needs 128 bits of randomness to make good keys which are
strong against hash collision attacks.  The code I have to gather
entropy is OS-specific and not ANSI C, looking like this:

-----------------------------------------------------------------------------
         char noise[17];
#ifndef MINGW
         FILE *rfile = NULL;
         rfile = fopen("/dev/urandom","rb");
         for(a=0;a<16;a++) { noise[a] = getc(rfile); }
#else // Windows API
         HCRYPTPROV CryptContext;
         CryptAcquireContext(&CryptContext, NULL, NULL, PROV_RSA_FULL,
                 CRYPT_VERIFYCONTEXT);
         CryptGenRandom(CryptContext, 16, noise);
-----------------------------------------------------------------------------

— Sam