Real-World Impact of Hash DoS in Lua

classic Classic list List threaded Threaded
49 messages Options
123
Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Rob Kendrick-2
On Thu, Jan 26, 2012 at 11:29:18AM +0100, Florian Weimer wrote:

> * Roberto Ierusalimschy:
>
> >> OK.  Cool.  This is a showstopper for the company I am working with for
> >> rolling out embedded Lua with nginx.  Is there anything I can do to help?
> >
> > what it is still missing now is how to create the initial per-state
> > random seed. Suggestions included some address and arc4random. I am
> > afraid that, for the backup ANSI implementation, we cannot do much
> > better than something like this:
> >
> >   seed = (unsigned int)time() + (unsigned int)L;
>
> Addresses of a stack variable and a public Lua function should provide
> a few bits of randomness, too.

And perhaps one or two functions from the C library and maths library:
ASLR will mean these will be pleasingly distributed for another few
bits: even more so on 64 bit systems.

> Reading from /dev/urandom might be problematic because drains entropy
> from the whole system.

Not to mention making non-Linux users sad.

B.

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Matthew Wild
In reply to this post by Florian Weimer
On 26 January 2012 10:29, Florian Weimer <[hidden email]> wrote:

> * Roberto Ierusalimschy:
>
>>> OK.  Cool.  This is a showstopper for the company I am working with for
>>> rolling out embedded Lua with nginx.  Is there anything I can do to help?
>>
>> what it is still missing now is how to create the initial per-state
>> random seed. Suggestions included some address and arc4random. I am
>> afraid that, for the backup ANSI implementation, we cannot do much
>> better than something like this:
>>
>>   seed = (unsigned int)time() + (unsigned int)L;
>
> Addresses of a stack variable and a public Lua function should provide
> a few bits of randomness, too.
>
> Instead of addition, you should use one of Bob Jenkin's mixing
> functions.  This means that it's not important where the randomness is
> inside the initial values (sometimes, it's in the lower bits,
> sometimes, somewhere in the middle).
>
> Reading from /dev/urandom might be problematic because drains entropy
> from the whole system.
>
>> We can have better implementations for particular system. For
>> instance, we can use arc4random if present, but how to detect it?
>> Are there any other suggestions?
>
> Have you tried using PATRICIA and similar trie structures for
> interning strings?  I'm just curious how large the performance impact
> would be.  I suspect it's so large that it's not worth it.
>

I'm personally wondering what the downsides are to using a tree to
store conflicting keys. It doesn't seem to be the popular approach to
fixing this problem. Is the code complexity deemed too great? Too much
(runtime) overhead? Laziness?

Regards,
Matthew

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Tony Finch
In reply to this post by Rob Kendrick-2
Rob Kendrick <[hidden email]> wrote:
> On Thu, Jan 26, 2012 at 11:29:18AM +0100, Florian Weimer wrote:
>
> > Reading from /dev/urandom might be problematic because drains entropy
> > from the whole system.
>
> Not to mention making non-Linux users sad.

Neither of these is a problem on Mac OS X or FreeBSD :-)

But Windows and some commercial unices need something else.

Places to look for inspiration: Python's os.urandom(); Perl_seed() in
perl's util.c; OpenSSL crypto/rand/rand_*.c; Apache APR misc/*/rand.c.

Tony.
--
f.anthony.n.finch  <[hidden email]>  http://dotat.at/
Dogger, Fisher, German Bight: Southeasterly, becoming cyclonic later, 6 to
gale 8, occasionally severe gale 9 in Fisher, decreasing 4 or 5 at times.
Moderate or rough, occasionally very rough. Rain then showers. Moderate or
good.

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Sean Conner
In reply to this post by Rob Kendrick-2
It was thus said that the Great Rob Kendrick once stated:
> On Thu, Jan 26, 2012 at 11:29:18AM +0100, Florian Weimer wrote:
>
> > Reading from /dev/urandom might be problematic because drains entropy
> > from the whole system.
>
> Not to mention making non-Linux users sad.

  /dev/urandom exists on Solaris.  

  -spc (So it's not exactly *Linux only*)


Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Paul Hudson-2
But it is a small subset of the things Lua does or could run on.  And since we're talking about a core feature of Lua,  platform/OS dependencies are to be avoided (IMO) if a portable solution can be found.

On 26 January 2012 16:10, Sean Conner <[hidden email]> wrote:
 -spc (So it's not exactly *Linux only*)



Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Sean Conner
It was thus said that the Great Paul Hudson once stated:
> On 26 January 2012 16:10, Sean Conner <[hidden email]> wrote:
>
> >  -spc (So it's not exactly *Linux only*)
>
> But it is a small subset of the things Lua does or could run on.  And since
> we're talking about a core feature of Lua,  platform/OS dependencies are to
> be avoided (IMO) if a portable solution can be found.

  And I was replying to Roberto, who asked:

> We can have better implementations for particular system. For instance, we
> can use arc4random if present, but how to detect it? Are there any other
> suggestions?

  C89 is pretty restrictive in this (seeding a random number generator)
regard.  You really can't even rely upon time() since a C89 implementation
has to only give "its best approximation of the time" which could be 0 (I
don't my references handy at the moment, but once I get home I can cite the
appropriate documenation).  

  Even using the address of a function or variable can fail to provide a
seed, given an implementation that does not randomize addresses.  

  I just tested Linux 2.6.9 and 2.6.35, and yes, both will randomize the
base stack address, but not necessarily library function addresses:

#include <stdio.h>
#include <stdlib.h>

#include <openssl/evp.h>

int main(int argc,char *argv[])
{
  int x;
 
  printf(
        "x:                    %p\n"
        "srand:                %p\n"
        "EVP_get_digestbyname: %p\n",
        (void *)&x,
        (void *)srand,
        (void *)EVP_get_digestbyname
  );
  return EXIT_SUCCESS;
}

2.6.9 (32 bit system):
[spc]lucy:/tmp>./a.out
x:                    0xbff14df4
srand:                0x8048404
EVP_get_digestbyname: 0x8048414
[spc]lucy:/tmp>./a.out
x:                    0xbfe95f14
srand:                0x8048404
EVP_get_digestbyname: 0x8048414
[spc]lucy:/tmp>./a.out
x:                    0xbff4d0f4
srand:                0x8048404
EVP_get_digestbyname: 0x8048414
[spc]lucy:/tmp>./a.out
x:                    0xbffbcfc4
srand:                0x8048404
EVP_get_digestbyname: 0x8048414
[spc]lucy:/tmp>./a.out
x:                    0xbffa9194
srand:                0x8048404
EVP_get_digestbyname: 0x8048414

2.6.35 (64 bit system):
[spc]saltmine:/tmp>./a.out
x:                    0x7fff0e3e0c1c
srand:                0x4005d8
EVP_get_digestbyname: 0x4005c8
[spc]saltmine:/tmp>./a.out
x:                    0x7fff4638091c
srand:                0x4005d8
EVP_get_digestbyname: 0x4005c8
[spc]saltmine:/tmp>./a.out
x:                    0x7fff79e3a16c
srand:                0x4005d8
EVP_get_digestbyname: 0x4005c8
[spc]saltmine:/tmp>./a.out
x:                    0x7fffb2e2b3ac
srand:                0x4005d8
EVP_get_digestbyname: 0x4005c8
[spc]saltmine:/tmp>./a.out
x:                    0x7fff2676dd1c
srand:                0x4005d8
EVP_get_digestbyname: 0x4005c8

I find it interesting though, that even with the random stack addresses, the
lowest four bits on both systems are identical, and the upper N bits (8 bits
for the 32b system, 32 bits for the 64b system) are also identical, which
still isn't ideal for seeding a random number generator (although it's
better than just 0).  

Solaris (v5.10, 64 bit SPARC) is amusing---while the address of x changes
per invocation, it's not exactly random either:

[spc]sol:/tmp>./a.out
x:                    ffffffff7ffff8e8
srand:                100100ae0
EVP_get_digestbyname: 100100b00
[spc]sol:/tmp>./a.out
x:                    ffffffff7ffff8d8
srand:                100100ae0
EVP_get_digestbyname: 100100b00
[spc]sol:/tmp>./a.out
x:                    ffffffff7ffff8c8
srand:                100100ae0
EVP_get_digestbyname: 100100b00
[spc]sol:/tmp>./a.out
x:                    ffffffff7ffff8b8
srand:                100100ae0
EVP_get_digestbyname: 100100b00
[spc]sol:/tmp>./a.out
x:                    ffffffff7ffff8a8
srand:                100100ae0
EVP_get_digestbyname: 100100b00
[spc]sol:/tmp>./a.out
x:                    ffffffff7ffff898
srand:                100100ae0
EVP_get_digestbyname: 100100b00
[spc]sol:/tmp>./a.out
x:                    ffffffff7ffff888
srand:                100100ae0
EVP_get_digestbyname: 100100b00
[spc]sol:/tmp>./a.out
x:                    ffffffff7ffff878
srand:                100100ae0
EVP_get_digestbyname: 100100b00
[spc]sol:/tmp>./a.out
x:                    ffffffff7ffff868
srand:                100100ae0
EVP_get_digestbyname: 100100b00

(the 32b version of this code under Solaris behaves similarly, only with a
32-bit address).

  Even using the address of the lua_State might not work, even if allocated
via malloc(), since the program state up to that point could very well be
deterministic and thus, the same address could be returned each time
(malloc() returned random addresses for the the two Linux systems I tested
on, Solaris returned the same address each time).

  -spc (So, portably seeding the random number generator in C89?  Not so
        easy ...)

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Florian Weimer
* Sean Conner:

>   I just tested Linux 2.6.9 and 2.6.35, and yes, both will randomize the
> base stack address, but not necessarily library function addresses:

Oh, I think those are PLT addresses, then.
dlsym(RTLD_DEFAULT, "srand") returns a randomized
address, but the PLT address is constant (unless you
compile with -fPIE or as a dynamic shared object).

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Roberto Ierusalimschy
In reply to this post by William Ahern
> It just requires that LUA_RANDOM be defined to one of the
> identifiers `time', `arc4random', or `RAND_bytes'.

That is part of the problem. We would like something that Lua could
detect testing some predefined macros.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Roberto Ierusalimschy
In reply to this post by Jerome Vuarand
> Well, if these needs for "randomization for increased security" are
> gaining weight, maybe a secondary lua_newstate taking a seed argument
> would solve the problem, that seed being used for all (present and
> future) randomizations needs.

That seems a bit overshooting. Unlike memory allocation, I do not see
any need for an application to use different seed-creation methods for
different states. And we still have the problem for the stand-alone
interpreter.

I still think this whole problem is somewhat overstated. As Jay Carlson
put it (Jan 20), there are many other forms of DoS attacks that most
sites are not resistant to. The main reasons we are changing Lua
are that we like this idea of long non-internalized strings and for
propaganda. For propaganda purposes, the stand-alone interpreter is
very important. If it is too easy to simmulate an attack in a standard
interpreter running in a decent system (e.g., Linux), you will have a
hard time to counter-argument with "but you can solve that by passing an
appropriate seed to lua_newstate".

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Roberto Ierusalimschy
In reply to this post by Florian Weimer
> Addresses of a stack variable and a public Lua function should provide
> a few bits of randomness, too.

That sounds interesting.


> Instead of addition, you should use one of Bob Jenkin's mixing
> functions.

What are Bob Jenkin's "mixing functions"? Hash functions?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Roberto Ierusalimschy
In reply to this post by Matthew Wild
> I'm personally wondering what the downsides are to using a tree to
> store conflicting keys. It doesn't seem to be the popular approach to
> fixing this problem. Is the code complexity deemed too great? Too much
> (runtime) overhead? Laziness?

Do you mean using trees only for the string table or for tables in
general?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Roberto Ierusalimschy
In reply to this post by Roberto Ierusalimschy
> What are Bob Jenkin's "mixing functions"? Hash functions?

Never mind. I found them.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Roberto Ierusalimschy
In reply to this post by Florian Weimer
> For 5.1, I've got a patch that replaces the string hash function with
> the de-facto industry standard (one of Bob Jenkin's functions).  I
> could tidy up the patch and post it if there is enough interest.


What were the gains?

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Joshua Haberman-2
In reply to this post by Roberto Ierusalimschy
On Thu, Jan 26, 2012 at 11:56 AM, Roberto Ierusalimschy <[hidden email]> wrote:
> What are Bob Jenkin's "mixing functions"? Hash functions?

Never mind. I found them.

My impression is that Bob Jenkins' functions have been more
or less obsoleted by MurmurHash and CityHash:

Josh
Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Matthew Wild
In reply to this post by Roberto Ierusalimschy
On 26 January 2012 19:53, Roberto Ierusalimschy <[hidden email]> wrote:
>> I'm personally wondering what the downsides are to using a tree to
>> store conflicting keys. It doesn't seem to be the popular approach to
>> fixing this problem. Is the code complexity deemed too great? Too much
>> (runtime) overhead? Laziness?
>
> Do you mean using trees only for the string table or for tables in
> general?
>

Tables in general. I had the belief that the string table actually
used the Lua table mechanisms internally, but looking at the source
just now suggests I'm wrong on that.

Regards,
Matthew

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Joshua Haberman-2
In reply to this post by Joshua Haberman-2
On Thu, Jan 26, 2012 at 12:07 PM, Josh Haberman <[hidden email]> wrote:
On Thu, Jan 26, 2012 at 11:56 AM, Roberto Ierusalimschy <[hidden email]> wrote:
> What are Bob Jenkin's "mixing functions"? Hash functions?

Never mind. I found them.

My impression is that Bob Jenkins' functions have been more
or less obsoleted by MurmurHash and CityHash:

Though actually it looks like Bob Jenkins has a new hash function also:

It sounds like CityHash wins on CPUs with the CRC32 instruction
(part of SSE 4.2, Intel Nehalem and later) but SpookyHash wins
on x86-64 CPUs that lack this instruction.

Josh
Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Sean Conner
In reply to this post by Florian Weimer
It was thus said that the Great Florian Weimer once stated:
> * Sean Conner:
>
> >   I just tested Linux 2.6.9 and 2.6.35, and yes, both will randomize the
> > base stack address, but not necessarily library function addresses:
>
> Oh, I think those are PLT addresses, then.
> dlsym(RTLD_DEFAULT, "srand") returns a randomized
> address, but the PLT address is constant (unless you
> compile with -fPIE or as a dynamic shared object).

  Yes and no.  Linux 64 bit, the address changes, but only if you don't
reference srand() before pulling it from dlsym().  On 32 bit Linux, and
SPARC (64 bit, but I assume the same behavior with 32 bit as well) the
address never changes.

  -spc (And even when it does change on 64-bit Linux, the upper 16 bits and
        lower 12 never change ... )


Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Florian Weimer
In reply to this post by Roberto Ierusalimschy
* Roberto Ierusalimschy:

>> For 5.1, I've got a patch that replaces the string hash function with
>> the de-facto industry standard (one of Bob Jenkin's functions).  I
>> could tidy up the patch and post it if there is enough interest.

> What were the gains?

The sensitive to input with regular patterns was gone, but hashing was
a bit slower.  Some microbenchmarks were slower, some were faster.  My
test case for collisions did not work anymore.

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

Florian Weimer
In reply to this post by Sean Conner
* Sean Conner:

> It was thus said that the Great Florian Weimer once stated:
>> * Sean Conner:
>>
>> >   I just tested Linux 2.6.9 and 2.6.35, and yes, both will randomize the
>> > base stack address, but not necessarily library function addresses:
>>
>> Oh, I think those are PLT addresses, then.
>> dlsym(RTLD_DEFAULT, "srand") returns a randomized
>> address, but the PLT address is constant (unless you
>> compile with -fPIE or as a dynamic shared object).
>
>   Yes and no.  Linux 64 bit, the address changes, but only if you
> don't reference srand() before pulling it from dlsym().  On 32 bit
> Linux, and [...] the address never changes.

I can't reproduce either aspect.

This is not too surprising---there are multiple randomization
implementations, with slightly differing behaviors.  I'm a bit
astonished that the system I'm using (stock Debian squeeze) is
randomizing at all.

>   -spc (And even when it does change on 64-bit Linux, the upper 16 bits and
> lower 12 never change ... )

Pointers are actually 48 bits only (and the hardware may support even
less in terms of concrete address space layout).  The lower 12 bits
are the intra-page offset, which cannot be changed without rewriting
the page after it comes from disk.  mmap on Linux only supports offset
which are multiples of the page size, which is somewhat
understandable.

Reply | Threaded
Open this post in threaded view
|

Re: Real-World Impact of Hash DoS in Lua

William Ahern
In reply to this post by Roberto Ierusalimschy
On Thu, Jan 26, 2012 at 05:17:38PM -0200, Roberto Ierusalimschy wrote:
> > It just requires that LUA_RANDOM be defined to one of the
> > identifiers `time', `arc4random', or `RAND_bytes'.
>
> That is part of the problem. We would like something that Lua could
> detect testing some predefined macros.
>

There are two problems, really: 1) making it easy to select sources, and 2)
getting the default right.

The solution I posted earlier addressed #1. The second is, obviously,
harder. For arc4random I test for __NetBSD__, __FreeBSD__, __OpenBSD__, and
__APPLE__. I also set _BSD_SOURCE and _DARWIN_C_SOURCE, similar to
_GNU_SOURCE.

As has been pointed out, Arch Linux has arc4random(), but I'd be wary of
degenerating to checking for every Linux distribution out there, which is
why I think solving problem #1 is important; perhaps more important than
#2.

For the Linux sysctl hack you can just check for __linux since it's a kernel
interface. Unfortunately the RANDOM_UUID identifier for the sysctl mib is an
enum.

Like some others, I'd prefer an extension to lua_newstate, but I realize
that doesn't solve the lua command-line utility issue.

For OpenSSL I don't know of any way to check other than relying on the
preprocessor to continue even for a failed #include. Also, the location of
openssl header files used to be unreliable, though I think
<openssl/crypto.h> is most common. But then there's the linking issue, so
maybe a run-time approach of trying to dlopen() libcrypto.so would work
better.

/dev/urandom is fairly common (*BSDs, Linux, Apple, and Solaris), and at
least for the command-line utility won't have the chroot() issues.

I guess the 4 good bets are 1) arc4random, 2) sysctl({RANDOM_UUID}), 3)
Windows Crypto API, and 4) /dev/urandom. After that just fallback to time(),
etc.


123