why lua table don't use prime number to hash

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

why lua table don't use prime number to hash

zhao xin
HI
I found that lua table use 2^n to hash key. normaly we use the prime number to prevent hash collision.
now I use lua-5.3.4, and the code below will cost about 6 second because the hash table degenerates into a linked list
so why lua table don't use prime number to hash?

function alloc_id(index)
    return ((index & 0xffffffff) << 32) | 1
end

local src = {}
local index = os.time()
for i = 1, 30000 do
    local id = alloc_id(index)
    index = index + 1
    table.insert(src, id)
end

local dst = {}
local begin = os.time()
for k, v in ipairs(src) do
    dst[v] = 1
end
print("cost " .. os.time() - begin)




 

Reply | Threaded
Open this post in threaded view
|

Re: why lua table don't use prime number to hash

Roberto Ierusalimschy
> I found that lua table use 2^n to hash key. normaly we use the prime number to prevent hash collision.
> now I use lua-5.3.4, and the code below will cost about 6 second because the hash table degenerates into a linked list
> so why lua table don't use prime number to hash?

Hashing with 2^n has the advantage that we can compute the modulo
with (x & (size - 1)) (see macro 'lmod'), which is much cheaper
than the '%' operator.

Nontheless, for several types (all types that use addresses as hashes
plus floats), Lua hashes with Mersenne numbers (2^n - 1), which are a
"good approximation" for primes. The choice between the two hashes is
based on how good we consider the initial hash (before the modulo).

Integer keys tend to be sequential, which gives sequential keys
after the modulo, which is good for hashes. So, for this quite
common case, it is not worth paying the price of '%'.

Of course, the worst-case for any hashing algorithm for unknown keys is
O(n), even with prime sizes.

Is yours a real problem? Do you have a program that does something
useful to someone that is having performance problems because of
this hashing?

-- Roberto
Reply | Threaded
Open this post in threaded view
|

Re:Re: why lua table don't use prime number to hash

zhao xin
thanks for your reply
>Is yours a real problem? Do you have a program that does something
>useful to someone that is having performance problems because of
>this hashing?
yes, I am a online game server developer, and we use Lua to write the whole game's logic.
this is the real problem that happened in our server which stuck the server about 10 seconds.
since we didn't know the lua hash detail, so we used a ID generator function like this

function alloc_id(timestamp)
    return ((timestamp& 0xffffffff) << 32) | ( small increment count )
end

we found the problem and fix it by change the id to string before put into table.

dst[ tostring(id) ] = info

so if someone don't know the hash detail, may be he will have the same problem.

thank you.

At 2021-03-10 20:47:06, "Roberto Ierusalimschy" <[hidden email]> wrote: >> I found that lua table use 2^n to hash key. normaly we use the prime number to prevent hash collision. >> now I use lua-5.3.4, and the code below will cost about 6 second because the hash table degenerates into a linked list >> so why lua table don't use prime number to hash? > >Hashing with 2^n has the advantage that we can compute the modulo >with (x & (size - 1)) (see macro 'lmod'), which is much cheaper >than the '%' operator. > >Nontheless, for several types (all types that use addresses as hashes >plus floats), Lua hashes with Mersenne numbers (2^n - 1), which are a >"good approximation" for primes. The choice between the two hashes is >based on how good we consider the initial hash (before the modulo). > >Integer keys tend to be sequential, which gives sequential keys >after the modulo, which is good for hashes. So, for this quite >common case, it is not worth paying the price of '%'. > >Of course, the worst-case for any hashing algorithm for unknown keys is >O(n), even with prime sizes. > >Is yours a real problem? Do you have a program that does something >useful to someone that is having performance problems because of >this hashing? > >-- Roberto


 

Reply | Threaded
Open this post in threaded view
|

Re: Re: why lua table don't use prime number to hash

Roberto Ierusalimschy
> >Is yours a real problem? Do you have a program that does something
> >useful to someone that is having performance problems because of
> >this hashing?
> yes, I am a online game server developer, and we use Lua to write the whole game's logic.
> this is the real problem that happened in our server which stuck the server about 10 seconds.

If you can change the code of your Lua implementation, you can try the
following:

In file 'ltable.c', change the definition of 'hashint':

-#define hashint(t,i)           hashpow2(t, i)
+#define hashint(t,i)           hashmod(t, l_castS2U(i))

See if that helps and report back to us.

-- Roberto
Reply | Threaded
Open this post in threaded view
|

Re:Re: Re: why lua table don't use prime number to hash

zhao xin
I tried it, and the running time of the test code reduce to zero second.
I think maybe this is more adaptable to various situations if lua code change like that.

thank you.


At 2021-03-10 22:35:09, "Roberto Ierusalimschy" <[hidden email]> wrote:
>> >Is yours a real problem? Do you have a program that does something
>> >useful to someone that is having performance problems because of
>> >this hashing?
>> yes, I am a online game server developer, and we use Lua to write the whole game's logic.
>> this is the real problem that happened in our server which stuck the server about 10 seconds.
>
>If you can change the code of your Lua implementation, you can try the
>following:
>
>In file 'ltable.c', change the definition of 'hashint':
>
>-#define hashint(t,i)           hashpow2(t, i)
>+#define hashint(t,i)           hashmod(t, l_castS2U(i))
>
>See if that helps and report back to us.
>
>-- Roberto


 

Reply | Threaded
Open this post in threaded view
|

Re: Re: Re: why lua table don't use prime number to hash

Roberto Ierusalimschy
> I tried it, and the running time of the test code reduce to zero second.
> I think maybe this is more adaptable to various situations if lua code change like that.

Maybe a slightly better option would be this one:

#define hashint(t,i)  \
  hashmod(t, cast_uint(l_castS2U(i) & (~0u)) ^ \
             cast_uint(l_castS2U(i) >> (sizeof(int) * 8 - 8) >> 8));

It uses an unsigned 32-bit division, which is cheaper than an unsigned
64-bit division. (If lua_Integer == int, the compiler should be able
to optimize this out.)

-- Roberto
Reply | Threaded
Open this post in threaded view
|

Re:Re: Re: Re: why lua table don't use prime number to hash

zhao xin
Hi Roberto
I had test the two hashint macro. and when insert 10000000 keys, the new macro cost about 2.7s, while old macro cost about 4s.
this is indeed faster.

and I also test the three hashint macro by use auto increase int key. this is the test lua code:

function alloc_id(index)
    return index & 0xffffffff
end

local src = {}
local index = os.time()
for i = 1, 10000000 do
    local id = alloc_id(index)
    index = index + 1
    table.insert(src, id)
end

local dst = {}
local begin = os.clock()
for k, v in ipairs(src) do
    dst[v] = 1
end
print("cost " .. os.clock()  - begin)

and the result is:
1. hashpow2(t, i)                                                                ---->       2.5s
2. hashmod(t, l_castS2U(i))                                               ---->      3.4s
3. hashmod(t, cast_uint(l_castS2U(i) & (~0u)) ^  \             ---->      2.9s
     cast_uint(l_castS2U(i) >> (sizeof(int) * 8 - 8) >> 8));  

so maybe the third method can better deal with different scenarios.

thank you.

-- zhao xin

At 2021-03-11 22:05:34, "Roberto Ierusalimschy" <[hidden email]> wrote: >> I tried it, and the running time of the test code reduce to zero second. >> I think maybe this is more adaptable to various situations if lua code change like that. > >Maybe a slightly better option would be this one: > >#define hashint(t,i) \ > hashmod(t, cast_uint(l_castS2U(i) & (~0u)) ^ \ > cast_uint(l_castS2U(i) >> (sizeof(int) * 8 - 8) >> 8)); > >It uses an unsigned 32-bit division, which is cheaper than an unsigned >64-bit division. (If lua_Integer == int, the compiler should be able >to optimize this out.) > >-- Roberto


 

Reply | Threaded
Open this post in threaded view
|

Re: Re: Re: Re: why lua table don't use prime number to hash

Roberto Ierusalimschy
> I had test the two hashint macro. and when insert 10000000 keys, the new macro cost about 2.7s, while old macro cost about 4s.
>
> this is indeed faster.
>
>
>
> and I also test the three hashint macro by use auto increase int key. this is the test lua code:
>
> [...]

Many thanks for the feedback.

The new macro is obviously faster, but it is also more sensitive to
bad patterns (e.g., «for i = 1, 1e5 do a[(i << 32) | i] = i end»).
If we want a good hash, it may be worth paying the price of a full
64-bit modulo.

-- Roberto