# why lua table don't use prime number to hash

8 messages
Open this post in threaded view
|

## why lua table don't use prime number to hash

HI
I found that lua table use 2^n to hash key. normaly we use the prime number to prevent hash collision.
now I use lua-5.3.4, and the code below will cost about 6 second because the hash table degenerates into a linked list
so why lua table don't use prime number to hash?

function alloc_id(index)
return ((index & 0xffffffff) << 32) | 1
end

local src = {}
local index = os.time()
for i = 1, 30000 do
local id = alloc_id(index)
index = index + 1
table.insert(src, id)
end

local dst = {}
local begin = os.time()
for k, v in ipairs(src) do
dst[v] = 1
end
print("cost " .. os.time() - begin)

Open this post in threaded view
|

## Re: why lua table don't use prime number to hash

 > I found that lua table use 2^n to hash key. normaly we use the prime number to prevent hash collision. > now I use lua-5.3.4, and the code below will cost about 6 second because the hash table degenerates into a linked list > so why lua table don't use prime number to hash? Hashing with 2^n has the advantage that we can compute the modulo with (x & (size - 1)) (see macro 'lmod'), which is much cheaper than the '%' operator. Nontheless, for several types (all types that use addresses as hashes plus floats), Lua hashes with Mersenne numbers (2^n - 1), which are a "good approximation" for primes. The choice between the two hashes is based on how good we consider the initial hash (before the modulo). Integer keys tend to be sequential, which gives sequential keys after the modulo, which is good for hashes. So, for this quite common case, it is not worth paying the price of '%'. Of course, the worst-case for any hashing algorithm for unknown keys is O(n), even with prime sizes. Is yours a real problem? Do you have a program that does something useful to someone that is having performance problems because of this hashing? -- Roberto
Open this post in threaded view
|

## Re:Re: why lua table don't use prime number to hash

 thanks for your reply```>Is yours a real problem? Do you have a program that does something >useful to someone that is having performance problems because of >this hashing?```yes, I am a online game server developer, and we use Lua to write the whole game's logic.this is the real problem that happened in our server which stuck the server about 10 seconds.since we didn't know the lua hash detail, so we used a ID generator function like thisfunction alloc_id(timestamp)    return ((timestamp& 0xffffffff) << 32) | ( small increment count )end we found the problem and fix it by change the id to string before put into table.dst[ tostring(id) ] = infoso if someone don't know the hash detail, may be he will have the same problem.thank you.```At 2021-03-10 20:47:06, "Roberto Ierusalimschy" <[hidden email]> wrote: >> I found that lua table use 2^n to hash key. normaly we use the prime number to prevent hash collision. >> now I use lua-5.3.4, and the code below will cost about 6 second because the hash table degenerates into a linked list >> so why lua table don't use prime number to hash? > >Hashing with 2^n has the advantage that we can compute the modulo >with (x & (size - 1)) (see macro 'lmod'), which is much cheaper >than the '%' operator. > >Nontheless, for several types (all types that use addresses as hashes >plus floats), Lua hashes with Mersenne numbers (2^n - 1), which are a >"good approximation" for primes. The choice between the two hashes is >based on how good we consider the initial hash (before the modulo). > >Integer keys tend to be sequential, which gives sequential keys >after the modulo, which is good for hashes. So, for this quite >common case, it is not worth paying the price of '%'. > >Of course, the worst-case for any hashing algorithm for unknown keys is >O(n), even with prime sizes. > >Is yours a real problem? Do you have a program that does something >useful to someone that is having performance problems because of >this hashing? > >-- Roberto ```
Open this post in threaded view
|

## Re: Re: why lua table don't use prime number to hash

 > >Is yours a real problem? Do you have a program that does something > >useful to someone that is having performance problems because of > >this hashing? > yes, I am a online game server developer, and we use Lua to write the whole game's logic. > this is the real problem that happened in our server which stuck the server about 10 seconds. If you can change the code of your Lua implementation, you can try the following: In file 'ltable.c', change the definition of 'hashint': -#define hashint(t,i)           hashpow2(t, i) +#define hashint(t,i)           hashmod(t, l_castS2U(i)) See if that helps and report back to us. -- Roberto
Open this post in threaded view
|

## Re:Re: Re: why lua table don't use prime number to hash

 I tried it, and the running time of the test code reduce to zero second.I think maybe this is more adaptable to various situations if lua code change like that.thank you.```At 2021-03-10 22:35:09, "Roberto Ierusalimschy" <[hidden email]> wrote: >> >Is yours a real problem? Do you have a program that does something >> >useful to someone that is having performance problems because of >> >this hashing? >> yes, I am a online game server developer, and we use Lua to write the whole game's logic. >> this is the real problem that happened in our server which stuck the server about 10 seconds. > >If you can change the code of your Lua implementation, you can try the >following: > >In file 'ltable.c', change the definition of 'hashint': > >-#define hashint(t,i) hashpow2(t, i) >+#define hashint(t,i) hashmod(t, l_castS2U(i)) > >See if that helps and report back to us. > >-- Roberto ```
Open this post in threaded view
|

## Re: Re: Re: why lua table don't use prime number to hash

 > I tried it, and the running time of the test code reduce to zero second. > I think maybe this is more adaptable to various situations if lua code change like that. Maybe a slightly better option would be this one: #define hashint(t,i)  \   hashmod(t, cast_uint(l_castS2U(i) & (~0u)) ^ \              cast_uint(l_castS2U(i) >> (sizeof(int) * 8 - 8) >> 8)); It uses an unsigned 32-bit division, which is cheaper than an unsigned 64-bit division. (If lua_Integer == int, the compiler should be able to optimize this out.) -- Roberto
 Hi RobertoI had test the two hashint macro. and when insert 10000000 keys, the new macro cost about 2.7s, while old macro cost about 4s.this is indeed faster. and I also test the three hashint macro by use auto increase int key. this is the test lua code:function alloc_id(index)    return index & 0xffffffffendlocal src = {}local index = os.time()for i = 1, 10000000 do    local id = alloc_id(index)    index = index + 1    table.insert(src, id)endlocal dst = {}local begin = os.clock()for k, v in ipairs(src) do    dst[v] = 1endprint("cost " .. os.clock()  - begin)and the result is:1. hashpow2(t, i)                                                                ---->       2.5s2. hashmod(t, l_castS2U(i))                                               ---->      3.4s3. hashmod(t, cast_uint(l_castS2U(i) & (~0u)) ^  \             ---->      2.9s     cast_uint(l_castS2U(i) >> (sizeof(int) * 8 - 8) >> 8));   so maybe the third method can better deal with different scenarios.thank you.`-- zhao xin````At 2021-03-11 22:05:34, "Roberto Ierusalimschy" <[hidden email]> wrote: >> I tried it, and the running time of the test code reduce to zero second. >> I think maybe this is more adaptable to various situations if lua code change like that. > >Maybe a slightly better option would be this one: > >#define hashint(t,i) \ > hashmod(t, cast_uint(l_castS2U(i) & (~0u)) ^ \ > cast_uint(l_castS2U(i) >> (sizeof(int) * 8 - 8) >> 8)); > >It uses an unsigned 32-bit division, which is cheaper than an unsigned >64-bit division. (If lua_Integer == int, the compiler should be able >to optimize this out.) > >-- Roberto ```