HI I found that lua table use 2^n to hash key. normaly we use the prime number to prevent hash collision. now I use lua-5.3.4, and the code below will cost about 6 second because the hash table degenerates into a linked list so why lua table don't use prime number to hash? function alloc_id(index) return ((index & 0xffffffff) << 32) | 1 end local src = {} local index = os.time() for i = 1, 30000 do local id = alloc_id(index) index = index + 1 table.insert(src, id) end local dst = {} local begin = os.time() for k, v in ipairs(src) do dst[v] = 1 end print("cost " .. os.time() - begin)
|
> I found that lua table use 2^n to hash key. normaly we use the prime number to prevent hash collision.
> now I use lua-5.3.4, and the code below will cost about 6 second because the hash table degenerates into a linked list > so why lua table don't use prime number to hash? Hashing with 2^n has the advantage that we can compute the modulo with (x & (size - 1)) (see macro 'lmod'), which is much cheaper than the '%' operator. Nontheless, for several types (all types that use addresses as hashes plus floats), Lua hashes with Mersenne numbers (2^n - 1), which are a "good approximation" for primes. The choice between the two hashes is based on how good we consider the initial hash (before the modulo). Integer keys tend to be sequential, which gives sequential keys after the modulo, which is good for hashes. So, for this quite common case, it is not worth paying the price of '%'. Of course, the worst-case for any hashing algorithm for unknown keys is O(n), even with prime sizes. Is yours a real problem? Do you have a program that does something useful to someone that is having performance problems because of this hashing? -- Roberto |
thanks for your reply >Is yours a real problem? Do you have a program that does something >useful to someone that is having performance problems because of >this hashing? yes, I am a online game server developer, and we use Lua to write the whole game's logic. this is the real problem that happened in our server which stuck the server about 10 seconds. since we didn't know the lua hash detail, so we used a ID generator function like this function alloc_id(timestamp) return ((timestamp& 0xffffffff) << 32) | ( small increment count ) end we found the problem and fix it by change the id to string before put into table. dst[ tostring(id) ] = info so if someone don't know the hash detail, may be he will have the same problem. thank you.
|
> >Is yours a real problem? Do you have a program that does something
> >useful to someone that is having performance problems because of > >this hashing? > yes, I am a online game server developer, and we use Lua to write the whole game's logic. > this is the real problem that happened in our server which stuck the server about 10 seconds. If you can change the code of your Lua implementation, you can try the following: In file 'ltable.c', change the definition of 'hashint': -#define hashint(t,i) hashpow2(t, i) +#define hashint(t,i) hashmod(t, l_castS2U(i)) See if that helps and report back to us. -- Roberto |
I tried it, and the running time of the test code reduce to zero second. I think maybe this is more adaptable to various situations if lua code change like that. thank you. At 2021-03-10 22:35:09, "Roberto Ierusalimschy" <[hidden email]> wrote: >> >Is yours a real problem? Do you have a program that does something >> >useful to someone that is having performance problems because of >> >this hashing? >> yes, I am a online game server developer, and we use Lua to write the whole game's logic. >> this is the real problem that happened in our server which stuck the server about 10 seconds. > >If you can change the code of your Lua implementation, you can try the >following: > >In file 'ltable.c', change the definition of 'hashint': > >-#define hashint(t,i) hashpow2(t, i) >+#define hashint(t,i) hashmod(t, l_castS2U(i)) > >See if that helps and report back to us. > >-- Roberto
|
> I tried it, and the running time of the test code reduce to zero second.
> I think maybe this is more adaptable to various situations if lua code change like that. Maybe a slightly better option would be this one: #define hashint(t,i) \ hashmod(t, cast_uint(l_castS2U(i) & (~0u)) ^ \ cast_uint(l_castS2U(i) >> (sizeof(int) * 8 - 8) >> 8)); It uses an unsigned 32-bit division, which is cheaper than an unsigned 64-bit division. (If lua_Integer == int, the compiler should be able to optimize this out.) -- Roberto |
Hi Roberto I had test the two hashint macro. and when insert 10000000 keys, the new macro cost about 2.7s, while old macro cost about 4s. this is indeed faster. and I also test the three hashint macro by use auto increase int key. this is the test lua code: function alloc_id(index) return index & 0xffffffff end local src = {} local index = os.time() for i = 1, 10000000 do local id = alloc_id(index) index = index + 1 table.insert(src, id) end local dst = {} local begin = os.clock() for k, v in ipairs(src) do dst[v] = 1 end print("cost " .. os.clock() - begin) and the result is: 1. hashpow2(t, i) ----> 2.5s 2. hashmod(t, l_castS2U(i)) ----> 3.4s 3. hashmod(t, cast_uint(l_castS2U(i) & (~0u)) ^ \ ----> 2.9s cast_uint(l_castS2U(i) >> (sizeof(int) * 8 - 8) >> 8)); so maybe the third method can better deal with different scenarios. thank you. -- zhao xin
|
> I had test the two hashint macro. and when insert 10000000 keys, the new macro cost about 2.7s, while old macro cost about 4s.
> > this is indeed faster. > > > > and I also test the three hashint macro by use auto increase int key. this is the test lua code: > > [...] Many thanks for the feedback. The new macro is obviously faster, but it is also more sensitive to bad patterns (e.g., «for i = 1, 1e5 do a[(i << 32) | i] = i end»). If we want a good hash, it may be worth paying the price of a full 64-bit modulo. -- Roberto |
Free forum by Nabble | Edit this page |