manual table rehash

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view

manual table rehash
hi all! :)

im wondering if a manual table rehash could be a useful utility for
experts in some rare cases (long living messy tables with much
contents and much deletes), or not? simply like
`table.rehash(aTable)`, i think it would be even a kinda cheap goodie,
but with much harmless uses that arent useful. :D maybe even for 5.4

any ideas about this?

[not so important part]
my possible use case:
actually i have a function(((-like beast))) that is used for
traversing whatever structures (currently lua tables and half done for
fs) that can serve for
iter/dump/serialize/diff/search/analyze/whatever, even a stream based
(so iterative) parser can be used for making whatever text to be a
structure on the go (and more goodies are done/planned).

so the state (a table) of this have an array of tables that hold
whatever info about the nodes+leafs (path, val, whatever) and as this
can grow huge, but it isnt necessary to hold everything (like
traversed branches; or leafs, if i need to detect only recursions or
repetitions, but thats mostly a bad example, as i can skip them),
therefor i could actually delete some of the members of this array,
while it can be useful to keep it for whatever purpose, so the rehash
could be a thing here.
[/not so important part]

furthermore, i have some questions about the array<->hash part transitions:
- how it goes if i have int keys in the hash part, and fill up the gap
between the indices of the array part, and those that are in the hash
- what happens when i delete a value from the middle of the array part?
- what about the expense of any related work?
- "dont try this at home" kinda hacks are also welcome! :D (anything
like unreliable implementation details as this
`assert(#{1,2,nil,4,nil,nil,nil,8}==8)`; and some more detailed info
on triggering the rehash with inserting `nil`s into a table, than what
lua programming gems have.)

if other lua implementations do a different thing, then im also
interested in those, especially luajit, but its good to know this
about any mostly compatible fork.

btw i just came across with (therefore the
topic) and the links could get some review, like is
already just history... (in the worst case, wayback machine could
help, but i think those lost bits exists somewhere else.)

many thx for ur time; and any info, help and whatever else; all the
bests to all of u! :)

Reply | Threaded
Open this post in threaded view

Re: manual table rehash

Philippe Verdy
Rehashiong a table is only necessary when:
(1) you want to change the maximum number of hash buskets,
(2) you want to change the hashing function to produce another distribution.

In both cases it is justified:
(3) if the collision lists in are overlong, which means that the hasing function was not flattening enough the distribution (was not strong enough) which can only occur when there are too many elements in the table compared to the number of hash buckets.
(4) there are too many empty hash buckets, which occurs when there are too few elements in the table compard to the number of hash buckets.

These last two justifications should be handled by the table implementation which should compute the correct numberĀ  of hash buckets, and then rehash only when needed (but not too often: this should be controled by some tuning metrics like minimum and maximum fill factors (a good minimum should be about 10%, but the maximum should be around 400%, where the fill factor is the ratio of total hashbuckets divided by the effective cardinality of keys in the table). But it does not apply to keys that are part of the integer sequence (which follow other rules and should never be hashed at all).

Beside that, when there are some non unitary collision list, rehashing may be justified for security reasons (against time-based side-channel attacks) only if the hashing function is very poor, i.e. when condition (3) is met, but sometimes more often if one wants a protection to defeat predictions, by regularly changing the hash function (for example you can change the prime multipliers used in Knuth modular hash function, by selecting them from a large enough known set of big primes)

However Lua's basic tables do not provide any way to change the hashing function, which is implemented internally by the implementation of the Lua engine itself: I think all these problems should be handled directly by Lua without having Lua programs to do that. Other languages (e.g. Java) allow controling the hasing function and you only need to rehash if you change the hash function (in all other cases, notably metrics of collision lists and fill factors should be always be part of the internal implementation, because an hashing function created in Lua would be quite slow, an in fact quite difficult to write correctly with Lua's dynamic polymorphic data types and also because you cannot hash non-simple types for keys, like tables, functions, or userdata that are only known by their internal reference pointer, not accessible to lua programs; but it is possible if your keys are strings or numbers outside integer sequences from 1 to N).
Reply | Threaded
Open this post in threaded view

Re: manual table rehash
haha, thx 4 ur nice answer, but thats an another subject. :D
(otherwise there are some good-to-know stuffs in ur answer anyhow! :)
) btw i think its my fault! i didnt know about this usage of the word
"rehash" (even the other is fresh for me), and i even mentioned only
from kinda far the meaning that i asked about, but here u go:

so its actually a resize that can be triggered only indirectly (at
least from lua side, i dunno about c side, but i think it can be done
either officially or not), and manual shrinking would be a
nice-to-have for it for some rare cases. :) and the rest is in my
initial msg... :D

but thx anyhow, i appreciated ur answer! :)