Tuples and other constant values


Dirk Laurie-2
2015-07-04 16:55 GMT+02:00 Cosmin Apreutesei <[hidden email]>:

> [Shouldn't Lua have built-in tuples by now? :) Tuples are very useful in
> application code, but hardly ever present in library code so I can see how
> they could be undervalued by low-level coders. As I move between being a lib
> coder and an app coder I sometimes wonder/worry about these differences in
> perspective and how they affect the APIs that we create for app usage. I
> admit that I don't always drive API design from real use cases myself but I
> try goddamn it.]

Tuples are a special case of constant values. E.g.

const c = {1,2,3}

would create a constant table with three elements and associate it
with the name c. The interpreter will remember that _ENV.c for this
_ENV may not be redefined. A constant value can only be assigned
to such constant table fields.

One would probably have to disallow tables with a hash part from
being declared constant, i.e. tuples == constant sequences.
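
For concreteness, the runtime half of this (a table you cannot modify) can already be approximated in today's Lua with a read-only proxy; the name-binding half (that _ENV.c itself may not be redefined) cannot. A minimal sketch, not the proposal itself:

-- Approximation of a constant sequence in present-day Lua (5.2+),
-- using a proxy table whose metatable forwards reads and rejects writes.
local function const(t)
  return setmetatable({}, {
    __index = t,                                  -- reads go to the hidden table
    __newindex = function() error("attempt to modify a constant", 2) end,
    __len = function() return #t end,             -- so # works on the proxy
    __pairs = function() return pairs(t) end,     -- so pairs() works (5.2/5.3)
  })
end

local c = const{1, 2, 3}
print(c[1], #c)   --> 1    3
-- c[1] = 10      --> error: attempt to modify a constant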


Re: Tuples and other constant values

Jay Carlson
On 2015-07-05, at 4:23 AM, Dirk Laurie <[hidden email]> wrote:
>
> Tuples are a special case of constant values. E.g.
>
> const c = {1,2,3}
>
> would create a constant table with three elements and associate it
> with the name c. The interpreter will remember that _ENV.c for this
> _ENV may not be redefined. A constant value can only be assigned
> to such constant table fields.

Could I call a function like table.append(c, 4)?

I had a more complicated proposal for "static" which might help. "static c = Const{1,2,3}" would do the same thing, even inside a loop.

Jay

Re: Tuples and other constant values

steve donovan
On Sun, Jul 5, 2015 at 2:39 PM, Jay Carlson <[hidden email]> wrote:
> Could I call a function like table.append(c, 4)?

I think the idea of a tuple is that it is a const unmodifiable table.

> I had a more complicated proposal for "static" which might help. "static c = Const{1,2,3}" would do the same thing, even inside a loop.

So it would be a _compile-time_ constant? That makes it cheap to use,
although it restricts what values it can be initialized to.


Re: Tuples and other constant values

Jay Carlson
On 2015-07-05, at 11:59 AM, steve donovan <[hidden email]> wrote:

>
> On Sun, Jul 5, 2015 at 2:39 PM, Jay Carlson <[hidden email]> wrote:
>> Could I call a function like table.append(c, 4)?
>
> I think the idea of a tuple is that it is a const unmodifiable table.
>
>> I had a more complicated proposal for "static" which might help. "static c = Const{1,2,3}" would do the same thing, even inside a loop.
>
> So it would be a _compile-time_ constant? That makes it cheap to use,
> although restricts what values it can be initialized to
>

I think the compiler for Lua will remain (relatively) simple.

The "static" I'm thinking of is a tree transformation such that:

for k,v in f() do
  static s = {1,2,3}
end

is transformed into

local __t1 = {1,2,3}
for k,v in f() do
  local s = __t1
end

This lets you use Const() or Set() constructors without invoking them on every loop iteration.

The transformation is simple, except for any or all of these complications:

1) The compiler may not need to actually introduce a local "s". It seems to me "s" aliases __t1, and the slot number can be reused. The intent is just that the scope of "s" is limited to the body.

2) The rule might be that initializers are "hoisted out of all functions" instead of "hoisted out of any body into top-level chunk"; this kind of irregularity is of the same sort as "local function".

3) A form of "static" syntax is almost exactly what you want for localizing globals. Everybody loves the (premature?) optimization of moving references out of _ENV["name"] into "local name=name", so perhaps "static local name" does just that.

4) In order to deal with re-import of modules, all of the initializers may get bunched up into a single function body.

The one nasty problem a simple tree transformation doesn't solve is use of previously-declared locals, especially inside nested chunk bodies. I don't remember whether I solved this the last time I typed this up. I have been avoiding looking up my own message in the archives to see if I can still remember how number 4 worked without a reminder. :-) Perhaps if I can't, it is not a very good idea...

Jay

(it still might not be a good idea)

Re: Tuples and other constant values

Tim Hill
In reply to this post by steve donovan

> On Jul 5, 2015, at 8:59 AM, steve donovan <[hidden email]> wrote:
>
> On Sun, Jul 5, 2015 at 2:39 PM, Jay Carlson <[hidden email]> wrote:
>> Could I call a function like table.append(c, 4)?
>
> I think the idea of a tuple is that it is a const unmodifiable table.
>
>> I had a more complicated proposal for "static" which might help. "static c = Const{1,2,3}" would do the same thing, even inside a loop.
>
> So it would be a _compile-time_ constant? That makes it cheap to use,
> although restricts what values it can be initialized to
>

But in Lua constants have nearly identical overhead to locals (in fact, you can think of them conceptually as read-only locals). Look carefully at the bytecode and the only real difference is a bit indicating if the source is a local (on the stack) or a constant (in the constant table for the chunk).
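
A quick way to see this (assuming the stock Lua 5.3 luac) is to dump a trivial chunk; this is only an illustration, not part of any proposal:

-- `luac -l` on this chunk shows a single ADD instruction: one operand names a
-- register (the local `a`), the other names an entry in the chunk's constant
-- table (the literal 2). Same instruction, different "source" bit per operand.
local a = 40
local b = a + 2
print(b)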

Apart from the original OP issue of efficient memoization, I’m not really seeing anything in this thread that cannot already be done in Lua with appropriate tables and metatables, or a few good practices (use of locals etc).

—Tim



Re: Tuples and other constant values

Dirk Laurie-2
2015-07-05 20:39 GMT+02:00 Tim Hill <[hidden email]>:

>
>> On Jul 5, 2015, at 8:59 AM, steve donovan <[hidden email]> wrote:
>>
>> On Sun, Jul 5, 2015 at 2:39 PM, Jay Carlson <[hidden email]> wrote:
>>> Could I call a function like table.append(c, 4)?
>>
>> I think the idea of a tuple is that it is a const unmodifiable table.
>>
>>> I had a more complicated proposal for "static" which might help. "static c = Const{1,2,3}" would do the same thing, even inside a loop.
>>
>> So it would be a _compile-time_ constant? That makes it cheap to use,
>> although restricts what values it can be initialized to
>>
>
> But in Lua constants have nearly identical overhead to locals (in fact, you can think of them conceptually as read-only locals). Look carefully at the bytecode and the only real difference is a bit indicating if the source is a local (on the stack) or a constant (in the constant table for the chunk).
>
> Apart from the original OP issue of efficient memoization, I’m not really seeing anything in this thread that cannot already be done in Lua with appropriate tables and metatables, or a few good practices (use of locals etc).

The thing is that there is a whole circle of ideas that belong together:
immutable values, non-reassignable names, tuples and probably more.
Just bringing in one of them is not a policy.

Maybe Ravi is a better language than Lua for bringing in a "const"
keyword, since it is already concerned with typing.


Re: Tuples and other constant values

Cosmin Apreutesei
In reply to this post by Dirk Laurie-2
> Tuples are a special case of constant values. E.g.

Uhm, I don't know what you guys are talking about :)

I'm talking multiple-value indexing, not compile-time const tables. The first is useful to me and somewhat inefficient to implement in Lua, the second is not useful to me and trivial to implement in Lua.
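
To make "multiple-value indexing" concrete: today app code tends to fake compound keys, e.g. by concatenating strings, where a tuple with value semantics would be used directly. A small illustration (the tuple() call is hypothetical here, standing for whatever constructor the sketches later in the thread provide):

-- What app code often does today: fake a compound key by concatenation.
local price = {}
local function key(sym, day) return sym .. "\0" .. day end
price[key("EURUSD", "2015-07-06")] = 1.1058
print(price[key("EURUSD", "2015-07-06")])   --> 1.1058

-- What multiple-value indexing would allow directly, given a tuple()
-- constructor with value equality (see the implementations below):
-- price[tuple("EURUSD", "2015-07-06")] = 1.1058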

It looks like we just hit the dichotomy between lib coders and app coders that I've been hinting at earlier :)


Re: Tuples and other constant values

Coda Highland
On Mon, Jul 6, 2015 at 6:15 AM, Cosmin Apreutesei
<[hidden email]> wrote:

>> Tuples are a special case of constant values. E.g.
>
>
> Uhm, I don't know what you guys are talking about :)
>
> I'm talking multiple-value indexing, not compile-time const tables. The
> first is useful to me and somewhat inefficient to implement in Lua, the
> second is not useful to me and trivial to implement in Lua.
>
> It looks like we just hit the dichotomy between lib coders and app coders
> that I've been hinting at earlier :)
>

They sort of are a special case of constant values, though -- and
strings are a similar special case. If tuples can be pooled like
strings, then they can have reference equality, which means they can
work as table keys as-is.

This IS actually possible to implement in Lua, and it's not TOO
horribly inefficient (O(n log m) overhead per construction, no
additional overhead for reading and indexing):

local tuple_key = {}
local tuple_nil = {}
local tuple_store = {}
local tuple_mt = {
  __newindex = function(t,k,v) error("attempt to modify tuple") end,
  __len = function(t) return t.n end
}
tuple_mt.__metatable = tuple_mt

function tuple_pack(...)
  return { n = select('#', ...), ... }
end

function tuple(...)
  local args = tuple_pack(...)
  local t = tuple_store
  for i = 1, args.n do
    local k = args[i]
    if k == nil then k = tuple_nil end
    local c = t[k]
    if c then
      t = c
    else
      t[k] = {}
      t = t[k]
    end
  end

  local v = t[tuple_key]
  if not v then
    v = setmetatable(args, tuple_mt)
    t[tuple_key] = v
  end
  return v
end

assert(tuple(1,nil,3) == tuple(1,nil,3))
assert(tuple(1,2,3) ~= tuple(1,nil,3))
assert(#tuple(1,2,3) == 3)
assert(tuple(1,nil,3)[2] == nil)

/s/ Adam


Re: Tuples and other constant values

Coda Highland
On Mon, Jul 6, 2015 at 7:05 AM, Coda Highland <[hidden email]> wrote:
> They sort of are a special case of constant values, though -- and
> strings are a similar special case. If tuples can be pooled like
> strings, then they can have reference equality, which means they can
> work as table keys as-is.

Self-nitpick: I meant "interned" here, not "pooled."

/s/ Adam


Re: Tuples and other constant values

Cosmin Apreutesei
In reply to this post by Coda Highland
> This IS actually possible to implement in Lua, and it's not TOO
> horribly inefficient (O(n log m) overhead per construction, no
> additional overhead for reading and indexing):

It gets more complicated and slow[1] if you want the tuple table to be self-cleaning though (add in gc/weak-table overhead). I would imagine that a built-in tuple type would be much more efficient since it would just index on the sum of the hashes of the elements and only check the elements on collisions, or am I missing something?

[1] https://github.com/luapower/tuple/blob/master/tuple.lua
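
For concreteness, here is a rough sketch of that "index on the sum of the hashes, check the elements only on collisions" scheme in plain Lua. The hashvalue function below is an invented stand-in for the per-value hash the VM already computes internally, so this only approximates what a built-in could do:

-- Intern tuples in buckets keyed by a combined hash of the elements;
-- on a bucket hit, compare element-wise before reusing an existing tuple.
local function hashvalue(v)
  local t = type(v)
  if t == "string" then
    local h = #v
    for i = 1, math.min(#v, 32) do h = (h * 31 + v:byte(i)) % 2^31 end
    return h
  elseif t == "number" then
    return math.floor(v) % 2^31
  elseif t == "boolean" then
    return v and 1 or 2
  else
    return 3                      -- nil and everything else, for brevity
  end
end

local buckets = {}                -- combined hash -> array of interned tuples

local function same(t, ...)
  if t.n ~= select("#", ...) then return false end
  for i = 1, t.n do
    if t[i] ~= (select(i, ...)) then return false end
  end
  return true
end

local function tuple(...)
  local h = select("#", ...)
  for i = 1, select("#", ...) do
    h = (h + hashvalue((select(i, ...)))) % 2^31    -- "sum of the hashes"
  end
  local bucket = buckets[h]
  if not bucket then bucket = {}; buckets[h] = bucket end
  for _, t in ipairs(bucket) do
    if same(t, ...) then return t end               -- hit: reuse interned tuple
  end
  local t = table.pack(...)                         -- miss: intern a new one
  bucket[#bucket + 1] = t
  return t
end

assert(tuple(1, "x", 3) == tuple(1, "x", 3))
assert(tuple(1, "x", 3) ~= tuple(1, "x", 4))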



Re: Tuples and other constant values

Coda Highland
On Mon, Jul 6, 2015 at 8:02 AM, Cosmin Apreutesei
<[hidden email]> wrote:

>> This IS actually possible to implement in Lua, and it's not TOO
>> horribly inefficient (O(n log m) overhead per construction, no
>> additional overhead for reading and indexing):
>
>
> It gets more complicated and slow[1] if you want the tuple table to be
> self-cleaning though (add in gc/weak-table overhead). I would imagine that a
> built-in tuple type would be much more efficient since it would just index
> on the sum of the hashes of the elements and only check the elements on
> collisions, or am I missing something?
>
> [1] https://github.com/luapower/tuple/blob/master/tuple.lua
>

It depends on what kind of efficiency you're looking for.

As for weak values, your implementation there in luapower doesn't have
a __gc metamethod on tuples, which makes the use of weak values not
even worth the overhead it does add -- you aren't saving that much
memory since the index tables remain populated. And if you DO add __gc
to tuples, then you make the weak-valued case that much more
expensive. So unless you expect to be generating a LOT of unique,
ephemeral tuples (is this even sensible?), it's quite likely to not be
worth the effort at all. (I suppose there are some possible
alternative implementations, like not sweeping the index tree until a
certain number of leaves have reported themselves empty, or requiring
the user to explicitly sweep the index tree from time to time.)

An alternative that hashes the tuple in the index tree is something
that's completely plausible to do in Lua code as well. I suspect this
would be less efficient for short tuples and more efficient for long
tuples.

Yet another option is to simply use non-interned tables, and introduce
a __hash metamethod and add a hash-keyed table structure that calls
the key's __hash to produce a string, either memoized (preferable for
immutable, non-ephemeral tables) or not (preferable for mutable or
ephemeral tables). This is a different side of the tradeoff: it makes
table reads and writes more expensive (only slightly for immutable,
noticeably for mutable), it makes tuple construction cheaper than the
interned method, it avoids the need to maintain an index tree
(substantially less memory usage when you have ephemeral tuples, but
might use more memory if you construct a lot of identical tuples), and
it makes garbage collection free. You can do this entirely in Lua code
as well, although I would recommend a C API function to perform the
hashing.
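
For concreteness, a library-level sketch of that non-interned, hash-keyed structure. There is no real __hash metamethod in Lua, so here it is just a field read off the key's metatable, with collisions falling back to == (and hence __eq):

-- Buckets are keyed by the key's hash; entries inside a bucket are
-- disambiguated by == so that __eq handles collisions.
local HashMap = {}
HashMap.__index = HashMap

function HashMap.new()
  return setmetatable({ buckets = {} }, HashMap)
end

local function hashof(k)
  local mt = getmetatable(k)
  if mt and mt.__hash then return mt.__hash(k) end  -- hypothetical metamethod
  return k                                          -- primitives hash as themselves
end

function HashMap:set(k, v)
  local h = hashof(k)
  local bucket = self.buckets[h]
  if not bucket then bucket = {}; self.buckets[h] = bucket end
  for _, entry in ipairs(bucket) do
    if entry.key == k then entry.value = v; return end   -- collision: defer to __eq
  end
  bucket[#bucket + 1] = { key = k, value = v }
end

function HashMap:get(k)
  local bucket = self.buckets[hashof(k)]
  if bucket then
    for _, entry in ipairs(bucket) do
      if entry.key == k then return entry.value end
    end
  end
end

A tuple type would then carry a __hash that combines its element hashes and an element-wise __eq, and would not need to be interned at all.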

In other words, there's a lot of different ways to do it, each with
advantages and disadvantages, and implementing any of these in the Lua
core would have similar tradeoffs and I'm fairly certain that you're
not going to satisfy everyone by choosing ONE of these to canonize in
the core. In particular, the most obvious implementation in the core
is the non-interned hashed version, and I think people would complain
vehemently about making table reads more expensive. While memoizing
would fix that, it would make ephemeral tuples more expensive (even
for non-unique ephemeral tuples). Interning would avoid those problems
at the cost of increased memory/GC overhead. These are, of course,
exactly the same tradeoffs you have to make doing it outside of the
core.

/s/ Adam


Re: Tuples and other constant values

Cosmin Apreutesei
> As for weak values, your implementation there in luapower doesn't have
> a __gc metamethod on tuples, which makes the use of weak values not
> even worth the overhead it does add -- you aren't saving that much
> memory since the index tables are be populated.

I'm not sure I understand this point. The use of weak tables is not an optimization, it is necessary to have a non-leaky implementation. What does __gc have to do with it?
 
> In other words, there's a lot of different ways to do it, each with
> advantages and disadvantages, and implementing any of these in the Lua
> core would have similar tradeoffs and I'm fairly certain that you're
> not going to satisfy everyone by choosing ONE of these to canonize in
> the core.
 
All engineering is tradeoffs and you're never going to satisfy everyone; that's an impossible standard which I hope Lua is not following, or we'll never see anything new in Lua again.

> In particular, the most obvious implementation in the core
> is the non-interned hashed version, and I think people would complain
> vehemently about making table reads more expensive.

Why would they get more expensive? It's the same hashmap algorithm except now the hash function is a sum of the hashes of the elements and I don't think anyone would object that that should be O(#t) as it is for strings. What am I missing?

> These are, of course,
> exactly the same tradeoffs you have to make doing it outside of the
> core.

I disagree. A built-in tuple would not create garbage (other than the tuple itself), would not slow down the gc for _other_ objects, and would not create hash strings. And I never said anything about a __hash metamethod (which would be useless without a __collision_check metamethod btw, or you're back to square one, but I don't want any of that anyway).


Re: Tuples and other constant values

Coda Highland
On Mon, Jul 6, 2015 at 9:36 AM, Cosmin Apreutesei
<[hidden email]> wrote:
>> As for weak values, your implementation there in luapower doesn't have
>> a __gc metamethod on tuples, which makes the use of weak values not
>> even worth the overhead it does add -- you aren't saving that much
>> memory since the index tables are be populated.
>
>
> I'm not sure I understand this point. The use of weak tables is not an
> optimization, it is necessary to have a non-leaky implementation. What does
> __gc has to do with it?

The point is that you don't currently have a non-leaky implementation.
It still leaks. It just leaks slower than if you weren't using weak
keys. Meanwhile, it's not exactly "leaky" if you think of it as an
interning pool or like memoization, where it's EXPECTED that the
memory usage is monotonically increasing as new unique values appear
-- allowing the memory usage to decrease is an optimization, from that
perspective.

>> In other words, there's a lot of different ways to do it, each with
>> advantages and disadvantages, and implementing any of these in the Lua
>> core would have similar tradeoffs and I'm fairly certain that you're
>> not going to satisfy everyone by choosing ONE of these to canonize in
>> the core.
>
> All engineering is tradeoffs and you'll never going to satisfy anyone,
> that's an impossible standard which I hope Lua is not following or we'll
> never see anything new in Lua again.

The standard isn't "never make tradeoffs." The standard is "make sure
the tradeoffs are worth it" with a side of "avoid breaking/penalizing
existing code if you can". And one tradeoff that's ALMOST NEVER worth
it is making an extremely common operation more expensive. It's ALMOST
ALWAYS preferable to add a little bit more per-call cost to a
less-frequent scenario.

(Whether either of these is more or less preferable to using up more
memory isn't so easily generalized.)

>> In particular, the most obvious implementation in the core
>> is the non-interned hashed version, and I think people would complain
>> vehemently about making table reads more expensive.
>
> Why would they get more expensive? It's the same hashmap algorithm except
> now the hash function is a sum of the hashes of the elements and I don't
> think anyone would object that that should be O(#t) as it is for strings.
> What am I missing?

It's not O(#t) for short strings (that is, the kind you're most likely
to use as table keys), it's O(1) thanks to string interning. Likewise,
currently using a table as a key is O(1), but hashing on every lookup
would be O(#t) as you describe -- but if you interned tuples like you
do strings, it would still be O(1). You could try to find a balance by
memoizing the hash or computing it on construction, but that's still
not ideal for ephemeral objects like the one produced in an expression
like "tbl[tuple(x,y,z)]".

(Memoizing it lazily is preferable because then if you never need the
hash at all you don't have to compute it.)

>> These are, of course,
>> exactly the same tradeoffs you have to make doing it outside of the
>> core.
>
> I disagree. A built-in tuple would not create garbage (other than the tuple
> itself)

A built-in tuple would either have more garbage from having multiple
deep copies of non-unique tuples, or more long-term memory usage from
having a canonical copy of the tuple interned in a pool. Possibly
both, if there's a long/short distinction like there is for strings.

> would not slow down the gc for _other_ objects

It doesn't slow it down for INDIVIDUAL other objects, but cleaning up
is still extra work the gc has to do when it's time to reap, which
makes those gc sweeps take longer.

> and would not
> create hash strings and I never said anything about a __hash metamethod
> (which would be useless without a __collision_check metamethod btw or you're
> back to square one, but I don't want any of that anyway).

I didn't say YOU said anything about a __hash metamethod; that would
just be one possible implementation. (Also, you wouldn't need a
__collision_check metamethod anyway; it would just be __eq (which
tuples would want to have regardless).)

I also didn't say anything about hash STRINGS, just hashes in general
-- you either need to memoize them (increasing the memory footprint of
each tuple) or calculate them every time (makes table accesses more
expensive). Whether these are stored as Lua strings, as char arrays,
or as ints doesn't change this fact.

Furthermore, I didn't even mention __hash in that last paragraph about
tradeoffs. Besides, in-core code or out-of-core code would have the
same choice to make (hard-coded hashing vs. metamethod).

All in all: To me it strongly feels like tuples would be better served
by library code (or maybe a power patch) than core code. They aren't
broadly used enough to warrant the corresponding increase in Lua's
footprint, and the people who would benefit from its existence can use
a library or patch.

/s/ Adam


Re: Tuples and other constant values

Tim Hill
In reply to this post by Cosmin Apreutesei

On Jul 6, 2015, at 8:02 AM, Cosmin Apreutesei <[hidden email]> wrote:

>> This IS actually possible to implement in Lua, and it's not TOO
>> horribly inefficient (O(n log m) overhead per construction, no
>> additional overhead for reading and indexing):

> It gets more complicated and slow[1] if you want the tuple table to be self-cleaning though (add in gc/weak-table overhead). I would imagine that a built-in tuple type would be much more efficient since it would just index on the sum of the hashes of the elements and only check the elements on collisions, or am I missing something?



But what happens if one of the values in the tuple is a table? (Yes, I know, “not allowed” is the easy way out, but what about userdata etc.?) It would seem that the types of values allowed in a tuple are quite limited (though presumably a nested tuple is allowed).

The issue of “by reference” (interned like strings) or “by value” (discrete copies regardless of similarity of contents) is more subtle when looking at performance. Interning means tuple comparisons are very fast (basically a pointer compare), at the expense of initial creation and GC time. So which is faster depends very much upon the usage profile of a given application.

However, if you follow through on which values ARE allowed in a tuple, then what you are left with (imho) is a small set of value types (number, string, boolean, nested tuple) that are allowed in a tuple. Assume we now designate a “tuple table” to be a Lua sequence containing only items of these values (purely a convention). At this point it’s quite trivial to write a small C library that, given such a “tuple table”, would return a unique userdata item (garbage collected) that was (a) the same item for any identical tuple table and (b) a different item for any non-identical tuple table (where “identical” means the same value at the same sequence index in the tuple table, applied recursively to nested tuple tables).

This would give you all the functionality you need for memoization type tasks.

The API looks something like this:

tuple = totuple{ 10, "a", 19.5, { "nested", 100 } }

One possible (and quite efficient when coded in C) implementation would be to create a Lua string “signature” from the packed binary concatenation of type/value pairs from the tuple table (in binary, no need to convert to string form). (You would also need to add markers for nested tuples, but that’s trivial.) Then, maintain a hidden Lua “master tuple” table in the Registry, with weak values. Use the “signature” as a key, and a dummy (one byte long) full userdata as the value. All the function has to do is create the “signature” key as above, then look it up in the “master tuple” table. If it’s found, return the value (the userdata). If not, create a new dummy userdata, add it to the table keyed by the Lua string key, and return the new userdata.

You now have a  unique Lua value for any valid tuple, that will be the same for any given “tuple table” fed to the above API. This value can then be used as the key for whatever memoization system you want, or to compare to other tuples as desired. Performance should be pretty good, since the overhead is mostly just concatenating binary bytes to create the signature (which can be very fast), followed by a single table lookup. Each “signature” is only ever stored once, providing intern-like advantages, and the userdata (and signature) are garbage collected when no longer used (thanks to the weak values in the master tuple table).
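
For concreteness, a Lua 5.3 approximation of the same scheme (the real thing would live in C and keep its master table in the registry; here a weak-valued upvalue table and a plain empty table stand in for the registry and the dummy userdata):

-- Map a packed binary "signature" of the tuple table to a canonical value.
local master = setmetatable({}, { __mode = "v" })   -- signature -> tuple value

local function signature(t)
  local parts = {}
  for i = 1, #t do
    local v, ty = t[i], type(t[i])
    if ty == "number" then
      parts[i] = string.pack("c1n", "n", v)
    elseif ty == "string" then
      parts[i] = string.pack("c1s4", "s", v)
    elseif ty == "boolean" then
      parts[i] = string.pack("c1B", "b", v and 1 or 0)
    elseif ty == "table" then                        -- nested tuple table
      parts[i] = string.pack("c1s4", "t", signature(v))
    else
      error("type not allowed in a tuple: " .. ty)
    end
  end
  return table.concat(parts)
end

local function totuple(t)
  local sig = signature(t)
  local tup = master[sig]
  if not tup then
    tup = {}                                         -- stand-in for the dummy userdata
    master[sig] = tup
  end
  return tup
end

assert(totuple{10, "a", 19.5, {"nested", 100}} ==
       totuple{10, "a", 19.5, {"nested", 100}})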

—Tim


Re: Tuples and other constant values

Coda Highland
On Mon, Jul 6, 2015 at 12:00 PM, Tim Hill <[hidden email]> wrote:
> At this point it’s quite trivial to write a small C
> library that, given such a “tuple table”, would return a unique userdata
> item (garbage collected) that was (a) the same item for any identical tuple
> table and (b) a different item for any non-identical tuple table (where
> “identical” means the same value at the same sequence index in the tuple
> table, applied recursively to nested tuple tables).

Why do you need userdata for this at all? This behavior is already
present with normal tables.

/s/ Adam


Re: Tuples and other constant values

Tim Hill

> On Jul 6, 2015, at 12:20 PM, Coda Highland <[hidden email]> wrote:
>
> On Mon, Jul 6, 2015 at 12:00 PM, Tim Hill <[hidden email]> wrote:
>> At this point it’s quite trivial to write a small C
>> library that, given such a “tuple table”, would return a unique userdata
>> item (garbage collected) that was (a) the same item for any identical tuple
>> table and (b) a different item for any non-identical tuple table (where
>> “identical” means the same value at the same sequence index in the tuple
>> table, applied recursively to nested tuple tables).
>
> Why do you need userdata for this at all? This behavior is already
> present with normal tables.
>
> /s/ Adam
>

First, the tuple table may be large; you already have the overhead of the “signature”; using a dummy userdata is more compact. Second, if you mean why not index using a normal user table, then that’s not the point. The point is that when tables are used as keys, it’s the identity of the table, not the contents, which is used as the key (that’s why the table does not have to be immutable to be used as a key). The tuple table uses the tuple value itself (that is, the contents of the table) as the key, which was the need expressed by the OP (for use in memoization of function arguments).

Or, to put it another way:

t1 = {10, 20, 30}
t2 = {10, 20, 30}
print( t1 == t2 )                    -- false
print( totuple(t1) == totuple(t2) )  -- true

The whole scheme is an elaboration of comparing the contents of two constrained (in the sense of limited domain of Lua types) sequences, but with the added ability to reduce the comparison to fast reference item equalities by surfacing the signatures as userdata (so that the signatures are garbage collected). The assumption of course is that the overhead of creating the signatures/userdata tuples is recovered by the time savings accrued when comparing tuples.

—Tim



Re: Tuples and other constant values

Cosmin Apreutesei
In reply to this post by Coda Highland
> The point is that you don't currently have a non-leaky implementation.

I'm not sure I follow. If it's leaky then that's a bug. Can you show me please?
 
> It still leaks. It just leaks slower than if you weren't using weak
> keys. Meanwhile, it's not exactly "leaky" if you think of it as an
> interning pool or like memoization, where it's EXPECTED that the
> memory usage is monotonically increasing as new unique values appear
> -- allowing the memory usage to decrease is an optimization, from that
> perspective.

Uhm, you lost me, sorry :) Maybe you could show me with an example where it leaks?

> The standard isn't "never make tradeoffs." The standard is "make sure
> the tradeoffs are worth it" with a side of "avoid breaking/penalizing
> existing code if you can". And one tradeoff that's ALMOST NEVER worth
> it is making an extremely common operation more expensive. It's ALMOST
> ALWAYS preferable to add a little bit more per-call cost to a
> less-frequent scenario.

What specifically is the overhead and what is the operation?

>> Why would they get more expensive? It's the same hashmap algorithm except
>> now the hash function is a sum of the hashes of the elements and I don't
>> think anyone would object that that should be O(#t) as it is for strings.
>> What am I missing?

> It's not O(#t) for short strings (that is, the kind you're most likely
> to use as table keys), it's O(1) thanks to string interning.

Creating a new string implies hashing it (which implies traversing at least part of it). It would be the same with tuples. Hashing is O(min(#s, some_threshold)) in both cases. The rest of the lookup algorithm would be exactly the same. What am I missing?

> A built-in tuple would either have more garbage from having multiple
> deep copies of non-unique tuples,

No deep copies are necessary. Traversing the elements of the tuple doesn't have to be a recursive operation.

> or more long-term memory usage from
> having a canonical copy of the tuple interned in a pool.

I'm not sure how interning increases memory usage. Can you explain that point?
 
> All in all: To me it strongly feels like tuples would be better served
> by library code (or maybe a power patch) than core code. They aren't
> broadly used enough to warrant the corresponding increase in Lua's
> footprint, and the people who would benefit from its existence can use
> a library or patch.

Now we're back to what I was saying about lib coders vs. app coders. How did you form the opinion that they aren't broadly used in apps? Multi-value indexing is fundamental in any data-rich app (games, web crawling, web apps -- I can't remember an app where I haven't used them).


Re: Tuples and other constant values

Coda Highland
On Mon, Jul 6, 2015 at 2:14 PM, Cosmin Apreutesei
<[hidden email]> wrote:
>> The point is that you don't currently have a non-leaky implementation.
>
> I'm not sure I follow. If it's leaky then that's a bug. Can you show me
> please?
>

Create a tuple. Destroy the tuple. The index tree keeps all of the
intermediate nodes that it had to add in order to pick up that tuple,
and only the leaf node is actually removed.

If you have tuples (1, 2, 3) and (2, 2, 3) and destroy both then
you've leaked four tables that have no useful children:

{
  [1] = {
    [2] = {}
  },
  [2] = {
    [2] = {}
  }
}

The resolution, if minimizing memory usage is your goal, is to
implement a __gc method on your tuple objects that walks up the tree
and removes empty branches. This does, of course, make GC sweeps more
expensive, and it means that a subsequent insertion is more expensive,
so it's again a time/space tradeoff.

>> The standard isn't "never make tradeoffs." The standard is "make sure
>> the tradeoffs are worth it" with a side of "avoid breaking/penalizing
>> existing code if you can". And one tradeoff that's ALMOST NEVER worth
>> it is making an extremely common operation more expensive. It's ALMOST
>> ALWAYS preferable to add a little bit more per-call cost to a
>> less-frequent scenario.
>
> What specifically is the overhead and what is the operation?

This is a general statement in response to your general statement of
all engineering being tradeoffs.

Applied to the discussion at hand, the relevant operation would be
table indexing. The overhead is hypothetical, something to think about
to inform whatever decisions would be made, not a specific critique of
any proposed implementation.

>> > Why would they get more expensive? It's the same hashmap algorithm
>> > except
>> > now the hash function is a sum of the hashes of the elements and I don't
>> > think anyone would object that that should be O(#t) as it is for
>> > strings.
>> > What am I missing?
>>
>> It's not O(#t) for short strings (that is, the kind you're most likely
>> to use as table keys), it's O(1) thanks to string interning.
>
>
> Creating a new string implies hashing it (which implies traversing at least
> part of it). It would be the same with tuples. Hashing is O(min(#s,
> some_threshold)) in both cases. The rest of the lookup algorithm would be
> exactly the same. What am I missing?

What you're missing is that there are two different operations being
discussed here: constructing a value, and indexing into a table.

You don't hash a string every time you want to look up a value in a
table. You intern it once, and then the interned value can be used to
index into a table in O(1) time.

However, if you DON'T intern tuples in advance or memoize their hash
values, then you have to deal with an O(n) hashing operation every
time you index into a table. (The decision between interning, advance
memoization, and lazy memoization is a bit deep, since the tradeoffs
there can be subtle, and there's a lot of them.)

The difference between these methods gets increasingly significant
when you cache tuples in variables.

>> A built-in tuple would either have more garbage from having multiple
>> deep copies of non-unique tuples,
>
> No deep copies are necessary. Traversing the elements of the tuple doesn't
> have to be a recursive operation.

Sorry, I didn't mean to imply a copy operation when talking about deep
copies. Rather, it would come up innocuously:

local x = tuple(1, 2, 3)
local y = tuple(1, 2, 3)

x and y COULD be references to the same value (interning) or they
could be element-wise deep copies of each other (hashing).

>> or more long-term memory usage from
>> having a canonical copy of the tuple interned in a pool.
>
> I'm not sure how interning increases memory usage. Can you explain that
> point?

Interning requires you to maintain a pool of values. If you don't have
a mechanism in place to reap gc'ed values (and AFAIK Lua's string pool
doesn't reap gc'ed strings) then the pool's memory usage is
monotonically increasing.

If you have a bunch of unique tuples that are only used for a short
period at a time (for example if you use tuples to represent
outstanding RPC calls or something) then interning without reaping
starts using up a lot of memory, and interning with reaping is more
expensive on the GC side than using element-wise hashing.

If you have a selection of tuples that you use a lot (compound lookup
keys, perhaps) then interning turns into a huge savings since you
don't have lots of copies of identical data lying around.

>> All in all: To me it strongly feels like tuples would be better served
>> by library code (or maybe a power patch) than core code. They aren't
>> broadly used enough to warrant the corresponding increase in Lua's
>> footprint, and the people who would benefit from its existence can use
>> a library or patch.
>
>
> Now we're back to what I was saying about lib coders vs. app coders. How did
> you form the opinion that they aren't broadly used in apps? Multi-value
> indexing is fundamental in any data-rich app (games, web crawling, web apps
> -- I can't remember an app where I haven't used them).

This isn't really a lib-vs-app distinction. Tuples are one of those
things that you use a lot when you're accustomed to using them, and if
you haven't used them much in the past you won't jump to adopt them if
you're introduced to them. (A large proportion of the tuple users I've
met are Python converts.)

I've only used compound data types as keys in a very small number of
cases. Most of the time that tuples would be useful I tend to just use
nested data structures instead. (Unless the low-order keys in your
data set are particularly sparse, this doesn't introduce much waste.)
I'm having trouble even imagining where I would use a tuple while
implementing a game, outside of the networking code. An application
that's HEAVILY data-rich is going to be sending accesses to a database
anyway.

/s/ Adam


Re: Tuples and other constant values

Cosmin Apreutesei
> Create a tuple. Destroy the tuple. The index tree keeps all of the
> intermediate nodes that it had to add in order to pick up that tuple,
> and only the leaf node is actually removed.

Did you actually test the code before saying that? My unit tests[1] disagree with that statement.

>>> It's not O(#t) for short strings (that is, the kind you're most likely
>>> to use as table keys), it's O(1) thanks to string interning.
>>
>>
>> Creating a new string implies hashing it (which implies traversing at least
>> part of it). It would be the same with tuples. Hashing is O(min(#s,
>> some_threshold)) in both cases. The rest of the lookup algorithm would be
>> exactly the same. What am I missing?

> What you're missing is that there are two different operations being
> discussed here: constructing a value, and indexing into a table.
>
> You don't hash a string every time you want to look up a value in a
> table. You intern it once, and then the interned value can be used to
> index into a table in O(1) time.

I know there are two operations; I must be missing something else then, because I can't see how that would be any different for tuples than for strings.

> The difference between these methods gets increasingly significant
> when you cache tuples in variables.
>
> Interning requires you to maintain a pool of values. If you don't have
> a mechanism in place to reap gc'ed values (and AFAIK Lua's string pool
> doesn't reap gc'ed strings) then the pool's memory usage is
> monotonically increasing.

That doesn't make any sense. That would make some of my apps crash in minutes.



Re: Tuples and other constant values

Coda Highland
On Mon, Jul 6, 2015 at 3:43 PM, Cosmin Apreutesei
<[hidden email]> wrote:
>> Create a tuple. Destroy the tuple. The index tree keeps all of the
>> intermediate nodes that it had to add in order to pick up that tuple,
>> and only the leaf node is actually removed.
>
>
> Did you actually test the code before saying that? My unit tests[1] disagree
> with that statement.
>

No, I did it by inspection, and I do see where I made a mistake in my
analysis. Your implementation is subtly different from my own, and the
memory-freeing trick is pretty clever. Props on that one.

>> >> It's not O(#t) for short strings (that is, the kind you're most likely
>> >> to use as table keys), it's O(1) thanks to string interning.
>> >
>> >
>> > Creating a new string implies hashing it (which implies traversing at
>> > least
>> > part of it). It would be the same with tuples. Hashing is O(min(#s,
>> > some_threshold)) in both cases. The rest of the lookup algorithm would
>> > be
>> > exactly the same. What am I missing?
>>
>> What you're missing is that there are two different operations being
>> discussed here: constructing a value, and indexing into a table.
>>
>> You don't hash a string every time you want to look up a value in a
>> table. You intern it once, and then the interned value can be used to
>> index into a table in O(1) time.
>
>
> I know there are two operations, I must be missing something else then
> because I can't see how that would be any different for tuples than for
> strings.

I think at this point I'm the one missing something, because I have no
idea what you're confused about.

>
>> The difference between these methods gets increasingly significant
>> when you cache tuples in variables.
>>
>> Interning requires you to maintain a pool of values. If you don't have
>> a mechanism in place to reap gc'ed values (and AFAIK Lua's string pool
>> doesn't reap gc'ed strings) then the pool's memory usage is
>> monotonically increasing.
>
>
> That doesn't make any sense. That would make some of my apps crash in
> minutes.

Yeah, you're right, my recollection was faulty there. Not sure I know
exactly what it was I was thinking about there.

/s/ Adam
