fastest way to maintain nested tables of counters?

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

fastest way to maintain nested tables of counters?

Tim Menzies
is the following code the best way to handle auto-initialization of nested table cells to zero?

 function inc3(c,x,y,z) -- inc 3d to zero
   c[x] = c[x] or {}
   c[x][y] = c[x][y] or {}
   c[x][y][z] = c[x][y][z] or 0
   c[x][y][z] = c[x][y][z] + 1
end

i ask since this task is the core of  my naive bayes classifier. so it must be fast.

notes: the table index names won't be known till runtime, so you can auto-init the data with the right names as a pre-processor.

oh-  i've read the automagical table entry (http://lua-users.org/wiki/AutomagicTables) and i cant quite justify to myself that that approach is preferred to the above. 

but i'd like the advice of some Lua gurus!

ADVAthanxNCE

share and enjoy!
:-)

tim menzies



Reply | Threaded
Open this post in threaded view
|

Re: fastest way to maintain nested tables of counters?

Mason Larobina
On 15 May 2011 20:48, Tim Menzies <[hidden email]> wrote:

> is the following code the best way to handle auto-initialization of nested
> table cells to zero?
>  function inc3(c,x,y,z) -- inc 3d to zero
>    c[x] = c[x] or {}
>    c[x][y] = c[x][y] or {}
>    c[x][y][z] = c[x][y][z] or 0
>    c[x][y][z] = c[x][y][z] + 1
> end
> i ask since this task is the core of  my naive bayes classifier. so it must
> be fast.
> notes: the table index names won't be known till runtime, so you can
> auto-init the data with the right names as a pre-processor.
> oh-  i've read the automagical table entry
> (http://lua-users.org/wiki/AutomagicTables) and i cant quite justify to
> myself that that approach is preferred to the above.
> but i'd like the advice of some Lua gurus!
> ADVAthanxNCE
> share and enjoy!
> :-)
> tim menzies

Perhaps:

function inc3(t, x, y, z)
    if not t[x] then t[x] = {} end
    t = t[x]
    if not t[y] then t[y] = {} end
    t = t[y]
    t[z] = (t[z] or 0) + 1
end

There might be a better way again but at least this reduces the number
of table lookups.

Reply | Threaded
Open this post in threaded view
|

Re: fastest way to maintain nested tables of counters?

Dirk Laurie
On Sun, May 15, 2011 at 02:56:03PM +0200, Mason Larobina wrote:

> function inc3(t, x, y, z)
>     if not t[x] then t[x] = {} end
>     t = t[x]
>     if not t[y] then t[y] = {} end
>     t = t[y]
>     t[z] = (t[z] or 0) + 1
> end
>
> There might be a better way again but at least this reduces the number
> of table lookups.

If t[x][y] already exists then this code has 6 lookups; if t[x], 7;
if t only, 8; if not even t, there is an error, as there should be.

The following code reduces 6,7,8 to 4,5,6.

function inc3(t, x, y, z)
    local tx = t[x]
    if not tx then tx={}; t[x] = tx end
    local txy = tx[y]
    if not txy then txy={}; tx[y] = txy end
    local txyz = txy[z];
    if not txyz then txy[z]=1 else txy[z]=txyz+1 end
end

Dirk


Reply | Threaded
Open this post in threaded view
|

Re: fastest way to maintain nested tables of counters?

Kaj Eijlers
I would wonder if creating a hash out of x,y,z and using that as an index would be cheaper. Creating the hash might be expensive but you'd avoid multiple lookups - depending on whether x,y,z end up being hashed themselves for the lookup?.
 
Kaj

On Sun, May 15, 2011 at 8:51 AM, Dirk Laurie <[hidden email]> wrote:
On Sun, May 15, 2011 at 02:56:03PM +0200, Mason Larobina wrote:
> function inc3(t, x, y, z)
>     if not t[x] then t[x] = {} end
>     t = t[x]
>     if not t[y] then t[y] = {} end
>     t = t[y]
>     t[z] = (t[z] or 0) + 1
> end
>
> There might be a better way again but at least this reduces the number
> of table lookups.

If t[x][y] already exists then this code has 6 lookups; if t[x], 7;
if t only, 8; if not even t, there is an error, as there should be.

The following code reduces 6,7,8 to 4,5,6.

function inc3(t, x, y, z)
   local tx = t[x]
   if not tx then tx={}; t[x] = tx end
   local txy = tx[y]
   if not txy then txy={}; tx[y] = txy end
   local txyz = txy[z];
   if not txyz then txy[z]=1 else txy[z]=txyz+1 end
end

Dirk



Reply | Threaded
Open this post in threaded view
|

Re: fastest way to maintain nested tables of counters?

Mark Hamburg
In reply to this post by Tim Menzies
On May 15, 2011, at 5:48 AM, Tim Menzies wrote:

> is the following code the best way to handle auto-initialization of nested table cells to zero?
>
>  function inc3(c,x,y,z) -- inc 3d to zero
>    c[x] = c[x] or {}
>    c[x][y] = c[x][y] or {}
>    c[x][y][z] = c[x][y][z] or 0
>    c[x][y][z] = c[x][y][z] + 1
> end

function inc3( c, x, y, z )
        local cx = c[ x ]
        if not cx then cx = { }; c[ x ] = cx end
        local cxy = cx[ y ]
        if not cxy then cxy = { }; cx[ y ] = cxy end
        local cxyz = cxy[ z ]
        if not cxyz then cxyz = 0 end -- No need to assign since we will do so below
        cxy[ z ] = cxyz + 1
end

Faster still may be to use metatables.

local setmetatable = setmetatable
local xyzmeta = { __index = function( t, k ) return 0 end }
local xymeta = { __index = function( t, k ) local v = setmetatable( { }, xyzmeta ); t[ k ] = v; return v end }
local xmeta = { __index = function( t, k ) local v = setmetatable( { }, xymeta ); t[ k ] = v; return v end }

Now, initialize c with:

        c = setmetatable( { }, xmeta )

And do the increment with:

        local cxy = c[ x ][ y ]
        cxy[ z ] = cxy[ z ] + 1

The benefit here is that the table misses get detected in the VM index instruction rather than in a subsequent test. On the other hand, the miss handling is more expensive because it incurs function call overhead. So, whether this is faster in practice depends on how often you miss.

(And a really optimized version of the above code might try to share code between xmeta and xymeta in order to reduce the memory footprint for the code, but at that point we're really getting tweaky.)

Mark


Reply | Threaded
Open this post in threaded view
|

Re: fastest way to maintain nested tables of counters?

Mark Hamburg
In reply to this post by Dirk Laurie
On May 15, 2011, at 8:51 AM, Dirk Laurie wrote:

> If t[x][y] already exists then this code has 6 lookups; if t[x], 7;
> if t only, 8; if not even t, there is an error, as there should be.
>
> The following code reduces 6,7,8 to 4,5,6.
>
> function inc3(t, x, y, z)
>    local tx = t[x]
>    if not tx then tx={}; t[x] = tx end
>    local txy = tx[y]
>    if not txy then txy={}; tx[y] = txy end
>    local txyz = txy[z];
>    if not txyz then txy[z]=1 else txy[z]=txyz+1 end
> end

Missed this one when writing my response. Same as mine except the last line is more clever. Cute.

Mark


Reply | Threaded
Open this post in threaded view
|

Re: fastest way to maintain nested tables of counters?

Edgar Toernig
In reply to this post by Tim Menzies
A boring sunday and I was curious about the performance of the
different implementations, especially regarding the differences
between Lua and LuaJIT.

A quick summary - explanation further down:
  -- ORIG ---- lua 7.23s, luajit 0.24s/0.62s ---
  -- OPT ----- lua 2.91s, luajit 0.14s/0.32s ---
  -- NAIVE --- lua 2.03s, luajit 3.83s/21.34s --- (no typo!)
  -- TUNED --- lua 1.84s, luajit 0.15s/1.21s ---


I noticed that LuaJIT is sometimes MUCH slower if the test loop is
slightly modified (Lua shows no difference) so there are two numbers
for LuaJIT, the first for this test loop:

|-- x y n z
|for x=1,100 do
|  for y=1,100 do
|    for n=1,10 do
|      for z=1,20 do
|        inc3(c,x,y,z)
|end end end end

and the second for this one:

|-- n x y z
|for n=1,10 do
|  for x=1,100 do
|    for y=1,100 do
|      for z=1,20 do
|        inc3(c,x,y,z)
|end end end end


To the original question ...

Tim Menzies wrote:
>
> is the following code the best way to handle auto-initialization of nested
> table cells to zero?

|--\[[-- ORIG ---- lua 7.23s, luajit 0.24s/0.62s ---
|local function inc3(c,x,y,z)
|    c[x] = c[x] or {}
|    c[x][y] = c[x][y] or {}
|    c[x][y][z] = c[x][y][z] or 0
|    c[x][y][z] = c[x][y][z] + 1
|end
|local c = {}
|--]]----------------------------------------

Best depends.  But it's definitely not the fastest ;-)
You do a lot of redundant table lookups and they are
not for free.  LuaJIT shows what it can do with such
code: 12 to 30 times faster then Lua.


An optiomized version would be:

|--\[[-- OPT ----- lua 2.91s, luajit 0.14s/0.32s ---
|local function inc3(c,x,y,z)
|    local b = c[x]  if b==nil then b={} c[x] = b end
|    local a = b[y]  if a==nil then a={} b[y] = a end
|    a[z] = (a[z] or 0) + 1
|end
|local c = {}
|--]]----------------------------------------

Table lookups/stores are minimized.  And it shows:
about two times faster then the original version -
both for Lua and LuaJIT.


> oh-  i've read the automagical table entry [...] and
> i cant quite justify to myself that that approach is
> preferred to the above.

Well, autotables move the nil-tests to the C side.  You'll
get a performance that is similar to pre-populated tables.

And, IMO more important, you don't have to perform these
tests for every use.  You don't have to write functions
like inc3 (see next paragraph).

The call "inc3(c,x,y,z)" in the test-loop is replaced by
the inline code:

|   local t=c[x][y]  t[z]=t[z]+1


First a naive implementation of autotables:

|--\[[-- NAIVE --- lua 2.03s, luajit 3.83s/21.34s ---
|local function zero() return 0 end
|local function auto_tab(a,b,c,d,e)
|    return setmetatable({}, {
|        __index = function(t,k)
|            local v = a(b,c,d,e)
|            t[k] = v
|            return v
|        end
|    })
|end
|local c = auto_tab(auto_tab, auto_tab, zero)
|--]]----------------------------------------

It's expensive on the memory side (each table has its own
metatable and a unique index-function with upvalues) but
Lua can handle it quite well, about 30% faster then the
optimized version though most of this speedup is due to
the inlined increment code (one less function call).

But what's LuaJIT doing?  Especially with the alternative
test loop?  21s?!?  Ten times slower than Lua!


Ok, time for a tuned "unrolled" auto-table version:

|--\[[-- TUNED --- lua 1.84s, luajit 0.15s/1.21s ---
|local auto_0, auto_t0, auto_tt0
|local meta_0   = { __index = function(t,k) local v=0         t[k]=v return v end }
|local meta_t0  = { __index = function(t,k) local v=auto_0()  t[k]=v return v end }
|local meta_tt0 = { __index = function(t,k) local v=auto_t0() t[k]=v return v end }
|function auto_0()   return setmetatable({}, meta_0) end
|function auto_t0()  return setmetatable({}, meta_t0) end
|function auto_tt0() return setmetatable({}, meta_tt0) end
|local c = auto_tt0()
|--]]----------------------------------------

This version is only slightly faster (1.84s vs 2.03s) but it
needs no additional memory.

LuaJIT is back on track, about the same runtime as the optimized
non-auto-table version (impressive) but it still chokes on the
alternative test-loop (8 times slower but still faster than
Lua).


> but i'd like the advice of some Lua gurus!

Well, the usual advice: don't optimize too early ;-)

I.e.: For 2 million increments you can save at max 7.2seconds.
If the total runtime to generated these 2 million increments
is 10 minutes, you save 1.2% runtime - not worth looking at.

Ciao, ET.

Full test code attached ...

autotabs.lua (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: fastest way to maintain nested tables of counters?

Romulo
In reply to this post by Dirk Laurie
On Sun, May 15, 2011 at 12:51, Dirk Laurie <[hidden email]> wrote:

> The following code reduces 6,7,8 to 4,5,6.
>
> function inc3(t, x, y, z)
>    local tx = t[x]
>    if not tx then tx={}; t[x] = tx end
>    local txy = tx[y]
>    if not txy then txy={}; tx[y] = txy end
>    local txyz = txy[z];
>    if not txyz then txy[z]=1 else txy[z]=txyz+1 end
> end

Another alternative (haven't benchmarked. sorry):

function inc3( t, x, y, z )
  local a = t[ x ] if a == nil then t[ x ] = { [ y ] = { [ z ] = 1 }
}; return end
  local b = a[ y ] if b == nil then a[ y ] = { [ z ] = 1 }; return end
  local c = b[ z ] if c == nil then b[ z ] = 1 else b[ z ] = c + 1 end
end

--rb