Speed query

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Speed query

David Given
Given that Lua is an interpreted language, conventional wisdom dictates
that it's most likely going to spend most of its time doing instruction
decodes rather than actual work. Therefore, the fewer the instructions
the better --- no matter what the instructions do.

I want to implement a priority queue. I have two possible strategies
when inserting an item:

(a) binary chop through the array until I find the place to insert the
item, then call table.insert to do it.

(b) append the item at the end of the array and then use table.qsort to
sort the entire array.

(a) involves the least amount of theoretical work. (b) involves the
fewest number of actual instructions. So, which one is faster? And is
the answer still true when using LuaJit?

(Right now I'm going for (b) --- since I can implement it in three lines
of Lua.)

-- 
ââââ ïïïïïïïïïïïïïï âââââ http://www.cowlark.com âââââ
â "I have always wished for my computer to be as easy to use as my
â telephone; my wish has come true because I can no longer figure out
â how to use my telephone." --- Bjarne Stroustrup

Attachment: signature.asc
Description: OpenPGP digital signature

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Patrick Donnelly-3
Why not use a Max-Heap? It satisfies all your requirements...

-- 
-Patrick Donnelly

"One of the lessons of history is that nothing is often a good thing
to do and always a clever thing to say."

-Will Durant

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Ralph Hempel
In reply to this post by David Given
David Given wrote:

I want to implement a priority queue. I have two possible strategies
when inserting an item:

(a) binary chop through the array until I find the place to insert the
item, then call table.insert to do it.

Depends if the insert is near the beginning of the table. Does this
move all the subsequent elements "up" by one?

(b) append the item at the end of the array and then use table.qsort to
sort the entire array.

The table will be in an indeterminate state for a while, if this is
a concern at all. Here, the performance on table.sort() when the table
is already mostly sorted is a concern.

I'm guessing (almost always a bad idea) that both approaches will
move about the same amount of data on average, and that your binary
search code for approach (a) will take longer to write and debug than
the total time you'll spend running approach (b) :-)

Cheers, Ralph

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Javier Guerra Giraldez
In reply to this post by David Given
On Tue, Mar 25, 2008 at 5:39 PM, David Given <[hidden email]> wrote:
> Given that Lua is an interpreted language, conventional wisdom dictates
>  that it's most likely going to spend most of its time doing instruction
>  decodes rather than actual work. Therefore, the fewer the instructions
>  the better --- no matter what the instructions do.

you'd be surprised (as i am, often) to see how efficient a good VM is
(and Lua is one of the bestest's)

>  I want to implement a priority queue. I have two possible strategies
>  when inserting an item:
>
>  (a) binary chop through the array until I find the place to insert the
>  item, then call table.insert to do it.
>
>  (b) append the item at the end of the array and then use table.qsort to
>  sort the entire array.

(c) a heap queue: http://en.wikipedia.org/wiki/Heap_(data_structure)


>  (a) involves the least amount of theoretical work. (b) involves the
>  fewest number of actual instructions. So, which one is faster? And is
>  the answer still true when using LuaJit?

in most cases (b) is far faster than (a), not because of VM overhead,
but because sorts are so optimized.  just be sure to sort only when
needed and not after each insert.


-- 
Javier

Reply | Threaded
Open this post in threaded view
|

RE: Speed query

Brian Weed
In reply to this post by David Given
Not actually answering your question...

With a finite number of priorities, maybe it would be faster to use the priority as a hash into a table of queues.

q = {}

...

q[priority] = q[priority] or {}

table.insert(q[priority], item)

Then you pull items from the "topmost" non-empty queue.

Brian
 

-----Original Message-----
From: [hidden email] [[hidden email]] On Behalf Of David Given
Sent: Tuesday, March 25, 2008 6:39 PM
To: Lua list
Subject: Speed query

Given that Lua is an interpreted language, conventional wisdom dictates that it's most likely going to spend most of its time doing instruction decodes rather than actual work. Therefore, the fewer the instructions the better --- no matter what the instructions do.

I want to implement a priority queue. I have two possible strategies when inserting an item:

(a) binary chop through the array until I find the place to insert the item, then call table.insert to do it.

(b) append the item at the end of the array and then use table.qsort to sort the entire array.

(a) involves the least amount of theoretical work. (b) involves the fewest number of actual instructions. So, which one is faster? And is the answer still true when using LuaJit?

(Right now I'm going for (b) --- since I can implement it in three lines of Lua.)

--
┌─── dg@cowlark.com ───── http://www.cowlark.com ───── │ "I have always wished for my computer to be as easy to use as my │ telephone; my wish has come true because I can no longer figure out │ how to use my telephone." --- Bjarne Stroustrup


Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Geoff Leyland
In reply to this post by David Given
Good question. I just tested (b) vs a pure lua binary heap and the binary heap is so much better (the sort version still hasn't finished) that I think I must have got (b) wrong (does table.qsort exist? I can only find table.sort documented. Which is better out of table_insert(t, x) and t[#t+1] = x?)

If you want to try my heap code I can send it, but I think there's one in loop that's bound to be better.

Cheers,
Geoff

On 26/03/2008, at 11:39 AM, David Given wrote:
Given that Lua is an interpreted language, conventional wisdom dictates that it's most likely going to spend most of its time doing instruction
decodes rather than actual work. Therefore, the fewer the instructions
the better --- no matter what the instructions do.

I want to implement a priority queue. I have two possible strategies
when inserting an item:

(a) binary chop through the array until I find the place to insert the
item, then call table.insert to do it.

(b) append the item at the end of the array and then use table.qsort to
sort the entire array.

(a) involves the least amount of theoretical work. (b) involves the
fewest number of actual instructions. So, which one is faster? And is
the answer still true when using LuaJit?

(Right now I'm going for (b) --- since I can implement it in three lines
of Lua.)

--
ââââ ïïïïïïïïïïïïïï âââââ http://www.cowlark.com âââââ
â "I have always wished for my computer to be as easy to use as my
â telephone; my wish has come true because I can no longer figure out
â how to use my telephone." --- Bjarne Stroustrup





Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Peter Cawley
In reply to this post by David Given
The number of lines is not always equivalent to the number of VM instructions.

On 25/03/2008, David Given <[hidden email]> wrote:
>  (Right now I'm going for (b) --- since I can implement it in three lines
>  of Lua.)

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Frédéric van der Plancke
In reply to this post by Geoff Leyland
Geoff Leyland wrote:
Good question. I just tested (b) vs a pure lua binary heap and the binary heap is so much better (the sort version still hasn't finished) that I think I must have got (b) wrong
Well, not necessarily.

On a queue of n items, an insertion or removal costs O(log n) (at worst) with a binary heap, O(n log n) (on average) with quicksort. O(x) roughly means "proportional to x".

That makes the quicksort implementation about n times slower (x some constant value). That factor n may overhelm the advantage of quicksorting in C instead of in Lua.

Also, if the quicksort is naively implemented, you may be hitting a weakness of the algorithm, such that the sorting time becomes O(n^2) instead of O(n log n) : nearly another factor n. (Someone else already pointed that out.)

FrÃdÃric


Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Roberto Ierusalimschy
In reply to this post by Geoff Leyland
> does table.qsort exist?  I can only find table.sort documented.

table.sort uses the quicksort algorithm, if that is what you mean.

-- Roberto

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

David Given
In reply to this post by Geoff Leyland
Geoff Leyland wrote:
Good question. I just tested (b) vs a pure lua binary heap and the binary heap is so much better (the sort version still hasn't finished) that I think I must have got (b) wrong (does table.qsort exist? I can only find table.sort documented. Which is better out of table_insert(t, x) and t[#t+1] = x?)

If you want to try my heap code I can send it, but I think there's one in loop that's bound to be better.

That deeply surprises me, but I can hardly argue with results --- so, yes please, I'd love a copy.

(For context: it's to handle a message queue. The priorities are timestamps and will be floats. Therefore bundling up equal priorities isn't going to be any use, although it might be worth special-casing the 'now' timestamp.)

I've looked at loop. Unfortunately, loop uses lots of infrastructure I don't want to use, and is also very focused on the Javascripty data-and-methods-in-the-same-namespace model which makes me rather uneasy (although priority queues are one exception), so I'd rather have my own version where I know what it does.

Thanks for all the comments, people!

--
David Given
[hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Geoff Leyland
On 27/03/2008, at 2:39 AM, Roberto Ierusalimschy wrote:
does table.qsort exist?  I can only find table.sort documented.

table.sort uses the quicksort algorithm, if that is what you mean.

Thanks. David had written table.qsort, rather than table.sort, so I was wondering. The note in the reference manual saying it's not a stable sort made me suspect as much.

On 27/03/2008, at 5:45 AM, David Given wrote:
Geoff Leyland wrote:
Good question. I just tested (b) vs a pure lua binary heap and the binary heap is so much better (the sort version still hasn't finished) that I think I must have got (b) wrong (does table.qsort exist? I can only find table.sort documented. Which is better out of table_insert(t, x) and t[#t+1] = x?) If you want to try my heap code I can send it, but I think there's one in loop that's bound to be better.

That deeply surprises me, but I can hardly argue with results --- so, yes please, I'd love a copy.

But you can - I was being a bit unfair and making the sort version re- sort after every insertion, rather than at every transition between insertion and removal. With that changed I think it's fairer.

The test I used is insert X items, remove Y, and repeat Z times.
For X=10,000, y=7,500 and Z=100 (final queue length 250,000), in lua the sort is slightly faster. With luajit the heap is slightly faster. For X=1,000, y=750 and Z=1,000 (final queue length 250,000) the heap is about 10 times faster in both cases.

So I guess the answer is that it depends what you're doing.

Here's the code with the tests.  No guarantees it works :)

Cheers,
Geoff

Attachment: binary_heap-test.lua
Description: Binary data

Attachment: binary_heap.lua
Description: Binary data

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Javier Guerra Giraldez
On Wed, Mar 26, 2008 at 1:43 PM, Geoff Leyland
<[hidden email]> wrote:
>  The test I used is insert X items, remove Y, and repeat Z times.
>  For X=10,000, y=7,500 and Z=100 (final queue length 250,000), in lua
>  the sort is slightly faster.  With luajit the heap is slightly faster.
>  For X=1,000, y=750 and Z=1,000 (final queue length 250,000) the heap
>  is about 10 times faster in both cases.
>
>  So I guess the answer is that it depends what you're doing.
>
>  Here's the code with the tests.  No guarantees it works :)

a simpleminded optimization (reduce the use of #self) gains a 20% speedup:


--- orig/binary_heap.lua        2008-03-26 13:48:55.000000000 -0500
+++ binary_heap.lua     2008-03-26 15:45:56.000000000 -0500
@@ -23,7 +23,7 @@
 -- info ------------------------------------------------------------------------

 function heap:next_key()
-  assert(#self > 0, "The heap is empty")
+  assert(self[1], "The heap is empty")
   return self[1].key
 end

@@ -50,7 +50,7 @@
 end

 function heap:pop()
-  assert(#self > 0, "The heap is empty")
+  assert(self[1], "The heap is empty")

   local cmp = self.comparison

@@ -58,15 +58,18 @@
   local result = self[1]
   self[1] = nil

+  local ssize = #self
+
   -- push the last element in the heap down from the top
-  local last = self[#self]
+  local last = self[ssize]
   local last_key = (last and last.key) or nil
-  self[#self] = nil
+  self[ssize] = nil
+  ssize = ssize-1

   local parent_index = 1
-  while parent_index * 2 <= #self do
+  while parent_index * 2 <= ssize do
     local child_index = parent_index * 2
-    if child_index+1 <= #self and cmp(self[child_index+1].key,
self[child_index].key) then
+    if child_index+1 <= ssize and cmp(self[child_index+1].key,
self[child_index].key) then
       child_index = child_index + 1
     end
     local child_rec = self[child_index]
@@ -86,11 +89,12 @@

 function heap:check()
   local cmp = self.comparison
+  local ssize = #self
   local i = 1
   while true do
-    if i*2 > #self then return true end
+    if i*2 > ssize then return true end
     if cmp(self[i*2].key, self[i].key) then return false end
-    if i*2+1 > #self then return true end
+    if i*2+1 > ssize then return true end
     if cmp(self[i*2+1].key, self[i].key) then return false end
     i = i + 1
   end


now, it would be nice to add an in-place 'heapify'...


-- 
Javier

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Javier Guerra Giraldez
ok, i wrote the 'h:reheapify()' function:

function heap:reheapify ()
	local cmp = self.comparison
	local ssize = #self
	local i = math_floor (ssize/2)
	local j = i*2
	while i > 0 do
		local j2 = j
		if j2+1 <= ssize and cmp (self[j2+1].key, self[j2].key) then
			j2 = j2+1
		end
		if cmp (self[j2].key, self[i].key) then
			local r = self[j2]
			self[j2] = self[i]
			self[i] = r
		end
		i = i -1
		j = j - 2
	end
end

so now the tests can sloppily insert items at the end, and just call
h:reheapify() before popping them (what i called a 'lazy heap').  that
creates a performance profile similar to the sort()ed queue: with many
small cycles it's far worse, with few large cycles it's a little
better.

but also noted that your speed test of the sort()ed queue just inserts
numbers in the queue, while the heap uses a {key,value} table for each
item... no fair!

with the overhead of creating a table per item, and using a Lua
callback for sorting... it's far,far worse than a heap, both lazy and
eager!

-- 
Javier

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Geoff Leyland
In reply to this post by Javier Guerra Giraldez
On 27/03/2008, at 9:49 AM, Javier Guerra wrote:
On Wed, Mar 26, 2008 at 1:43 PM, Geoff Leyland
<[hidden email]> wrote:
 Here's the code with the tests.  No guarantees it works :)

a simpleminded optimization (reduce the use of #self) gains a 20% speedup:

Nice! I didn't know # was slow, so thanks a bunch for pointing that out Javier.

ok, i wrote the 'h:reheapify()' function:

Thanks. For the stuff I actually use this for it's small cycles so I guess the push is better, but it's definitely worth adding.

but also noted that your speed test of the sort()ed queue just inserts
numbers in the queue, while the heap uses a {key,value} table for each
item... no fair!

with the overhead of creating a table per item, and using a Lua
callback for sorting... it's far,far worse than a heap, both lazy and
eager!

Good point. I added the sort test in a bit of a hurry so it was a little rough...

Cheers,
Geoff

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Tony Finch
In reply to this post by Javier Guerra Giraldez
On Wed, 26 Mar 2008, Javier Guerra wrote:
>
> but also noted that your speed test of the sort()ed queue just inserts
> numbers in the queue, while the heap uses a {key,value} table for each
> item... no fair!

You could avoid this by storing all the keys and values in one table and
the ordering in another, i.e.
	pairs[key] = value
	index[n] = key

Tony.
-- 
f.anthony.n.finch  <[hidden email]>  http://dotat.at/
CROMARTY FORTH TYNE: CYCLONIC 4 OR 5, BECOMING SOUTHEAST 5 TO 7 LATER.
MODERATE. SHOWERS. MODERATE OR GOOD.

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Javier Guerra Giraldez
In reply to this post by Geoff Leyland
On Thursday 27 March 2008, Geoff Leyland wrote:
> On 27/03/2008, at 9:49 AM, Javier Guerra wrote:
> > ok, i wrote the 'h:reheapify()' function:
>
> Thanks.  For the stuff I actually use this for it's small cycles so I
> guess the push is better, but it's definitely worth adding.

it could be useful for heap initialisation, to create a heap from an initial 
data source without having to fed it one by one.  a single heapify() for a 
big data set can be much better than a sort(), even if it's in C



-- 
Javier

Attachment: signature.asc
Description: This is a digitally signed message part.

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Geoff Leyland
In reply to this post by Tony Finch

On 28/03/2008, at 12:36 AM, Tony Finch wrote:
On Wed, 26 Mar 2008, Javier Guerra wrote:

but also noted that your speed test of the sort()ed queue just inserts numbers in the queue, while the heap uses a {key,value} table for each
item... no fair!

You could avoid this by storing all the keys and values in one table and
the ordering in another, i.e.
	pairs[key] = value
	index[n] = key

Good idea, but I tried it and it didn't seem to make a significant difference with lua or luajit.

I also tried changing the entries to { k, v } rather than { key=k, value=v } figuring I'd swap some hash lookups for index lookups, but it was slightly slower on lua and only a little faster with luajit, so I figured it wasn't worth it.

Javier's #self improvement is even better with luajit. I tried going further by having a manually updated self.size to avoid *all* the #selfs, but it was no better.

Cheers,
Geoff

Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Gé Weijers
I've attached a small priority queue implementation based on skew heaps. It builds and destroys a 100,000 element priority queue in less than 3 seconds on my laptop (1.3 seconds on luajit).

I hope it's of some use.

Gé


Attachment: pq.lua
Description: Binary data


On Mar 27, 2008, at 2:50 PM, Geoff Leyland wrote:

On 28/03/2008, at 12:36 AM, Tony Finch wrote:
On Wed, 26 Mar 2008, Javier Guerra wrote:

but also noted that your speed test of the sort()ed queue just inserts numbers in the queue, while the heap uses a {key,value} table for each
item... no fair!

You could avoid this by storing all the keys and values in one table and
the ordering in another, i.e.
	pairs[key] = value
	index[n] = key

Good idea, but I tried it and it didn't seem to make a significant difference with lua or luajit.

I also tried changing the entries to { k, v } rather than { key=k, value=v } figuring I'd swap some hash lookups for index lookups, but it was slightly slower on lua and only a little faster with luajit, so I figured it wasn't worth it.

Javier's #self improvement is even better with luajit. I tried going further by having a manually updated self.size to avoid *all* the #selfs, but it was no better.

Cheers,
Geoff

--
Gé Weijers
[hidden email]



Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Stefan Sandberg
I was initially quite skeptical, but it's 1,000,000, not 100,000 in the code...


Gé Weijers wrote:
I've attached a small priority queue implementation based on skew heaps. It builds and destroys a 100,000 element priority queue in less than 3 seconds on my laptop (1.3 seconds on luajit).

I hope it's of some use.

Gé



On Mar 27, 2008, at 2:50 PM, Geoff Leyland wrote:

On 28/03/2008, at 12:36 AM, Tony Finch wrote:
On Wed, 26 Mar 2008, Javier Guerra wrote:

but also noted that your speed test of the sort()ed queue just inserts
numbers in the queue, while the heap uses a {key,value} table for each
item... no fair!

You could avoid this by storing all the keys and values in one table and
the ordering in another, i.e.
    pairs[key] = value
    index[n] = key

Good idea, but I tried it and it didn't seem to make a significant difference with lua or luajit.

I also tried changing the entries to { k, v } rather than { key=k, value=v } figuring I'd swap some hash lookups for index lookups, but it was slightly slower on lua and only a little faster with luajit, so I figured it wasn't worth it.

Javier's #self improvement is even better with luajit. I tried going further by having a manually updated self.size to avoid *all* the #selfs, but it was no better.

Cheers,
Geoff

--
Gé Weijers
[hidden email]





Reply | Threaded
Open this post in threaded view
|

Re: Speed query

Javier Guerra Giraldez
In reply to this post by Gé Weijers
On Mon, Mar 31, 2008 at 9:30 AM, GÃ Weijers <[hidden email]> wrote:
> I've attached a small priority queue implementation based on skew
>  heaps. It builds and destroys a 100,000 element priority queue in
>  less than 3 seconds on my laptop (1.3 seconds on luajit).

now that is fast! (less than 2 secs on my desktop without JIT)

and the code is much, much simpler than the binary heap.  i think
mostly because it uses small tables to hold the heap tree, instead of
the old optimization of using an array to fold the tree.  that might
be a big gain on C using fixed-size records, but on Lua it's neater,
cleaner and (at least) as fast to use tables instead of integer
'pointers' on an array.

a lovely humbling lesson.

-- 
Javier


12