Binarytrees benchmark results

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Binarytrees benchmark results

Luís Eduardo Jason Santos
Does anyone knows why the binarytrees benchmark on [1] doesn't have the usual 'killing machine' performance one could expect from Lua?

--
Luís Eduardo Jason Santos

[1] http://shootout.alioth.debian.org/u32/benchmark.php?test=binarytrees&lang=all
Reply | Threaded
Open this post in threaded view
|

Re: Binarytrees benchmark results

Mike Pall-10
Luís Eduardo Jason Santos wrote:
> Does anyone knows why the binarytrees benchmark on [1] doesn't have the
> usual 'killing machine' performance one could expect from Lua?

All the other language implementations either use a bump allocator
with a generational GC or a dedicated pool allocator.

Lua uses plain malloc(). LuaJIT 2.0 uses its own allocator and
colocated allocation for small arrays. As you can see the latter
is 3x-4x faster in this case (the benchmark is still interpreted
in the latest beta release).

But neither is a substitute for a bump allocator. Unfortunately,
using such an allocator implies the need for a moving GC. The
current Lua GC is non-moving. Revising this fundamental design
choice is very complicated while keeping the current Lua/C API.

In the long run this will always be a benchmark where Lua will not
be competitive. But such a high allocation rate is quite unusual
for most real-world Lua programs, anyway.

--Mike
Reply | Threaded
Open this post in threaded view
|

Re: Binarytrees benchmark results

Francesco Abbate-2
2009/12/7 Mike Pall <[hidden email]>:
> Luís Eduardo Jason Santos wrote:
>> Does anyone knows why the binarytrees benchmark on [1] doesn't have the
>> usual 'killing machine' performance one could expect from Lua?
>
> All the other language implementations either use a bump allocator
> with a generational GC or a dedicated pool allocator.

Actually I'm the owner of the fastest program, written in C :-) and it
does use the pool allocator from the APR (apache portable runtime).

In my opinion this test is one of the least interesting since it is
mainly a test of the allocator, so I would not give it so much
importance!

Francesco
Reply | Threaded
Open this post in threaded view
|

Re: Binarytrees benchmark results

Linker
In reply to this post by Mike Pall-10
On Mon, Dec 7, 2009 at 22:30, Mike Pall <[hidden email]> wrote:

But neither is a substitute for a bump allocator. Unfortunately,
using such an allocator implies the need for a moving GC. The
current Lua GC is non-moving. Revising this fundamental design
choice is very complicated while keeping the current Lua/C API.
I think that a moving GC is bad for CPU which has a multi-cache modern architecture . 



--
Regards,
Linker Lin
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: Binarytrees benchmark results

Alex Queiroz
Hallo,

On Mon, Dec 7, 2009 at 3:43 PM, Linker <[hidden email]> wrote:

> On Mon, Dec 7, 2009 at 22:30, Mike Pall <[hidden email]> wrote:
>>
>> But neither is a substitute for a bump allocator. Unfortunately,
>> using such an allocator implies the need for a moving GC. The
>> current Lua GC is non-moving. Revising this fundamental design
>> choice is very complicated while keeping the current Lua/C API.
>
> I think that a moving GC is bad for CPU which has a multi-cache modern
> architecture .
>

     One single factor will hardly decide if a moving GC or a
mark-and-sweep one is better. Point in case, the generational GC
allocates memory faster.

--
-alex
@asandroq
http://www.ventonegro.org/
Reply | Threaded
Open this post in threaded view
|

Re: Binarytrees benchmark results

KHMan
Alex Queiroz wrote:

> On Mon, Dec 7, 2009 at 3:43 PM, Linker wrote:
>> On Mon, Dec 7, 2009 at 22:30, Mike Pall wrote:
>>> But neither is a substitute for a bump allocator. Unfortunately,
>>> using such an allocator implies the need for a moving GC. The
>>> current Lua GC is non-moving. Revising this fundamental design
>>> choice is very complicated while keeping the current Lua/C API.
>> I think that a moving GC is bad for CPU which has a multi-cache modern
>> architecture .
>
>      One single factor will hardly decide if a moving GC or a
> mark-and-sweep one is better. Point in case, the generational GC
> allocates memory faster.

To put it in another way... Data needed, otherwise it's
hand-waving!!! :-) ;-)

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia
Reply | Threaded
Open this post in threaded view
|

Re: Binarytrees benchmark results

Linker
In my opinion, the best GC is a stopped GC.
So, we can just halt lua's GC for a while, then restart it when system load is low.


On Tue, Dec 8, 2009 at 00:07, KHMan <[hidden email]> wrote:
Alex Queiroz wrote:
On Mon, Dec 7, 2009 at 3:43 PM, Linker wrote:

On Mon, Dec 7, 2009 at 22:30, Mike Pall wrote:
But neither is a substitute for a bump allocator. Unfortunately,
using such an allocator implies the need for a moving GC. The
current Lua GC is non-moving. Revising this fundamental design
choice is very complicated while keeping the current Lua/C API.
I think that a moving GC is bad for CPU which has a multi-cache modern
architecture .

    One single factor will hardly decide if a moving GC or a
mark-and-sweep one is better. Point in case, the generational GC
allocates memory faster.

To put it in another way... Data needed, otherwise it's hand-waving!!! :-) ;-)

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia



--
Regards,
Linker Lin
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: Binarytrees benchmark results

KHMan
Linker wrote:
> In my opinion, the best GC is a stopped GC.
> So, we can just halt lua's GC for a while, then restart it when system
> load is low.

Okay, okay, I'll resist the temptation to poke some more... :-)

But, have you seen a Comp Sci paper that implements your idea? I
don't believe I have. Would be interested if any such paper can be
found.

> On Tue, Dec 8, 2009 at 00:07, KHMan wrote:
>     Alex Queiroz wrote:
>         On Mon, Dec 7, 2009 at 3:43 PM, Linker wrote:
>             On Mon, Dec 7, 2009 at 22:30, Mike Pall wrote:
>
>                 But neither is a substitute for a bump allocator.
>                 Unfortunately,
>                 using such an allocator implies the need for a moving
>                 GC. The
>                 current Lua GC is non-moving. Revising this fundamental
>                 design
>                 choice is very complicated while keeping the current
>                 Lua/C API.
>
>             I think that a moving GC is bad for CPU which has a
>             multi-cache modern
>             architecture .
>
>             One single factor will hardly decide if a moving GC or a
>         mark-and-sweep one is better. Point in case, the generational GC
>         allocates memory faster.
>
>     To put it in another way... Data needed, otherwise it's
>     hand-waving!!! :-) ;-)

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia
Reply | Threaded
Open this post in threaded view
|

Re: Binarytrees benchmark results

Tony Finch
In reply to this post by Mike Pall-10
On Mon, 7 Dec 2009, Mike Pall wrote:
> Luís Eduardo Jason Santos wrote:
> > Does anyone knows why the binarytrees benchmark on [1] doesn't have the
> > usual 'killing machine' performance one could expect from Lua?
> > [1] http://shootout.alioth.debian.org/u32/benchmark.php?test=binarytrees&lang=all
>
> All the other language implementations either use a bump allocator
> with a generational GC or a dedicated pool allocator.

I tried simulating a pool allocator using a Lua table, but it was much
slower.

I then wondered how much you are allowed to deviate from the spirit of the
benchmark. In particular this benchmark's trees aren't modified after
being built, so you can use a trick to represent the tree where the entire
thing is stored in one table, and the children of node t[i] are in t[i*2]
and t[i*2+1]. This turned out to be faster. I've attached the modified
code.

Tony.
--
f.anthony.n.finch  <[hidden email]>  http://dotat.at/
GERMAN BIGHT HUMBER: SOUTHWEST 5 TO 7. MODERATE OR ROUGH. SQUALLY SHOWERS.
MODERATE OR GOOD.

binarytrees.fanf.lua (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Binarytrees benchmark results

Mike Pall-10
Tony Finch wrote:
> I then wondered how much you are allowed to deviate from the spirit of the
> benchmark. In particular this benchmark's trees aren't modified after
> being built, so you can use a trick to represent the tree where the entire
> thing is stored in one table, and the children of node t[i] are in t[i*2]
> and t[i*2+1]. This turned out to be faster. I've attached the modified
> code.

Well, that's not allowed. Despite the name, the intent is to be an
allocation benchmark, not a benchmark for constructing binary
trees. Even the pool allocators must be provided by language
intrinsics or a commonly used external library. [But please let's
not start a discussion about the shootout benchmark rules here.]

--Mike