[ANN] Clue 0.1.1

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[ANN] Clue 0.1.1

David Given
Clue now has a real home on SourceForge:

http://cluecc.sourceforge.net/

No, I'm not planning to announce every single version here, but you
might be interested to know that after refactoring and applying some
pretty standard optimisations to avoid memory allocations and table
dereferences, the Whetstone score has gone up a bit --- instead of 7,
I'm now getting 159. Given that C compiled with gcc is getting 866, this
means it's now running at a fifth the speed of native code. Admittedly,
Whetstone is an artifical known-to-be-poor benchmark, but that's still
rather impressive, and certainly in the fast-enough-to-be-useful camp.

(That's with LuaJIT 1.1.4. Interpreted Lua gets a Whetstone score of 35,
 which is still not bad. I'm rather curious to know what it would be
like with LuaJIT 2 when that comes out.)

(Does anyone know any decent but simple C benchmarks it would be worth
trying?)

(And I *really* wish that Lua had a GOTO statement.)

-- 
ââââ ïïïïïïïïïïïïïï âââââ http://www.cowlark.com âââââ
â "I have always wished for my computer to be as easy to use as my
â telephone; my wish has come true because I can no longer figure out
â how to use my telephone." --- Bjarne Stroustrup

Attachment: signature.asc
Description: OpenPGP digital signature

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Clue 0.1.1

Mike Pall-87
David Given wrote:
> I'm rather curious to know what it would be
> like with LuaJIT 2 when that comes out.

This won't work since you're patching Lua bytecode. LJ2 has a
completely different bytecode. It also stores more details needed
by the trace recorder for efficient tracing (e.g. loop control
flow or stack slot allocation). The fact that Lua is a goto-free
language simplifies the design a lot. :-)

But LJ2 is of course fully compatible to Lua at the source level.
So if you want to use it as a backend, you need to transform the
BB graph into Lua source with standard control flow constructs.
See Muchnick, chapter 7.7: "Structural Analysis".

--Mike

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Clue 0.1.1

David Given
Mike Pall wrote:
[...]
> But LJ2 is of course fully compatible to Lua at the source level.
> So if you want to use it as a backend, you need to transform the
> BB graph into Lua source with standard control flow constructs.
> See Muchnick, chapter 7.7: "Structural Analysis".

I hadn't even considered that it was possible to do that in all cases
--- I'll go and look into it (because it'll solve major problems with
the other language backends such as Javascript).

Can you suggest any other references? Muchnick's book seems to be rather
hard to get hold of (and is apparently notoriously full of bugs, but
that's another matter).

-- 
ââââ ïïïïïïïïïïïïïï âââââ http://www.cowlark.com âââââ
â "I have always wished for my computer to be as easy to use as my
â telephone; my wish has come true because I can no longer figure out
â how to use my telephone." --- Bjarne Stroustrup

Attachment: signature.asc
Description: OpenPGP digital signature

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Clue 0.1.1

David Kessler
In reply to this post by David Given
> (Does anyone know any decent but simple C benchmarks it would be worth
> trying?)

Check out the HINT (Hierarchical Integration) benchmark: http://hint.byu.edu/

HINT is somewhat unique in that it produces not just a single number,
but a graph showing the results over a large range of data set sizes.
The resulting graph makes it trivial to see how performance degrades
as the working set size exceeds the data caches and even the physical
memory. This can be done manually with some benchmarks (like LINPACK),
but with HINT it's designed in.

--
Dave Kessler

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Clue 0.1.1

Mike Pall-87
In reply to this post by David Given
David Given wrote:
> Mike Pall wrote:
> [...]
> > But LJ2 is of course fully compatible to Lua at the source level.
> > So if you want to use it as a backend, you need to transform the
> > BB graph into Lua source with standard control flow constructs.
> > See Muchnick, chapter 7.7: "Structural Analysis".
> 
> I hadn't even considered that it was possible to do that in all cases
> --- I'll go and look into it (because it'll solve major problems with
> the other language backends such as Javascript).

The basic idea is to construct a depth-first spanning tree for the
graph and then do a postorder matching against the supported
region types. You may discover some irreducible cases that need
special handling.

> Can you suggest any other references? Muchnick's book seems to be rather
> hard to get hold of (and is apparently notoriously full of bugs, but
> that's another matter).

You can get a Chinese copy of most of these textbooks for a few
dollars (same content in English, just with a Chinese cover). And
I wouldn't say it's full of bugs, but then I rarely read the code
examples. The biggest problem is that it's rather ancient by
todays standards (e.g. it mentions SSA only in passing). AFAIK
there is no up-to-date, comprehensive textbook on modern compiler
design.

The references given for chapter 7.7 are:

[Shar80] Sharir, Micha. Structural Analysis: A New Approach to
Flow Analysis in Optimizing Compilers, Computer Languages, Vol. 5,
Nos. 3/4, 1980, pp. 141-153

[Rose77] Rosen, Barry K. High-Level Data Flow Analysis, CACM,
Vol. 20, No. 10, Oct. 1977, pp. 712-724

[Rose79] Rosen, Barry K. Data Flow Analysis for Procedural
Languages, JACM, Vol. 26, No. 2, Apr. 1977, pp. 322-344

[SchS79] Schwartz, Jacob T. and Micha Sharir. A Design for
Optimizations of the Bitvectoring Class, Comp. Sci. Rept. No. 17,
Courant Inst. of Math. Sci., New York Univ., New York, NY, 1979

[JaiT88] Jain, Sunil and Carol Thompson. An Efficient Approach to
Data Flow Analysis in a Multiple Pass Global Optimizer, in
[PLDI88], pp. 154-163

Unfortunately most of them are behind paywalls. You may want to
check the cited-by papers to find newer treatments of the same
topic.

For a deeper understanding of the issues at hand, I also recommend
reading this (but it's only applicable to goto-free languages):

Marc M. Brandis and Hans-Peter Moessenboeck, Single-Pass
Generation of Static Single Assignment Form for Structured
Languages, ACM TOPLAS, Vol. 16, No. 6, Nov. 1994, pp 1684-1698
http://citeseer.ist.psu.edu/brandis94singlepas.html

--Mike

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Clue 0.1.1

Tony Finch
In reply to this post by Mike Pall-87
On Tue, 15 Jul 2008, Mike Pall wrote:
>
> This won't work since you're patching Lua bytecode.
> LJ2 has a completely different bytecode. [...]
> But LJ2 is of course fully compatible to Lua at
> the source level. [...]

Would it not be reasonable to translate standard Lua bytecode
to LuaJIT2 bytecode?

Tony.
-- 
f.anthony.n.finch  <[hidden email]>  http://dotat.at/
LUNDY FASTNET IRISH SEA: WEST OR SOUTHWEST VEERING NORTHWEST 4 OR 5,
OCCASIONALLY 6 IN IRISH SEA. SLIGHT OR MODERATE. DRIZZLE WITH FOG PATCHES THEN
FAIR. MODERATE, OCCASIONALLY VERY POOR, BECOMING GOOD.

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Clue 0.1.1

Mike Pall-87
Tony Finch wrote:
> On Tue, 15 Jul 2008, Mike Pall wrote:
> > This won't work since you're patching Lua bytecode.
> > LJ2 has a completely different bytecode. [...]
> > But LJ2 is of course fully compatible to Lua at
> > the source level. [...]
> 
> Would it not be reasonable to translate standard Lua bytecode
> to LuaJIT2 bytecode?

Well, maybe. But you'd need to recreate some semantic information
that is lost in the translation from Lua source to standard Lua
bytecode. It's much simpler to translate from Lua source to the
new bytecode (and that's the only thing I'll support).

Umm, and this doesn't mean you'd be able to run bytecode patched
with gotos. Ok, the interpreter would grok it, but the region
analysis used by the trace recorder might get confused.

--Mike

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Clue 0.1.1

Florian Weimer
* Mike Pall:

> Umm, and this doesn't mean you'd be able to run bytecode patched
> with gotos. Ok, the interpreter would grok it, but the region
> analysis used by the trace recorder might get confused.

But doesn't Lua have got gotos?  Non-escaping local functions with
tailcalls are kind of equivalent.

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Clue 0.1.1

Mike Pall-87
Florian Weimer wrote:
> > Umm, and this doesn't mean you'd be able to run bytecode patched
> > with gotos. Ok, the interpreter would grok it, but the region
> > analysis used by the trace recorder might get confused.
> 
> But doesn't Lua have got gotos?  Non-escaping local functions with
> tailcalls are kind of equivalent.

Yes, this is handled by the trace recorder. Recording continues
across tail-calls, effectively inlining the target function. The
trace is aborted when the same target appears more than once,
except if it's the initiator. Amortized over time this has the
same effect as region analysis for regular loops. But it's less
precise and may need multiple attempts to discover natural loops.

David's current solution patches the bytecode with JMPs. These may
jump between arbitrary regions, e.g. from the middle of one loop
into the middle of another. Even assuming neither loop has any
locals (since they could alias under Lua's stack discipline) the
shortcuts used to determine the live stack slots for loop exits
might fail. And the tree building heuristics would have trouble
deciding whether a trace fragment stays inside the loop or not.

--Mike

Reply | Threaded
Open this post in threaded view
|

Re: [ANN] Clue 0.1.1

David Given
In reply to this post by Mike Pall-87
Mike Pall wrote:
[...]
>> Can you suggest any other references? Muchnick's book seems to be rather
>> hard to get hold of (and is apparently notoriously full of bugs, but
>> that's another matter).
> 
> You can get a Chinese copy of most of these textbooks for a few
> dollars (same content in English, just with a Chinese cover). And
> I wouldn't say it's full of bugs, but then I rarely read the code
> examples. The biggest problem is that it's rather ancient by
> todays standards (e.g. it mentions SSA only in passing). AFAIK
> there is no up-to-date, comprehensive textbook on modern compiler
> design.

Surprisingly enough I actually found a copy in the tiny reference
library at work (sandwiched between vols. I-III of _The Art of Computer
Programming_ and someone else's copy of the Dragon Book) --- complete
with the big wodge of errata.

Having looked at the relevant chapter, it does look promising, but it's
not a complete solution. It's trivially easy in C to construct a graph
that won't neatly decompose into primitives that my target language
supports. Even something as simple as a C switch statement can't be
represented with if...then...else due to the requirement to fall through
from one block to another, for example.

So I'll bear it in mind, and thanks very much for the recommendation,
but I'll stick with gotos for the time being until I've had a chance to
vet how much of a performance impact using emulated gotos will have.
(The fact that writing graph traversal code is possibly my least
favourite programming task --- even ranking below documentation --- has
nothing to do with this...)

-- 
ââââ ïïïïïïïïïïïïïï âââââ http://www.cowlark.com âââââ
â "I have always wished for my computer to be as easy to use as my
â telephone; my wish has come true because I can no longer figure out
â how to use my telephone." --- Bjarne Stroustrup

Attachment: signature.asc
Description: OpenPGP digital signature