ANN: LuaJIT 1.1.0

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Andras Balogh-4
Having the FPU operate in full precision mode does not mean that you have to send doubles to the GPU; I'm using 32-bit floats for geometry too. This flag only means that, internally, the FPU computes everything in single/double precision. Besides, this 580 FPS is a pathological test case; normal apps don't run at this rate, so it is probably CPU-limited, in which case setting and resetting the FPU control word in every API call probably hurts a bit (and even then, 3.5% is nothing). If DirectX just didn't touch the flag, it would probably be fine. I don't think there's any speed difference internally; modern FPUs do most instructions in one cycle. Sure, if you do software transforms (God forbid rasterizing), then it _might_ give you an edge, but I just don't think it's worth the trouble. Besides, DirectX was designed by humans too; it's not without design flaws...
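
For reference, a minimal sketch of the control-word mechanism being discussed, using MSVC's _controlfp_s -- an illustration, not code from either poster; it assumes a 32-bit x86 build, since the precision-control bits are not supported on x64:

    #include <float.h>   /* _controlfp_s, _PC_24, _PC_53, _MCW_PC (MSVC) */
    #include <stdio.h>

    int main(void)
    {
        unsigned int ctrl = 0;

        /* Put the x87 FPU into single-precision (24-bit mantissa) mode,
           which is what Direct3D 9 does at device creation unless
           D3DCREATE_FPU_PRESERVE is passed. */
        _controlfp_s(&ctrl, _PC_24, _MCW_PC);

        /* Double arithmetic now rounds to float internally: 1e-10 is far
           below float epsilon, so the addition is lost. The volatile
           keeps the compiler from folding the sum at compile time. */
        volatile double x = 1.0;
        x = x + 1e-10;
        printf("%.15g\n", x);   /* prints 1, not 1.0000000001 */

        /* Restore double precision (53-bit mantissa). */
        _controlfp_s(&ctrl, _PC_53, _MCW_PC);
        return 0;
    }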



andras


On Thu, 16 Mar 2006 07:35:18 -0700, Framework Studios: Hugo <[hidden email]> wrote:

Well, one number I found on the decrease of Direct3D's speed with and without the FPU preserve flag:

http://discuss.microsoft.com/SCRIPTS/WA-MSD.EXE?A2=ind0504b&L=directxdev&D=1&P=3121

with: 560 fps
without: 580 fps

However, I think it is a bit beside the point to 'prove' this with numbers, since DirectX has more or less already chosen single precision for us (for a good reason, I trust). It also seems logical for a 3D API to be faster when using floats instead of doubles, because twice the data can be pushed to the GPU with the same bandwidth / stored in VRAM. Isn't this the same for OpenGL?
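
To put numbers on the bandwidth argument, a trivial sketch (these position-only vertex layouts are hypothetical, not from anyone's codebase):

    #include <stdio.h>

    /* The float layout is half the size, so twice as many vertices fit
       in the same VRAM or the same bus transfer. */
    struct VertexF { float  x, y, z; };   /* 12 bytes */
    struct VertexD { double x, y, z; };   /* 24 bytes */

    int main(void)
    {
        printf("float vertex:  %u bytes\n", (unsigned)sizeof(struct VertexF));
        printf("double vertex: %u bytes\n", (unsigned)sizeof(struct VertexD));
        return 0;
    }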

Looking at the performance of double vs. float on modern CPUs should be interesting, though. Are doubles faster, slower, or the same as floats on 32-bit and 64-bit CPU architectures? And what about the CPUs people are actually using on average at the moment? (To sell games we need to look at what is average on the market, not only at what is top-notch. :)

        Cheers,
                Hugo

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Framework Studios: Hugo
If it 'only' means the FPU computes things differently internally, why does Lua start crashing as soon as DX sets this flag to single precision? :-)

Using double precision with DX means the driver will be setting/resetting the flag a lot (at run time), whilst this can be avoided simply by only using single precision (at compile time). You make a good point that DirectX itself would probably be fine on modern CPUs if it didn't touch the flag at all; however, this is not our choice. It simply is as it is.

A performance hit of 3.5% just for keeping some FPU flag in the correct state, without even knowing whether the FPU actually is used at all (SSE2), sounds like a very good reason not to use double precision to me.
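
The flag in question is D3DCREATE_FPU_PRESERVE, passed at device creation. A minimal sketch, assuming a Direct3D 9 setup (window and present-parameter plumbing omitted; the helper name is illustrative):

    #include <d3d9.h>   /* link with d3d9.lib */

    /* Create a device that does NOT switch the x87 FPU to single
       precision. Without D3DCREATE_FPU_PRESERVE, Direct3D 9 puts the
       FPU into single-precision mode when the device is created. */
    IDirect3DDevice9 *CreateDeviceKeepingFpuState(IDirect3D9 *d3d, HWND hwnd,
                                                  D3DPRESENT_PARAMETERS *pp)
    {
        IDirect3DDevice9 *dev = NULL;
        HRESULT hr = d3d->CreateDevice(
            D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hwnd,
            D3DCREATE_HARDWARE_VERTEXPROCESSING | D3DCREATE_FPU_PRESERVE,
            pp, &dev);
        return SUCCEEDED(hr) ? dev : NULL;
    }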

   Cheers,
       Hugo

P.S.: design flaws -> check out DirectShow. ;-)


Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

SevenThunders
In reply to this post by Framework Studios: Hugo
Well, we've had SSE2 instructions since the Pentium 4 was released in Nov. of 2000, and all of AMD's recent releases support it (though perhaps the FPU is faster on AMD's offerings, I don't know). Will your graphics application even run on these older processors?
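
Detecting this at run time is straightforward; a hedged sketch using MSVC's __cpuid intrinsic (the function name is illustrative):

    #include <intrin.h>   /* __cpuid (MSVC) */

    /* CPUID leaf 1 returns the feature flags in EDX (info[3]);
       bit 26 indicates SSE2 support. */
    bool HasSse2(void)
    {
        int info[4] = { 0, 0, 0, 0 };
        __cpuid(info, 1);
        return (info[3] & (1 << 26)) != 0;
    }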

I have to admit that I said the FPU was deprecated with tongue in cheek; that is Intel's intention in their documentation. Compiler support for SSE seems somewhat spotty, especially when optimization comes into play. I've noticed that enabling the SSE extensions using Microsoft's supposedly optimizing compiler rarely improves performance for pure double floating-point operations. Hand-coded libraries, such as numerical linear algebra (e.g. BLAS), have made good use of SSE, however. Also, once you start using single precision, SSE may actually be worth it: in SSE you can do twice as many FLOPS per clock cycle, not just load twice as many floating-point words. So the bottom line is: if you are concerned with performance and you are going to use single precision, try to use SSE if possible. That also avoids the need to switch the FPU to single precision.
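
To illustrate the 'twice as many FLOPS' point, a minimal sketch with the packed-single SSE intrinsics (one instruction adds four floats):

    #include <xmmintrin.h>   /* SSE intrinsics */
    #include <stdio.h>

    int main(void)
    {
        /* One packed add processes four single-precision floats, where
           the x87 FPU would need four separate additions. */
        __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
        __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
        __m128 c = _mm_add_ps(a, b);

        float out[4];
        _mm_storeu_ps(out, c);
        printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]); /* 11 22 33 44 */
        return 0;
    }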


Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Andras Balogh-4
In reply to this post by Framework Studios: Hugo
On Thu, 16 Mar 2006 09:27:41 -0700, Framework Studios: Hugo <[hidden email]> wrote:

> If it 'only' means the FPU computes things differently internally, why does Lua start crashing as soon as DX sets this flag to single precision? :-)

I don't know why it crashes, and I don't want to start guessing here. Still, floating-point bugs can cause FPU exceptions, so it's not impossible that this is what causes the crash. And as far as I know, this flag really only changes the FPU control word...

> A performance hit of 3.5% just for keeping some FPU flag in the correct state, without even knowing whether the FPU actually is used at all (SSE2), sounds like a very good reason not to use double precision to me.

Don't forget that this 3.5% gain was measured in an ill-formed test case; in real apps I doubt you would see this much difference. Anyway, this is your call: if there's an easy fix, then good for you, but I would rather avoid all the trouble altogether. You never know when it's going to pop up again in the form of some mysterious bug...

jm2c,


andras

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Dave Dodge
In reply to this post by Framework Studios: Hugo
On Thu, Mar 16, 2006 at 03:35:18PM +0100, Framework Studios: Hugo wrote:
> Also it seems logical for a 3D API to be faster when using floats
> instead of doubles, because twice the data can be pushed to the GPU
> with the same bandwidth / stored in VRAM. Isn't this the same for
> OpenGL?

There are about two dozen variations of the glVertex() function. If you
want to submit your coordinates to OpenGL as single precision rather
than double precision, the API allows for it (likewise regular and
short integers). Granted, whether the data actually reaches the GPU
without being converted to some other type internally depends on the
implementation.
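
For the curious, the per-type entry points look like this in classic immediate-mode OpenGL; the suffix encodes the component type:

    #include <GL/gl.h>

    /* The suffix selects the component type; any internal conversion
       before the data reaches the GPU is up to the implementation. */
    void emit_triangle(void)
    {
        glBegin(GL_TRIANGLES);
        glVertex3f(0.0f, 0.0f, 0.0f);   /* three GLfloats (single precision) */
        glVertex3d(1.0, 0.0, 0.0);      /* three GLdoubles (double precision) */
        glVertex3s(0, 1, 0);            /* three GLshorts */
        glEnd();
    }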

I don't know enough about Direct3D to say whether it works similarly,
or if it instead just expects floats everywhere.

                                                  -Dave Dodge

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Framework Studios: Hugo
In reply to this post by Andras Balogh-4
>> If it 'only' means the FPU computes things differently internally, why does Lua start crashing as soon as DX sets this flag to single precision? :-)

> I don't know why it crashes, and I don't want to start guessing here...

I meant it as a rhetorical question. ;) It crashes because, at the very least, the number2int conversion trick fails (see the first link below; a sketch of the trick follows the links). The issue has been discussed on this mailing list before:

http://lua-users.org/lists/lua-l/2005-10/msg00254.html
http://lua-users.org/lists/lua-l/2005-11/msg00044.html
http://lua-users.org/lists/lua-l/2005-10/msg00265.html
http://lua-users.org/lists/lua-l/2006-03/msg00320.html
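
For context, a simplified sketch of the trick those threads discuss -- the real macro is lua_number2int in luaconf.h; this version assumes little-endian x86 and IEEE-754 doubles:

    #include <stdio.h>
    #include <string.h>

    /* Adding 2^52 + 2^51 forces the integer value of d into the low
       32 bits of the double's mantissa, which are then read back out.
       With the FPU switched to 24-bit precision, the addition is
       rounded and the extracted integer is garbage -- hence the
       crashes reported above. */
    static int number2int(double d)
    {
        volatile double v = d + 6755399441055744.0;   /* 2^52 + 2^51 */
        double bits = v;     /* force the rounded x87 result to memory */
        int i;
        memcpy(&i, &bits, sizeof i);   /* low 32 bits on little-endian */
        return i;
    }

    int main(void)
    {
        printf("%d\n", number2int(42.0));   /* 42 in double-precision mode */
        return 0;
    }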

> Don't forget that this 3.5% gain was measured in an ill-formed test case; in real apps I doubt you would see this much difference. Anyway,

I'm not saying that the one test Google produced gives us the 3.5% as scientifically correct, well-researched proof. It is simply one more indication that DX can get slower when the FPU is in double precision, as stated in the documentation that comes with the DX SDK.

In real game apps, avoiding a possible 3.5% performance hit, when it can be avoided easily, I would say is definitely worth the trouble. Skipping an easy fix that buys 'only' 3.5% would raise an eyebrow or two in our team, at least. ;) (Who knows, maybe it is 5%... or maybe 2%... maybe it depends on the driver, another piece of software we don't control...)

> this is your call: if there's an easy fix, then good for you, but I would rather avoid all the trouble altogether. You never know when it's going to pop up again in the form of some mysterious bug...

The easy fix / the way to avoid the trouble altogether would of course be to define Lua to use float instead of double. ;-)
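
Concretely, that fix amounts to rebuilding Lua with lua_Number defined as float. A hedged sketch of the relevant luaconf.h knobs, modeled on Lua 5.1's configuration macros (exact names and the full set of affected macros vary by version):

    /* luaconf.h (excerpt) -- switching lua_Number from double to float */
    #define LUA_NUMBER          float
    #define LUAI_UACNUMBER      double   /* floats promote to double in '...' */
    #define LUA_NUMBER_SCAN     "%f"     /* scanf format for reading */
    #define LUA_NUMBER_FMT      "%.7g"   /* printf format for writing */

    /* A plain cast replaces the IEEE-754 mantissa trick, which assumes
       a double representation and breaks in single-precision FPU mode. */
    #define lua_number2int(i,n)     ((i) = (int)(n))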

   Cheers,
       Hugo

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Framework Studios: Hugo
In reply to this post by SevenThunders
Thanks for the tip! I'll definitely have a look at BLAS. I'm using D3DX for most math operations now, which is supposed to be optimized for MMX, SSE, etcetera.

Our applications work on Windows 2000 with DirectX 9 (the latest DX9 SDK doesn't run on Windows 2000, but DX9 itself does). So theoretically someone with a PC from 2000 and a 900 MHz AMD CPU (I'm not sure whether I'm naming a CPU without SSE here) would be able to run our applications.

The thing I don't understand is why it would be a bad thing to let DX switch the (almost deprecated anyhow ;) FPU to single precision, which is exactly what it 'wants'. It only means compiling Lua to use floats; why would that be a bad thing? :)

   Thanks again!
                   Hugo




Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Adam D. Moss
Framework Studios: Hugo wrote:
> The thing I don't understand is why it would be a bad thing to let DX switch the (almost deprecated anyhow ;) FPU to single precision, which is exactly what it 'wants'. It only means compiling Lua to use floats; why would that be a bad thing? :)

I can't help thinking that this thread is going in circles (it's on
approximately its third rotation in this incarnation alone). If you
don't think the question has been answered thoroughly by now, then I
don't reckon there's any mileage in rehashing things yet again -- use
float-y Lua and live in bliss. :)

--adam

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Framework Studios: Hugo
Amen. ;)


Re: ANN: LuaJIT 1.1.0

Zachary P. Landau-4
In reply to this post by Mike Pall-3-2
> Narrowing numbers to integers with help from the optimizer is one
> way to go. Dual number support (int + double) would have benefits
> for embedded CPUs (lacking FPU hardware). But it's tricky to get
> this fast for the interpreter and even more so for compiled code.
> I guess pure integer support is too limiting for most embedded
> projects (but would be really fast). [I need feedback on this
> topic from people who use Lua on embedded devices.]

Mike,

I meant to respond to this a while ago and it slipped my mind.

So far I have used Lua for two embedded devices, and for both of
them I've used integers exclusively; using floating point is way
too slow. At least for the applications I've used Lua for, integers
did just fine. I can think of embedded applications where integer-only
support would be a problem, but I think there are enough applications
that don't need floating point to justify a version of LuaJIT without it.
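
The integer-only builds he describes typically go through the same luaconf.h knobs as the float build sketched earlier -- a hedged sketch in the style of Lua 5.1's macros (names from that config, values adapted):

    /* luaconf.h (excerpt) -- integer-only lua_Number for FPU-less targets */
    #define LUA_NUMBER          long
    #define LUAI_UACNUMBER      long
    #define LUA_NUMBER_SCAN     "%ld"
    #define LUA_NUMBER_FMT      "%ld"

    /* No floating point involved, so the conversions become trivial.
       NB: the arithmetic macros (luai_numdiv, luai_numpow, ...) also
       need integer-aware definitions, omitted here. */
    #define lua_number2int(i,n)      ((i) = (int)(n))
    #define lua_number2integer(i,n)  ((i) = (lua_Integer)(n))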

Just for your reference, one of the devices I used Lua with ran on a
ColdFire and the other on an ARM, so LuaJIT running on either of those
platforms would be pretty interesting.

--
Zachary P. Landau <[hidden email]>

