ANN: LuaJIT 1.1.0


Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Andras Balogh-4
Having the FPU operate in full precision mode does not mean that you have
to send doubles to the GPU. I'm using 32-bit floats for geometry too. This
flag only means that internally, the FPU computes everything in
single/double precision. Besides, this 580 FPS is an ill-formed test case;
normal apps don't run at this rate. So it is probably limited by the CPU,
in which case, I think that setting and resetting the FPU registers in every
API call probably hurts a bit (even then, 3.5% is nothing). If DirectX just
didn't touch it, it would probably be fine. I don't think there's any
speed difference internally; modern FPUs do most instructions in one
cycle. Sure, if you do software transforms (God forbid rasterizing), then
it _might_ give you an edge, but I just don't think it's worth the
trouble. Besides, DirectX was designed by humans too; it's not without
design flaws...



andras


On Thu, 16 Mar 2006 07:35:18 -0700, Framework Studios: Hugo  
<[hidden email]> wrote:

> Well, one number I found on the decrease of Direct3D's speed with and  
> without the FPU preserve flag:
>
> http://discuss.microsoft.com/SCRIPTS/WA-MSD.EXE?A2=ind0504b&L=directxdev&D=1&P=3121
>
> with: 560 fps
> without: 580 fps
>
> However I think it is a bit beside the point to 'prove' this with  
> numbers, since DirectX more or less already chose single precision for us  
> (for a good reason, I trust). Also it seems logical for a 3D API to be  
> faster when using floats instead of doubles, because twice the data  
> can be pushed to the GPU with the same bandwidth / stored in VRAM. Isn't  
> this the same for OpenGL?
>
> Looking at the performance of double vs. float on modern CPUs should be  
> interesting though. Are doubles faster, slower or the same compared to  
> floats on 32-bit and 64-bit CPU architectures? What about the CPUs  
> people are actually using on average at the moment? (To sell games we  
> need to look at what is average on the market, not only at what is  
> top-notch. :)
>
>         Cheers,
>                 Hugo

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Framework Studios: Hugo
If it 'only' means the FPU computes things differently internally, why does
Lua start crashing as soon as DX sets this flag to single-precision? :-)

Using double precision with DX means the driver will be setting/resetting
the flag a lot (at run-time), whilst this can be avoided simply by only using
single precision (at compile-time). You make a good point that DirectX itself
would probably be fine on modern CPUs if it didn't touch the flag at all;
however, this is not our choice; it simply is as it is.

A performance hit of 3.5% just for keeping some FPU flag in the correct state,
without even knowing whether it is actually used (SSE2), sounds like a very
good reason not to use double precision to me.

    Cheers,
        Hugo

P.S.: design-flaws -> check out DirectShow. ;-)



Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

SevenThunders
In reply to this post by Framework Studios: Hugo
Well, we've had SSE2 instructions since the Pentium 4 was released in Nov. of 2000.  All of AMD's recent releases support it (though perhaps the FPU is faster on AMD's offerings; I don't know).  Will your graphics application even run on these older processors?

I have to admit that I said the FPU was deprecated with tongue in cheek.  That is Intel's intention in their documentation.  Compiler support for SSE seems somewhat spotty, especially when optimization comes into play.  I've noticed that enabling the SSE extensions using Microsoft's supposedly optimizing compiler rarely improves performance for pure double floating-point operations.  Hand-coded libraries, such as numerical linear algebra (e.g. BLAS), have made good use of SSE, however.  Also, once you start using single precision, SSE may actually be worth it: in SSE you can do twice as many FLOPS per clock cycle, not just load twice as many floating-point words.  So the bottom line is: if you are concerned with performance and you are going to use single precision, try to use SSE if possible.  This will avoid the need to switch the FPU to single precision.

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Andras Balogh-4
In reply to this post by Framework Studios: Hugo
On Thu, 16 Mar 2006 09:27:41 -0700, Framework Studios: Hugo  
<[hidden email]> wrote:

> If it 'only' means the FPU computes things differently internally, why  
> does Lua start crashing as soon as DX sets this flag to  
> single-precision? :-)

I don't know why it crashes, and I don't want to start guessing here.  
Still, floating point bugs can cause FPU exceptions, so it's not  
impossible that this causes a crash. And as far as I know, this flag  
really only changes the FPU control word...

> A performance hit of 3.5% just for having some FPU flag in the correct  
> state without even knowing if it actually really is used (SSE2) sounds  
> like a very good reason not to use double-precision to me.

Don't forget that this 3.5% gain was measured in an ill-formed test case.  
In real apps I doubt that you would see this much difference. Anyway,  
this is your call; if there's an easy fix, then good for you, but I would  
rather opt to just avoid all the trouble altogether. You never know when  
it's gonna pop up again in some form of mysterious bug...

jm2c,


andras

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Dave Dodge
In reply to this post by Framework Studios: Hugo
On Thu, Mar 16, 2006 at 03:35:18PM +0100, Framework Studios: Hugo wrote:
> Also it seems logical for a 3D API to be faster when using float-s
> in stead of double-s because twice the data can be pushed to the GPU
> with the same bandwidth / stored in VRAM. Isn't this the same for
> OpenGL?

There are about two dozen variations of the glVertex() function.  If you
want to submit your coordinates to OpenGL as single precision rather
than double precision, the API allows for it (likewise regular and
short integers).  Granted, whether the data actually reaches the GPU
without being converted to some other type internally depends on the
implementation.

I don't know enough about Direct3D to say whether it works similarly,
or if it instead just expects floats everywhere.

                                                  -Dave Dodge

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Framework Studios: Hugo
In reply to this post by Andras Balogh-4
>> If it 'only' means the FPU computes things differently internally, why
>> does Lua start crashing as soon as DX sets this flag to
>> single-precision? :-)
>
> I don't know why it crashes, and I don't want to start guessing here..

I meant it as a rhetorical question. ;) It crashes because it fails at least
on the number2int conversion trick (see the first link below). The issue has
been discussed on this mailing list before:

http://lua-users.org/lists/lua-l/2005-10/msg00254.html
http://lua-users.org/lists/lua-l/2005-11/msg00044.html
http://lua-users.org/lists/lua-l/2005-10/msg00265.html
http://lua-users.org/lists/lua-l/2006-03/msg00320.html

> Don't forget that this 3.5% gain was measured in an ill formed test case.
> In real apps I doubt that you would see this much difference. Anyways,

I'm not saying that the one test Google produced gives us the 3.5% as
scientifically correct, well-researched proof at all. It simply is yet
another indication that DX can get slower when the FPU is in double precision,
as stated in the documentation that comes with the DX SDK.

In real game apps, a possible 3.5% performance hit which can easily be
avoided is, I would say, definitely worth the trouble. Not making the easy
fix that optimizes for 'only' 3.5% would, at least in our team, raise an
eyebrow or two. ;) (Who knows, maybe it is 5%... or maybe 2%... maybe it
depends on the driver, another piece of software we don't control...)

> this is your call, if there's an easy fix, then good for you, but I would
> rather opt to just avoid all the trouble altogether. You never know when
> it's gonna pop up again in some form of mysterious bug...

The easy fix / avoiding the trouble altogether would of course be to define
Lua to use float instead of double. ;-)
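[In Lua 5.1 that definition lives in luaconf.h. A sketch of the relevant macros, using the 5.1-era names; verify against your Lua version, since the string formats and the lua_number2int machinery must be changed together:]

```c
/* luaconf.h sketch (Lua 5.1 macro names; verify against your copy).
 * Switching lua_Number from double to float: */
#define LUA_NUMBER      float
#define LUAI_UACNUMBER  double          /* type after default argument promotion */
#define LUA_NUMBER_SCAN "%f"            /* scanf format (was "%lf") */
#define LUA_NUMBER_FMT  "%.7g"          /* printf format (was "%.14g") */
#define lua_str2number(s,p) strtof((s), (p))
```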

    Cheers,
            Hugo


Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Framework Studios: Hugo
In reply to this post by SevenThunders
Thanks for the tip! I'll definitely have a look at BLAS. I'm using D3DX for
most math operations now, which is supposed to be optimised for MMX, SSE
etcetera.

Our applications work on Windows 2000 with DirectX 9 (the latest DX9 SDK
doesn't run on Windows 2000, but DX9 itself does). So theoretically someone
with a PC from 2000 and a 900 MHz AMD (not sure if I'm naming a CPU without
SSE here) would be able to run our applications.

The thing I don't understand is why it would be a bad thing to let DX switch
the (almost deprecated anyhow ;) FPU to single precision, which is exactly
what it 'wants'. It only means compiling Lua to use floats; why would that
be a bad thing? :)

    Thanks again!
                    Hugo



Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Adam D. Moss
Framework Studios: Hugo wrote:
> The thing I don't understand is why it would be a bad thing to let DX
> switch the (almost deprecated anyhow ;) FPU to single precision, which
> is exactly what it 'wants'. It only means compiling Lua to use floats;
> why would that be a bad thing? :)

I can't help thinking that this thread is going in circles (actually
approximately its third rotation so far just in this incarnation).
If you don't think that this question has been answered thoroughly
by now then I don't reckon there's really any mileage in rehashing
things yet again -- use float-y Lua and live in bliss. :)

--adam

Re: ANN: LuaJIT 1.1.0 / Re: Lua x DirectX

Framework Studios: Hugo
Amen. ;)


Re: ANN: LuaJIT 1.1.0

Zachary P. Landau-4
In reply to this post by Mike Pall-3-2
> Narrowing numbers to integers with help from the optimizer is one
> way to go. Dual number support (int + double) would have benefits
> for embedded CPUs (lacking FPU hardware). But it's tricky to get
> this fast for the interpreter and even more so for compiled code.
> I guess pure integer support is too limiting for most embedded
> projects (but would be really fast). [I need feedback on this
> topic from people who use Lua on embedded devices.]

Mike,

I meant to respond to this a while ago and it slipped my mind.

So far I have used Lua for two embedded devices, and for both of
them I've used integers exclusively.  Using floating point is way
too slow.  At least for the applications I've used Lua for, integers
did just fine.  I can think of embedded applications where integer-only
support would be a problem, but I think there are enough
applications that don't need it to justify a version of LuaJIT without it.

Just for your reference, one of the devices I used Lua with ran on a
ColdFire and the other on ARM, so LuaJIT running on either of those
platforms would be pretty interesting.
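[The integer-only setup uses the same luaconf.h knob as the float build discussed earlier in the thread. A hedged sketch with Lua 5.1-era macro names, untested; the arithmetic macros and string formats need matching adjustments:]

```c
/* luaconf.h sketch for an FPU-less target (Lua 5.1 macro names;
 * verify against your Lua version). All number math becomes
 * integer math: */
#define LUA_NUMBER      long
#define LUAI_UACNUMBER  long
#define LUA_NUMBER_SCAN "%ld"
#define LUA_NUMBER_FMT  "%ld"
/* The luai_num* arithmetic macros in luaconf.h must also be
 * adjusted so division and modulo use integer semantics. */
```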

--
Zachary P. Landau <[hidden email]>