Lua string library partially reimplemented in Lua

David Manura
Here is the Lua string library (lstrlib.c) partially reimplemented in
Lua (mainly, string.find and string.match functions):

  http://lua-users.org/wiki/StringLibraryInLua

Reimplementing these in Lua provides a number of generalizations and
possible applications:

  * The pattern matching library can be extended in Lua

  * The pattern matching can match not just strings but more generally
arrays of chars and arrays of arbitrary values, including arrays
backed by metamethods.  The filearray.lua example included in the
appendix allows a large file to be accessed via an array interface,
which can then be matched by these string.find/string.match functions,
without ever loading the entire file into memory at once:

  local S = require "stringinlua"  -- reimplemented pattern matching functions
  local FA = require "filearray"   -- table array interface to files
  local SA = require "stringarray" -- table array interface to simple strings

  -- match text in file manual.html (internally, only 1K is loaded at a time)
  assert(S.match(assert(FA 'manual.html'), SA'block ::= (%w+)') == 'chunk')
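To make the "array interface" idea concrete, here is a minimal sketch of a read-only array view over an ordinary string. The function name string_array and the n length field are assumptions for illustration. A proxy table is used because Lua 5.1 ignores __len on tables. The actual stringarray.lua on the wiki page may look different.

```lua
-- Hypothetical read-only array view of a string (not the wiki's
-- stringarray.lua): t[i] is the i-th character, nil outside 1..t.n.
local function string_array(s)
  local view = { n = #s }                 -- length stored as a plain field
  return setmetatable(view, {
    __index = function(_, i)
      if type(i) == "number" and i >= 1 and i <= #s and i % 1 == 0 then
        return s:sub(i, i)                -- the i-th character as a 1-char string
      end
      return nil
    end,
    __newindex = function()
      error("string arrays are read-only", 2)
    end,
  })
end
```

A matcher written only against t[i] and t.n could then consume such a view, a file-backed view, or a plain table interchangeably.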

Here is another example, matching arrays that are not char strings:

  local TA = require "tablearray" -- table array interface to tables

  -- match value false followed by one or more occurrences of value 'test'
  assert(S.match(TA{2,false,"test","test",2}, TA{false,"test", '+'})
            == TA{false,"test", "test"})
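The matching idea itself does not depend on strings. As a rough sketch, assume a much smaller pattern language than stringinlua's: a pattern is a plain array of literal values, and the string '+' after a value means "one or more of it" (greedy, with backtracking). Under that assumption, matching over ordinary Lua arrays could look like the code below. The names match_here and array_match are hypothetical, and this is not the tablearray code.

```lua
-- Try to match pat (from index pi) against subj starting at index si.
-- Returns the index just past the match, or nil.
local function match_here(subj, si, pat, pi)
  if pi > #pat then return si end              -- pattern exhausted: success
  local item = pat[pi]
  if pat[pi + 1] == '+' then
    -- count consecutive elements equal to item (greedy)
    local n = 0
    while si + n <= #subj and subj[si + n] == item do n = n + 1 end
    -- try the longest run first, then back off; at least one is required
    for k = n, 1, -1 do
      local e = match_here(subj, si + k, pat, pi + 2)
      if e then return e end
    end
    return nil
  end
  if si <= #subj and subj[si] == item then
    return match_here(subj, si + 1, pat, pi + 1)
  end
  return nil
end

-- Returns a new array holding the first matching slice of subj, or nil.
local function array_match(subj, pat)
  for start = 1, #subj do
    local e = match_here(subj, start, pat, 1)
    if e then
      local out = {}
      for i = start, e - 1 do out[#out + 1] = subj[i] end
      return out
    end
  end
  return nil
end
```

Elements are compared with ==, so false and other non-string values work. A literal '+' element cannot be expressed in this sketch.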

Unicode matching may be another application (i.e. arrays of Unicode characters).

Re: Lua string library partially reimplemented in Lua

Paul Moore-3
2008/9/17 David Manura wrote:
> Here is the Lua string library (lstrlib.c) partially reimplemented in
> Lua (mainly, string.find and string.match functions):
>
>  http://lua-users.org/wiki/StringLibraryInLua

Just out of curiosity, how does performance compare to the C versions?
(I'm assuming it's slower, but how much penalty is there?)
Paul.

Re: Lua string library partially reimplemented in Lua

Fabien-3
In reply to this post by David Manura
Great! :)

> Reimplementing these in Lua provides a number of generalizations and
> possible applications:

Don't forget easy porting of the string lib to non-C versions of Lua (I'm thinking about Kahlua here).
 

Re: Lua string library partially reimplemented in Lua

Kristofer Karlsson
Indeed, Kahlua currently lacks a full string library
implementation, mostly for formatting and pattern matching,
so this definitely sounds useful.


Re: Lua string library partially reimplemented in Lua

Mark Meijer-2
In reply to this post by David Manura
Sounds great!


Re: Lua string library partially reimplemented in Lua

David Manura
In reply to this post by Paul Moore-3
On Wed, Sep 17, 2008 at 4:02 AM, Paul Moore wrote:
> Just out of curiosity, how does performance compare to the C versions?
> (I'm assuming it's slower, but how much penalty is there?)

Running pm.lua from the Lua 5.1 test suite (now on the wiki page)
shows it to be about 500x slower.  PepperfishProfiler shows that most
of the time is spent in string indexing, as expected.

Almost no effort was made in optimizing, however; rather, it was a
straight port of lstrlib.c, and it wraps strings in a table interface
so as to behave like null-terminated C char arrays.  It's interesting
how the Lua and C code can be made structurally similar to a high
degree--even the C gotos in "match" are implemented in-place in terms
of Lua tail calls, pointers are transformed to indexes into arrays,
and I initially used zero-indexed arrays in Lua.
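Here is a toy illustration of that translation style. It is not the code from the wiki page: it is a tiny matcher supporting literal characters, "." and a "*" suffix. Pointers become integer indexes (si into the subject, pi into the pattern), NULL becomes nil, and the C idiom "s++; p++; goto init" becomes a Lua tail call, which does not grow the stack. The helper names single and max_expand loosely follow lstrlib.c's singlematch and max_expand.

```lua
-- Toy matcher: literal characters, "." (any char) and a "*" suffix.
-- Returns the index just past the match, or nil (where C returns NULL).

-- Does the subject character at si match the single pattern item at pi?
local function single(s, si, p, pi)
  if si > #s then return false end
  local pc = p:sub(pi, pi)
  return pc == "." or pc == s:sub(si, si)
end

local match  -- forward declaration: max_expand and match call each other

-- Greedy "*": take as many matches as possible, then back off one at a time.
local function max_expand(s, si, p, pi)
  local n = 0
  while single(s, si + n, p, pi) do n = n + 1 end
  while n >= 0 do
    local e = match(s, si + n, p, pi + 2)  -- rest of the pattern after "x*"
    if e then return e end
    n = n - 1
  end
  return nil
end

function match(s, si, p, pi)
  if pi > #p then return si end            -- end of pattern: success
  if p:sub(pi + 1, pi + 1) == "*" then
    return max_expand(s, si, p, pi)
  end
  if not single(s, si, p, pi) then return nil end
  return match(s, si + 1, p, pi + 1)       -- tail call replaces "goto init"
end
```

For example, match("aaab", 1, "a*b", 1) returns 5, the index just past the match.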

Note that not all of the string library can be reimplemented in Lua.
This notably includes string.sub, which I use to index the i-th
character of a string.  It might be more efficient to use string.byte
(which also can't be reimplemented in Lua).

On another point, I think it's inconsistent that Lua's VM has an
opcode to obtain the length in bytes of a string (i.e. # s), but
there's no opcode to obtain the i-th character or byte of a string
(e.g. s[i]).  This is because a check for string type is hardcoded
into the "len" event but not the "index" event[1].  Indexing a string
is a primitive operation on a primitive data type, so isn't that
worthy of an opcode rather than going through the standard library?
The presence of such an opcode might significantly improve the speed
of this and other things, as well as allow string.sub/string.byte
library functions to be implemented in Lua.

[1] http://www.lua.org/manual/5.1/manual.html#2.8
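For comparison, the s[i] syntax (though not its speed) can be emulated without touching the VM. The approach is to replace the __index field of the shared string metatable, which normally points to the string table. This is only a sketch: it changes indexing for every string in the state, and, like the patch, it returns nil outside 1..#s.

```lua
-- Emulate s[i] by replacing the string metatable's __index (normally the
-- string table itself). Affects all strings in this Lua state.
local string_mt = getmetatable("")
local methods = string_mt.__index          -- the standard string table

string_mt.__index = function(s, k)
  if type(k) == "number" then
    -- one-character string for integral k in 1..#s, nil otherwise
    if k >= 1 and k <= #s and k % 1 == 0 then return methods.sub(s, k, k) end
    return nil
  end
  return methods[k]                         -- keep s:upper(), s:sub(), ...
end
```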

Re: Lua string library partially reimplemented in Lua

David Manura
On Thu, Sep 18, 2008 at 12:21 AM, David Manura wrote:
> On another point, I think it's inconsistent that Lua's VM has an
> opcode to obtain the length in bytes of a string (i.e. # s), but
> there's no opcode to obtain the i-th character or byte of a string
> (e.g. s[i]).  This is because a check for string type is hardcoded
> into the "len" event but not the "index" event[1].  Indexing a string
> is a primitive operation on a primitive data type, so isn't that
> worthy of an opcode rather than going through the standard library?
> The presence of such an opcode might significantly improve the speed
> of this and other things, as well as allow string.sub/string.byte
> library functions to be implemented in Lua.

Consider the following patch to lvm.c:luaV_gettable:

--- lua-5.1.4/src/lvm.c 2007-12-28 10:32:23.000000000 -0500
+++ lua-5.1.4-patched/src/lvm.c 2008-09-18 01:02:28.656250000 -0400
@@ -119,6 +119,17 @@
       }
       /* else will try the tag method */
     }
+    else if (ttisstring(t) && ttisnumber(key)) {
+      const int pos = nvalue(key);
+      if (pos < 1 || pos > tsvalue(t)->len) {
+        setnilvalue(val);
+        return;
+      }
+      /* unused alternative: setnvalue(val, (unsigned char)getstr(rawtsvalue(t))[pos-1]); */
+      const char c = getstr(rawtsvalue(t))[pos-1];
+      setsvalue2s(L, val, luaS_newlstr(L, &c, 1));
+      return;
+    }
     else if (ttisnil(tm = luaT_gettmbyobj(L, t, TM_INDEX)))
       luaG_typeerror(L, t, "index");
     if (ttisfunction(tm)) {

Here's a performance test of unpatched Lua vs. patched Lua on three
styles of indexing strings:

  local s = ("abcdefghijklmnopqrstuvwxyz"):rep(100)
  local sub = string.sub

  -- enable one body line at a time; s[i] needs the patched VM
  for j=1,10000 do
    for i=1,#s do
                               -- unpatched (s)  patched (s)
    --  local c = s:sub(i,i)   --     9.3           9.3
    --  local c = sub(s, i,i)  --     8.7           8.8
    --  local c = s[i]         --     N/A           3.6
    end
  end

So the patch reduces the runtime by about 60% on this trivial example.
On the more realistic stringinlua.lua, the speed gains are much smaller,
though still measurable (other areas still leave room for optimization).
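A rough way to repeat this kind of measurement on an unpatched interpreter is a small os.clock harness. Since s[i] needs the patched VM, the sketch below times string.byte as the third variant instead (the alternative suggested earlier in the thread). The repetition count is arbitrary, so only the relative times are meaningful.

```lua
-- Time three ways of reading every character of s, using os.clock.
local s = ("abcdefghijklmnopqrstuvwxyz"):rep(100)
local sub, byte = string.sub, string.byte

local variants = {
  ["s:sub(i,i)"] = function() for i = 1, #s do local c = s:sub(i, i) end end,
  ["sub(s,i,i)"] = function() for i = 1, #s do local c = sub(s, i, i) end end,
  ["byte(s,i)"]  = function() for i = 1, #s do local c = byte(s, i) end end,
}

-- CPU seconds spent running f reps times.
local function elapsed(f, reps)
  local t0 = os.clock()
  for _ = 1, reps do f() end
  return os.clock() - t0
end

for _, name in ipairs({ "s:sub(i,i)", "sub(s,i,i)", "byte(s,i)" }) do
  print(string.format("%-12s %.3f s", name, elapsed(variants[name], 200)))
end
```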

Re: Lua string library partially reimplemented in Lua

Alex Davies
David Manura wrote:
> On another point, I think it's inconsistent that Lua's VM has an
> opcode to obtain the length in bytes of a string (i.e. # s), but
> there's no opcode to obtain the i-th character or byte of a string
> (e.g. s[i]).

I thought the same originally, but unfortunately I believe it's best as it is. Otherwise it would seem quite inconsistent that strings can be indexed for reading but not for writing:

local a = "hello"
assert(a[1] == "h")  -- would work with the patch
a[1] = "b"           -- but this would still be an error

Besides, any jiter should inline the call. (and luajit 2.x may even do a good job of avoiding the :str lookup).

(Unrelated though, but something I'd like to see in jit 2.x would be things such as "if str:sub(1, 2) == "__" then" remove the string creation and just compare the first two bytes - just because it's a fairly common operation in my code at least).

- Alex

Re: Lua string library partially reimplemented in Lua

Roberto Ierusalimschy
> (Unrelated though, but something I'd like to see in jit 2.x would be 
> things such as "if str:sub(1, 2) == "__" then" remove the string creation 
> and just compare the first two bytes - just because it's a fairly common 
> operation in my code at least).

Maybe this should deserve a proper function in strlib.

-- Roberto

Re: Lua string library partially reimplemented in Lua

Mike Pall-89
In reply to this post by Alex Davies
Alex Davies wrote:
> Besides, any jiter should inline the call. (and luajit 2.x may even do a 
> good job of avoiding the :str lookup).

Yep. There is no effective performance difference between a
builtin operator and a call to a library function. The compiler
needs to "know" what the library function does, of course.

Case in point: I've reimplemented MD5 in pure Lua using the bit
library functions provided in LJ2. This is a twisted maze of
function applications:

  local function tr_g(a, b, c, d, x, s)
    return rol(bxor(c, band(d, bxor(b, c))) + a + x, s) + b
  end
  [...]
  c = tr_g(c, d, a, b, x[12] + 0x265e5a51, 14)
  b = tr_g(b, c, d, a, x[ 1] + 0xe9b6c7aa, 20)
  [...]

But the resulting code is almost on par with the code a C compiler
generates. Excerpt from round 19/20:

  [...]
  mov esi, [esp+0x34]
  add esi, 0x265e5a51
  add edx, eax
  add esi, edx
  rol esi, 0x0e
  lea edx, [ebx+esi]
  mov eax, edx
  xor eax, ebx
  mov esi, ecx
  and esi, eax
  mov eax, ebx
  xor eax, esi
  mov esi, [esp+0x60]
  add esi, 0xe9b6c7aa
  [...]

(There's still one missed opportunity for a lea.)
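For readers without LJ2's bit library, here is a portable plain-Lua sketch of the same tr_g step. It works on unsigned values in 0 .. 2^32-1, whereas LuaJIT's bit functions return signed results. The band/bxor/rol below are slow bit-by-bit stand-ins written for illustration, not the LJ2 implementations. Since bxor(c, band(d, bxor(b, c))) picks the bits of b where d is 1 and the bits of c where d is 0, it computes MD5's G function.

```lua
-- Portable stand-ins for band/bxor/rol on unsigned 32-bit values (0 .. 2^32-1).
-- Slow (one loop step per bit); for illustration only.
local MOD = 2^32

-- Combine a and b bit by bit: bit i of the result is set when keep(x, y) is true.
local function bitop(a, b, keep)
  local r, p = 0, 1
  a, b = a % MOD, b % MOD
  for _ = 1, 32 do
    local x, y = a % 2, b % 2
    if keep(x, y) then r = r + p end
    a, b, p = (a - x) / 2, (b - y) / 2, p * 2
  end
  return r
end

local function band(a, b) return bitop(a, b, function(x, y) return x == 1 and y == 1 end) end
local function bxor(a, b) return bitop(a, b, function(x, y) return x ~= y end) end

-- Rotate left by s bits (0 <= s < 32).
local function rol(a, s)
  a = a % MOD
  local out = math.floor(a / 2^(32 - s))   -- bits that wrap around to the low end
  return (a * 2^s) % MOD + out
end

-- The MD5 round-2 step from the post; the final % MOD keeps the result in range.
local function tr_g(a, b, c, d, x, s)
  return (rol(bxor(c, band(d, bxor(b, c))) + a + x, s) + b) % MOD
end
```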

> (Unrelated though, but something I'd like to see in jit 2.x would be things 
> such as "if str:sub(1, 2) == "__" then" remove the string creation and just 
> compare the first two bytes - just because it's a fairly common operation 
> in my code at least).

Here's what the generated IR looks like when displayed in tree form
(the IR itself is linear). I've left out the str:sub dispatch and
the guard that ensures str has at least 2 characters:

  SPTR str 0   KINT 2
        \      /
          SNEW       KSTR "__"
              \      /
                EQ  -> exit

  SPTR str ofs  Returns a pointer to the string data + ofs of a string object.
  SNEW ptr len  Creates a new string object from the pointer and the length.

Assuming the SNEW result does not escape, the EQ could be replaced
by a 2-byte comparison against the original string contents. This
leaves the SNEW dead and avoids the creation of the temp. string.
But in case the temp. string must be created anyway, it's usually
cheaper to compare the string pointers.

Alas, general escape analysis is not trivial. There are a few
shortcuts though, like checking whether the SNEW is unused when
the EQ is encoded (machine code generation is backwards) and
whether there are no instructions in between that use it. But quite
often the temp. string _does_ escape through one of the guard
exits. This requires sinking of the SNEW, which opens up another
can of worms. Oh well ... later. :-)

BTW: Python has s.startswith/s.endswith to avoid the temp. string.
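As a sketch of what such helpers compute, here are hypothetical startswith/endswith functions for plain Lua. Neither exists in the standard string library. They compare byte codes in place instead of building a temporary substring. In the plain interpreter the per-byte calls may well cost more than a short s:sub(); any gain would come from a built-in C or JIT-recognized version.

```lua
-- Hypothetical helpers: compare byte codes in place, no temporary substring.
local byte = string.byte

local function startswith(s, prefix)
  local n = #prefix
  if n > #s then return false end
  for i = 1, n do
    if byte(s, i) ~= byte(prefix, i) then return false end
  end
  return true
end

local function endswith(s, suffix)
  local n, off = #suffix, #s - #suffix
  if off < 0 then return false end
  for i = 1, n do
    if byte(s, off + i) ~= byte(suffix, i) then return false end
  end
  return true
end
```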

--Mike

Re: Lua string library partially reimplemented in Lua

Jerome Vuarand
In reply to this post by Roberto Ierusalimschy
2008/9/18 Roberto Ierusalimschy wrote:
>> (Unrelated though, but something I'd like to see in jit 2.x would be
>> things such as "if str:sub(1, 2) == "__" then" remove the string creation
>> and just compare the first two bytes - just because it's a fairly common
>> operation in my code at least).
>
> Maybe this should deserve a proper function in strlib.

Isn't str:match("^__") already doing that ?

The pattern str:sub(42,43)=="__" could be generalized with a future
numbered repetition, for example the hypothetical
str:match("^.{41}__").
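Until something like {n} exists, there are two ways to express this test with today's patterns. The first is string.find with an init argument, where a leading "^" anchors the match at init. The second spells out the repetition with string.rep. The function names below are made up for illustration.

```lua
-- (a) string.find with init: a leading "^" anchors the match at position pos.
local function has_at_find(s, pos)
  return s:find("^__", pos) ~= nil
end

-- (b) emulate the hypothetical "^.{41}__" by repeating "." explicitly.
local function has_at_rep(s, pos)
  return s:match("^" .. ("."):rep(pos - 1) .. "__") ~= nil
end
```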

Re: Lua string library partially reimplemented in Lua

Alex Davies
In reply to this post by Mike Pall-89
Mike Pall wrote:
> [...]
> lea edx, [ebx+esi]
> mov eax, edx
> xor eax, ebx
> mov esi, ecx
> [...]

I've been trying to figure out how jit 2.x manages to reduce numbers in this instance to integers. I can see that the band/bxor could guarantee that the numbers fit in the range, but the additions? The table access? I'm quite amazed. Is this some kind of combination of static range analysis with overflow exceptions? I really have no idea.

And I'm pleased to see I'm not the only one who has been curious about speeding up Lua substring equality testing.

- Alex

Re: Lua string library partially reimplemented in Lua

Roberto Ierusalimschy
In reply to this post by Jerome Vuarand
> 2008/9/18 Roberto Ierusalimschy wrote:
> >> (Unrelated though, but something I'd like to see in jit 2.x would be
> >> things such as "if str:sub(1, 2) == "__" then" remove the string creation
> >> and just compare the first two bytes - just because it's a fairly common
> >> operation in my code at least).
> >
> > Maybe this should deserve a proper function in strlib.
> 
> Isn't str:match("^__") already doing that ?

Not when the second string has magic characters.
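When the prefix is an arbitrary string, one common workaround is to escape every punctuation character before building the pattern. Punctuation is a superset of the magic characters ^$()%.[]*+-?. A minimal sketch, with hypothetical function names:

```lua
-- Escape all punctuation so an arbitrary literal can be embedded in a pattern
-- ("%x" with non-alphanumeric x always means the literal character x).
local function escape_pattern(lit)
  return (lit:gsub("%p", "%%%0"))
end

local function starts_with_pattern(s, prefix)
  return s:find("^" .. escape_pattern(prefix)) ~= nil
end
```

The plain flag of string.find cannot be combined with "^", so a plain search that fails at position 1 would keep scanning the rest of the string.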

-- Roberto

Re: Lua string library partially reimplemented in Lua

David Manura
On Thu, Sep 18, 2008 at 12:52 PM, Roberto Ierusalimschy wrote:
>> Isn't str:match("^__") already doing that ?
> Not when the second string has magic characters.

Plus, I understand the original motivation was to avoid temporary
string creation.  On success, str:match("^__") returns the string "__"
(which is different from "^__").  str:match("^()__") is not much of an
improvement.

~~~

Something related that bothered me was the overloading of the
string.find function with the "plain" Boolean parameter:

  string.find (s, pattern [, init [, plain]])

This is stuffing two functions into one:

  (1) searching for a "pattern", returning start and end positions and captures

  (2) searching for a "simple string", returning start and end
positions (where the end position is redundant since the string length
is known).

The second is conceptually simpler, yet it requires more arguments
(init must be passed as well) to disable the default pattern behavior.

Moreover, string.match and string.find are essentially one function
(lstrlib.c:str_find_aux) split into two.

(Note also the related proposal to add a "plain" argument to the
string.gsub function[1].)

~~~

So, how about these string functions for Lua 5.2?

(1) string.match (s, pattern [, init ])

This is unchanged from Lua 5.1.  It searches for a pattern in the
string.  Positions can be returned via "()" in the pattern,
eliminating the need for string.find:

    string.match (s, "pattern", init)
    string.match (s, "()pattern()", init)

I know of one potential limitation[2] of this compared to string.find,
but that could be addressed by some extension to the () pattern
matching syntax if desired.
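As a sketch of that claim, string.find can be emulated with string.match and two position captures. The name find_via_match is hypothetical. It only handles patterns without captures of their own and without a trailing "$" anchor, because the "()" appended after a "$" would turn it into a literal. A leading "^" is kept in front of the first "()".

```lua
-- Emulate string.find(s, pattern, init) with string.match and position
-- captures. Sketch only: pattern must have no captures of its own and no
-- trailing "$" anchor.
local function find_via_match(s, pattern, init)
  local anchor = ""
  if pattern:sub(1, 1) == "^" then       -- keep "^" first so it still anchors
    anchor, pattern = "^", pattern:sub(2)
  end
  local i, j = s:match(anchor .. "()" .. pattern .. "()", init)
  if i then return i, j - 1 end          -- second capture is one past the end
  return nil
end
```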

(2) string.smatch (s, s2 [, init [, final]])

This searches for the string s2 in the string s, returning the
position.  The search is optionally bounded in the range [init,
final].  Set init == final to match at a specific position (it's
sort of like string.sub).  Perhaps it would be useful to add the
optional "final" parameter to the string.match function too.

(3) Deprecate string.find.

[1] http://lua-users.org/lists/lua-l/2005-09/msg00122.html
[2] http://lua-users.org/lists/lua-l/2005-09/msg00350.html

Re: Lua string library partially reimplemented in Lua

David Manura
On Thu, Sep 18, 2008 at 9:44 PM, David Manura wrote:
> (2) string.smatch (s, s2 [, init [, final]])

And maybe

  string.smatch (s, s2, final, initial)

for a reverse search.
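Here is a minimal plain-Lua sketch of the proposed string.smatch. It covers both the forward form and the reverse form (init > final). smatch is a proposal, not a real Lua function, and this sketch ignores negative indexes. It compares byte codes at each candidate position, which is simple but O(#s * #s2).

```lua
-- Sketch of the proposed smatch(s, s2 [, init [, final]]): first position p
-- of the plain substring s2 with p between init and final. If init > final,
-- positions are tried from init down to final (reverse search).
local byte = string.byte

-- Does s2 occur in s starting exactly at p? (compares bytes, no temp string)
local function matches_at(s, s2, p)
  if p < 1 or p + #s2 - 1 > #s then return false end
  for k = 1, #s2 do
    if byte(s, p + k - 1) ~= byte(s2, k) then return false end
  end
  return true
end

local function smatch(s, s2, init, final)
  init = init or 1
  final = final or #s
  local step = (init <= final) and 1 or -1
  for p = init, final, step do
    if matches_at(s, s2, p) then return p end
  end
  return nil
end
```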

Re: Lua string library partially reimplemented in Lua

David Jones-2
In reply to this post by David Manura

On 19 Sep 2008, at 02:44, David Manura wrote:

> So, how about these string functions for Lua 5.2?
>
> (2) string.smatch (s, s2 [, init [, final]])
>
> This searches for the string s2 in the string s, returning the
> position.  The search is optionally bounded in the range [init,
> final].  Set init == final to match at a specific position (it's
> sort of like string.sub).  Perhaps it would be useful to add the
> optional "final" parameter to the string.match function too.

This operation, for better or worse, is usually called "index", as in "string.index".

I approve of your split.

drj

Re: Lua string library partially reimplemented in Lua

Mike Pall-89
In reply to this post by Alex Davies
Alex Davies wrote:
> I've been trying to figure out how jit 2.x manages to reduce numbers in 
> this instance to integers. I can see that the band/bxor could guarantee 
> that the numbers fit in the range [...]

Since Lua doesn't have a standard bit manipulation library (yet),
I took the liberty to define its semantics:

  The base numeric type used by LuaJIT is a double-precision
  floating-point number. But bit operations need to work on
  fixed-precision integral numbers. The following rules are
  intended to give a precise and useful definition (for the
  programmer), yet give the implementation (interpreter and
  compiler) the maximum freedom to apply optimizations:

  All bit operations take one or more numeric arguments and produce
  one numeric result. Coercion from strings to numbers is enabled
  by default, but is deprecated.

  Input arguments to bit operations are reduced to a 32 bit integer
  by taking their least-significant 32 bits. Numbers outside of the
  range +-2^51, +-Inf or NaN give undefined results. Non-integral
  numbers are truncated or rounded in an implementation-defined
  way. Undefined or implementation-defined behaviour means that the
  results could differ between versions or even between interpreted
  and compiled code. It's strongly advised to avoid these cases.

  The result of a bit operation is always a signed 32 bit integral
  number.

  This way it's possible to pass both signed and unsigned 32 bit
  numbers (but the result is always signed) and get wrap-around
  semantics for integer arithmetic. Examples:

    bit.tobit(1234)                    --> 1234
    bit.tobit(-1234)                   --> -1234
    bit.tobit(0xffffffff)              --> -1
    bit.tobit(0x87654321)              --> -2023406815
    bit.tobit(2^40+1234)               --> 1234
    bit.tobit(2^40-1234)               --> -1234
    bit.tobit(0x87654321 + 0x87654321) --> 248153666
    bit.band(1234, -1234)              --> 2
    bit.band(2^40+1234, 2^40-1234)     --> 2

  The use of bit.tobit is for illustration only -- it's usually
  redundant, because all operations reduce all of their inputs
  anyway. But you may want to use it for the final reduction steps
  in crypto algorithms which rely heavily on wrap-around semantics
  for integer addition or subtraction.
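For the defined cases (integral inputs within +-2^51), the reduction these rules describe can be written in plain Lua as shown below. This is a sketch of the semantics only, not LuaJIT's implementation. Under Lua 5.3+ the result is a float.

```lua
-- Reduce an integral number to a signed 32-bit value: keep the
-- least-significant 32 bits, then reinterpret them as two's complement.
local function tobit(x)
  local r = x % 2^32                   -- 0 .. 2^32-1 (Lua's % is floored)
  if r >= 2^31 then r = r - 2^32 end   -- upper half maps to negatives
  return r
end
```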

> but the additions?

The trace recorder first emits FP ADDs for an addition of course.
Input operands are converted to doubles first with TONUM. This is
to preserve the full FP precision and range.

Bit operations always "integerize" their operands. This can be as
simple as skipping a TONUM or inserting a TOBIT conversion. But it
also recognizes FP ADD/SUB and emits an INT ADD/SUB instead. This
effectively backpropagates and/or eliminates the (semi-expensive)
TOBIT as far as possible.

Together with CSE you get pure integer code for bit-intensive
workloads like crypto algorithms. For code like bxor(a+b, c-d) the
FP ADD/SUB are left as dead and are never encoded.
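One way to see why an FP ADD/SUB can be replaced by a wrapping 32-bit INT ADD/SUB when its result only feeds bit operations: tobit(x) is congruent to x modulo 2^32, and addition and subtraction preserve congruence modulo 2^32. So for integral a and b with |a|, |b| and |a ± b| below 2^51:

```latex
\operatorname{tobit}(x) \equiv x \pmod{2^{32}}
\quad\Longrightarrow\quad
\operatorname{tobit}(a \pm b) = \operatorname{tobit}\bigl(\operatorname{tobit}(a) \pm \operatorname{tobit}(b)\bigr)
```

The right-hand side is what a 32-bit integer add or subtract with wrap-around computes on already-reduced operands.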

> The table access?

Well, the indexes here are constant integers. In the general case a
similar backpropagation algorithm is used. E.g. the FP ADD from a
t[i+100] is replaced with an INT ADDOV (checking for overflow).
This is only useful if 'i' is already an integer. That's why the
trace recorder does some ad-hoc range analysis to detect integer
induction variables (and avoid overflow checks for the increments).

The result of a table access has to be type-checked and converted
to an integer of course (except if store-to-load forwarding works).
I could unroll the loop to unpack the string into 32 bit chunks to
avoid that. But that awaits the planned integration of a
struct/lpack-like library (not right now).

You don't see an instance of the TOBIT conversion in the machine
code excerpt because it's done on the first load and all further
conversions are CSEd. You only get to see some loads from stack
slots, but these are just restores for spilled operands.

--Mike

Re: Lua string library partially reimplemented in Lua

Alex Davies
Mike Pall wrote:
> Since Lua doesn't have a standard bit manipulation library (yet),
> I took the liberty to define its semantics:

Much needed addition. And I like the rules, can see how they can provide for efficient optimizations, but may I ask why it's necessary to make non-integral numbers behave in an implementation defined behaviour on modern processors? (Is it to allow for more algebraic simplifications?)

I see now, of course, that the mov esi, [esp+0x60] instructions are stack loads. When I first glanced at the code I thought they might have been the x[12]/x[1] table loads (I missed the esp completely; it was late).

In any case it looks like a great addition, much better than userdata bitfields or similar.

- Alex

Re: Lua string library partially reimplemented in Lua

Mike Pall-89
Alex Davies wrote:
> Mike Pall wrote:
>> Since Lua doesn't have a standard bit manipulation library (yet),
>> I took the liberty to define its semantics:
>
> Much needed addition. And I like the rules, can see how they can provide 
> for efficient optimizations, but may I ask why it's necessary to make 
> non-integral numbers behave in an implementation defined behaviour on 
> modern processors? (Is it to allow for more algebraic simplifications?)

Well, truncation would make the most sense. But this is not the
default rounding mode, and flipping it all the time can be quite
expensive.

There's another difficulty: the double -> 32 bit int conversions
return 0x80000000 for values out of range and cannot be used. But
the conversion to 64 bit ints is only available for x87 and not
for SSE2 (except in x64 mode).

So I have to use the 2^52 + 2^51 biasing trick for SSE2. But
changing the SSE rounding mode is very expensive. And roundsd
(with selectable rounding modes) is only available starting with
SSE4.1 (45nm Core2). Ah, the joys of the x86 instruction set. :-|

Also thinking ahead about implementations for non-x86 machines,
it's probably best to leave the behaviour of bit operations for
non-integral numbers implementation-defined.
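The biasing trick can be illustrated in plain Lua, assuming IEEE double arithmetic without extended-precision intermediates (e.g. SSE2 rather than x87). Adding 2^52 + 2^51 moves any |x| < 2^51 into a range where doubles are spaced exactly 1 apart. The addition itself therefore rounds x to an integer under the current rounding mode, which is normally round-to-nearest-even, not truncation. Plain Lua cannot read the bit pattern, so the low mantissa bits are read with a modulo here. The function names are hypothetical.

```lua
-- Sketch of the 2^52 + 2^51 biasing trick (valid for |x| < 2^51).
local BIAS = 2^52 + 2^51

-- x + BIAS is rounded to an integer by the FP addition itself.
local function round_via_bias(x)
  return (x + BIAS) - BIAS
end

-- The low 32 bits of the biased mantissa hold the rounded integer mod 2^32;
-- BIAS is a multiple of 2^32, so a modulo recovers them arithmetically.
local function low32_via_bias(x)
  return (x + BIAS) % 2^32
end
```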

BTW: the JavaScript guys now have a big problem because the
official spec defined some very rigid rules for conversions to
integers. Trying to conform _and_ make this fast proves to be a
challenge.

--Mike

Re: Lua string library partially reimplemented in Lua

Mark Meijer-2
This may seem like a weird question... but is it actually necessary
for a variable used as a field of bits, to have a valid representation
as a (lua)number, and to be able to convert between the two? In other
words, what exactly justifies treating a bit field as a number, or a
number as a bit field? Especially if the value of a bit field (when
interpreted as an integer datatype - which could be considered an
implicit cast) has a different representation from the same value as a
lua number (i.e. usually a double).

Perhaps bit fields should be treated more as an opaque "type", only
allowing operations to set and clear specific bits by means of the
usual bitwise operators. This may be limiting in some ways, but it
probably also means much simpler semantics, and no unnecessary
conversion issues.

I'm not saying it should be this way or that way, it's just a thought.
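One possible shape for such an opaque type is sketched below (the Bitfield name and its methods are hypothetical). Bits are only reachable through methods, and there is no conversion to or from a Lua number. This gives up the compact integer representation in exchange for simpler semantics.

```lua
-- Hypothetical opaque bit-field type: no implicit number conversion.
local Bitfield = {}
Bitfield.__index = Bitfield

function Bitfield.new(width)
  local bits = {}
  for i = 0, width - 1 do bits[i] = false end
  return setmetatable({ width = width, bits = bits }, Bitfield)
end

local function check(self, i)
  assert(i >= 0 and i < self.width and i % 1 == 0, "bit index out of range")
end

function Bitfield:set(i)   check(self, i); self.bits[i] = true  end
function Bitfield:clear(i) check(self, i); self.bits[i] = false end
function Bitfield:test(i)  check(self, i); return self.bits[i]  end

-- Bitwise AND of two fields of equal width, returning a new field.
function Bitfield.band(a, b)
  assert(a.width == b.width, "width mismatch")
  local r = Bitfield.new(a.width)
  for i = 0, a.width - 1 do r.bits[i] = a.bits[i] and b.bits[i] end
  return r
end
```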

Cheers

