LuaJIT FFI: Semantics of undefined numeric conversions (e.g. huge double to int)

Philipp Kutin
Hi,

I'm one of the developers/maintainers of a certain (open source) game
port and one of our upcoming ideas is to replace the existing scripting
language with something based on Lua; more specifically LuaJIT because
of its efficiency and the fact that we mostly target x86/x64 anyway.

From reading the documentation, it looks like LuaJIT's FFI is the
perfect tool for making engine and game functions/structures accessible
from the scripting language, where of course only "safe" functions
should be made visible directly (for example, those taking only scalar
numbers). But as far as I can see, the FFI docs don't specify what
happens for numeric conversions that are undefined per C standard, such
as downcasting a number greater than INT_MAX to an int.

A few experiments with the following C function

     void printint(int x) { printf("%d\n",x); }

give these results from LuaJIT for me:

     printint(2147483648+1)  --> -2147483648
     printint(4*2147483648)  --> -2147483648
     printint(-2147483648-1)  --> -2147483648
     printint(0/0)  --> -2147483648
     printint(1/0)  --> -2147483648

So, in every "interesting" case, the out-of-range double is apparently
converted to INT_MIN, but can I rely on this or is this a coincidental
side-effect of the implementation?

The closest thing to an answer I could find is the "Conversion between C
types" section in the FFI documentation:
http://luajit.org/ext_ffi_semantics.html#convert_between ,
but again I'm not sure what to make of the undefined cases here. For
example, the double to int conversion is listed as

     double -->^trunc int,

but what is "truncation" supposed to mean in this particular context? It
can't conceptually mean "round the double to an infinite-precision
integer and take its 32 lowest-order bits", since it would be undefined
for inf/nan and it's inconsistent with my experimental findings.
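
For comparison, the "lowest 32 bits" reading mentioned above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the function name `double_to_int32_wrapping` is hypothetical, and the sketch only handles inputs whose truncated value fits in 64 bits. It rejects NaN, infinities and anything larger, which is exactly where this reading stops being meaningful.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of LuaJIT or any library.
 * Truncates d towards zero and keeps the 32 lowest-order bits.
 * Returns 0 on success, -1 if d is NaN, infinite, or outside
 * the int64_t range (this sketch does not handle those). */
int double_to_int32_wrapping(double d, int32_t *out)
{
    /* NaN fails both comparisons, so it is rejected here as well. */
    if (!(d >= -9223372036854775808.0 && d < 9223372036854775808.0))
        return -1;

    int64_t t = (int64_t)d;              /* defined: value is in range */
    uint32_t low = (uint32_t)(uint64_t)t; /* modular, well-defined */

    /* Portable reinterpretation of the low 32 bits as signed. */
    if (low >= 0x80000000u)
        *out = (int32_t)(low - 0x80000000u) + INT32_MIN;
    else
        *out = (int32_t)low;
    return 0;
}

/* A few inputs from the experiments above, under this reading. */
void wrapping_examples(void)
{
    int32_t r;
    assert(double_to_int32_wrapping(2147483649.0, &r) == 0 && r == -2147483647);
    assert(double_to_int32_wrapping(8589934592.0, &r) == 0 && r == 0);
    assert(double_to_int32_wrapping(-2147483649.0, &r) == 0 && r == 2147483647);
    assert(double_to_int32_wrapping(1e300, &r) == -1);
}
```

Under this reading, 2147483648+1 would give -2147483647 and 4*2147483648 would give 0, neither of which matches the -2147483648 observed in the experiments.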


Greetings,
Philipp

Re: LuaJIT FFI: Semantics of undefined numeric conversions (e.g. huge double to int)

Mike Pall
Philipp Kutin wrote:
> From reading the documentation, it looks like LuaJIT's FFI is the
> perfect tool for making engine and game functions/structures
> accessible from the scripting language, where of course only "safe"
> functions should be made visible directly (for example, those taking
> only scalar numbers).

Umm, why? If the scripts originate from the same trust domain,
there's no point in adding any restrictions. You can blow up
memory or cause a hang (not caught by hooks) just with the
standard Lua library. And if the scripts are untrusted, better
watch out -- a truly perfect sandbox for Lua is hard.

> But as far as I can see, the FFI docs don't specify what happens
> for numeric conversions that are undefined per C standard, such
> as downcasting a number greater than INT_MAX to an int.

It relies on C and/or machine code to do the conversion, so it has
the same amount of 'undefinedness'.

Actually, I can guarantee that it doesn't cause an exception. But
the result for undefined inputs is definitely platform-dependent.
It could be anything, e.g. 0, INT_MIN, INT_MAX or a random result,
depending on the input.

> So, in every "interesting" case, the out-of-range double is
> apparently converted to INT_MIN, but can I rely on this or is this a
> coincidental side-effect of the implementation?

No, you cannot rely on this. It just so happens to be the standard
behavior on x86 and x64. But I might add optimizations to the JIT
compiler that explicitly rely on the inputs being defined, which
may cause other results for out-of-range inputs.

> For example, the double to int conversion is listed as
>
>     double -->^trunc int,
>
> but what is "truncation" supposed to mean in this particular
> context?

Truncation means rounding towards zero: 1.5 -> 1, -1.5 -> -1

This doesn't say anything about out-of-range inputs. But at the
top of the semantics page it says that it follows the C standard
wherever possible, so out-of-range inputs are undefined.
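
As a rough C illustration of the distinction drawn here: the sketch below truncates towards zero and makes the "defined input" precondition explicit. The names `fits_in_int` and `trunc_to_int` are hypothetical, and the range test assumes a 32-bit `int`, so that INT_MIN - 1 and INT_MAX + 1 are exactly representable as doubles.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical predicate: nonzero if (int)d is defined by the C
 * standard, i.e. the value truncated towards zero fits in an int.
 * NaN fails both comparisons and is therefore reported as unfit. */
int fits_in_int(double d)
{
    return d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0;
}

/* Truncating conversion; aborts instead of hitting undefined
 * behaviour when the input is out of range. */
int trunc_to_int(double d)
{
    assert(fits_in_int(d));
    return (int)d; /* rounds towards zero: 1.5 -> 1, -1.5 -> -1 */
}
```

With these definitions, `trunc_to_int(1.5)` yields 1 and `trunc_to_int(-1.5)` yields -1, while `fits_in_int(2147483648.0)` is false.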

--Mike

Re: LuaJIT FFI: Semantics of undefined numeric conversions (e.g. huge double to int)

Philipp Kutin

On 09/19/2011 12:39 AM, Mike Pall wrote:
> Umm, why? If the scripts originate from the same trust domain,
> there's no point in adding any restrictions. You can blow up
> memory or cause a hang (not caught by hooks) just with the
> standard Lua library. And if the scripts are untrusted, better
> watch out -- a truly perfect sandbox for Lua is hard.

The scripts are written by players who want to extend the game's
functionality, so they are not at the same trust level as the engine/game
C code. The sandbox isn't really meant to be hardened against malicious
attacks, but the operations visible to script authors should definitely
behave consistently across platforms and versions.

> It relies on C and/or machine code to do the conversion, so it has
> the same amount of 'undefinedness'.
>
> Actually, I can guarantee that it doesn't cause an exception. But
> the result for undefined inputs is definitely platform-dependent.
> It could be anything, e.g. 0, INT_MIN, INT_MAX or a random result,
> depending on the input.

OK, that's good to know, even though it means that pretty much all
functions will have to be wrapped.
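
One way such a wrapper might look on the C side is sketched below. It is only one possible policy, not anything LuaJIT prescribes: the names `double_to_int_saturating` and `printint_checked` are hypothetical, NaN is mapped to 0 by arbitrary choice, and the range test assumes a 32-bit `int`. The wrapper takes a double, so the FFI call itself never performs an out-of-range conversion.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>
#include <stdio.h>

/* The function from the first post. */
void printint(int x) { printf("%d\n", x); }

/* Hypothetical helper: a conversion that is defined for every double.
 * NaN -> 0 (arbitrary choice), out-of-range -> nearest int limit,
 * everything else truncates towards zero. */
int double_to_int_saturating(double d)
{
    if (isnan(d)) return 0;
    if (d >= (double)INT_MAX + 1.0) return INT_MAX;
    if (d <= (double)INT_MIN - 1.0) return INT_MIN;
    return (int)d;
}

/* Hypothetical wrapper that would be exposed to scripts instead of
 * printint: same result on every platform for every input. */
void printint_checked(double x) { printint(double_to_int_saturating(x)); }

/* The inputs from the first post, under the saturating policy. */
void saturating_examples(void)
{
    assert(double_to_int_saturating(2147483649.0) == INT_MAX);
    assert(double_to_int_saturating(8589934592.0) == INT_MAX);
    assert(double_to_int_saturating(-2147483649.0) == INT_MIN);
    assert(double_to_int_saturating(NAN) == 0);
    assert(double_to_int_saturating(HUGE_VAL) == INT_MAX);
}
```

The cost is one extra function per exported entry point, which is the wrapping overhead being discussed.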

> Truncation means rounding towards zero: 1.5 -> 1, -1.5 -> -1
>
> This doesn't say anything about out-of-range inputs. But at the
> top of the semantics page it says that it follows the C standard
> wherever possible, so out-of-range inputs are undefined.

Yeah, I figured that out after my post, but still, the documentation doesn't
explicitly mention that carrying out C-undefined conversions gives
undefined results. In fact, it suggests otherwise if you search for
"undefined":

[quote]
Indexing a pointer/array: (...) An error is raised if the element size
is undefined or a write access to a constant element is attempted.

Pointer arithmetic: (...) An error is raised if the element size is
undefined.

64 bit integer arithmetic: (...) The undefined cases for the division,
modulo and power operators return 2LL ^ 63 or 2ULL ^ 63.
[/quote]

Especially the last might suggest to a reader that all invalid
conversions are somehow trapped and produce a special constant, and
phrasings like "These conversion rules are *more or less* the same as
the standard C conversion rules" add to the uncertainty.

So, can something like the following be added to the spec?:

"
... it closely follows the C language semantics, wherever possible.
Specifically, <this and that> should be considered undefined behaviour
as per C standard.
"

It would give a lot more comfort to have this explicitly spelled out to
bone-headed people like me :)


--Philipp

Re: LuaJIT FFI: Semantics of undefined numeric conversions (e.g. huge double to int)

Mike Pall
Philipp Kutin wrote:
> OK, that's good to know, even though it means that pretty much all
> functions will have to be wrapped.

Not to stop your endeavors, but you realize that most of Lua's
standard library has exactly the same issues? Do you really want
to wrap e.g. every call to string.sub() just because both rounding
and overflow for its arguments are undefined and platform-dependent?

Dumbing down an API has never been a good idea. Users that don't
even know what 'undefined' means will find clever ways to
circumvent all of your brittle safety nets just by accident.
However all of your users, even the experts, will suffer from
pointless wrappers.

I suggest educating your users and/or not worrying too much. :-)

> So, can something like the following be added to the spec?:
>
> "
> ... it closely follows the C language semantics, wherever possible.
> Specifically, <this and that> should be considered undefined
> behaviour as per C standard.
> "

Sorry, no. Because <this and that> would amount to paraphrasing a
substantial amount of the C standards. I guess that would be more
text than the whole page has right now.

--Mike