problem with string.format %d and very large integers

problem with string.format %d and very large integers

Norman Ramsey
I'm not quite sure if this problem counts as a bug or not, but I
tracked a real bug in a real application to unexpected behavior of
string.format, so I'm reporting it to the list.

I expect that if an integer n has an exact representation as a Lua
number, then the following laws hold:

   tonumber(tostring(n)) == n

   tonumber(string.format('%d', n)) == n

The first law holds, but if n is sufficiently large, the second does not:

    : nr@labrador 12309 ; lua
    Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
    > g = 1000 * 1000 * 1000
    > big = 2.5 * g
    > =big
    2500000000
    > =string.format('%d', big)
    -2147483648
    > bad = tonumber(string.format('%d', big))
    > =bad
    -2147483648
    > =tonumber(tostring(big))
    2500000000
    >

The documentation for string.format says that "the format string
follows the same rules as the printf family of standard C functions."
I'm sure this is a very large loophole.   But I don't think the
current behavior of '%d' serves any useful purpose.  Nor do I think it
is good for Lua's users to have to know about certain detailed
situations in which a Lua number is momentarily coerced to a 32-bit
integer.  It would be better if numbers behaved like IEEE
floating-point numbers in all situations.

I don't know of a simple, portable way to solve the problem in pure
ANSI C.  Using int64_t would solve the problem, but this type is
guaranteed to be present only in C99.  There is, however, an ANSI C
solution for the common case in which the "precision" is left
implicit: Lua could implement %d by using '%.0f' internally.
And in any case, I would prefer to see string.format call lua_error()
rather than silently produce wrong answers.
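
A minimal sketch of the '%.0f' idea combined with an explicit failure instead of a silently wrong answer. The helper name `format_integral`, the 18-byte buffer requirement, and the -1 error convention are assumptions made for illustration; nothing here is taken from the Lua sources. The integrality check uses `long long`, so it assumes C99 or a compiler that offers that type as an extension.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of Lua.
   Writes n as a decimal integer into buf (at least 18 bytes) and returns
   the number of characters written. Returns -1 if n has a fractional
   part or lies outside [-2^53, 2^53], the range in which every integer
   is exactly representable as a double. */
int format_integral(char *buf, double n)
{
    const double limit = 9007199254740992.0;   /* 2^53 */

    assert(buf != NULL);
    if (!(n >= -limit && n <= limit))   /* out of range; also rejects NaN */
        return -1;
    if ((double)(long long)n != n)      /* fractional part present */
        return -1;
    if (n == 0)
        n = 0.0;                        /* print "0" rather than "-0" */
    return sprintf(buf, "%.0f", n);     /* at most 17 characters plus NUL */
}
```

A caller inside string.format could turn the -1 into a Lua error rather than producing output.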


Norman Ramsey

Re: problem with string.format %d and very large integers

Luiz Henrique de Figueiredo
>     > =big
>     2500000000
>     > =string.format('%d', big)
>     -2147483648

This works fine in 5.2:

Lua 5.2.0  Copyright (C) 1994-2011 Lua.org, PUC-Rio
> big=2500000000
> =string.format('%d', big)
2500000000

Re: problem with string.format %d and very large integers

Florian Weimer
In reply to this post by Norman Ramsey
* Norman Ramsey:

> The first law holds, but if n is sufficiently large, the second does not:
>
>     : nr@labrador 12309 ; lua
>     Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio

What kind of architecture do you use?  It works for me as expected with
Lua 5.1.4 on Debian squeeze amd64.

Re: problem with string.format %d and very large integers

Xavier Wang
2011/7/28 Florian Weimer <[hidden email]>:

> * Norman Ramsey:
>
>> The first law holds, but if n is sufficiently large, the second does not:
>>
>>     : nr@labrador 12309 ; lua
>>     Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
>
> What kind of architecture do you use?  It works for me as expected with
> Lua 5.1.4 on Debian squeeze amd64.
>
>

Obviously he is on a 32-bit machine :)  -2147483648 == -2^31

Re: problem with string.format %d and very large integers

Florian Weimer
* Xavier Wang:

> 2011/7/28 Florian Weimer <[hidden email]>:
>> * Norman Ramsey:
>>
>>> The first law holds, but if n is sufficiently large, the second does not:
>>>
>>>     : nr@labrador 12309 ; lua
>>>     Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
>>
>> What kind of architecture do you use?  It works for me as expected with
>> Lua 5.1.4 on Debian squeeze amd64.

> Obviously he is on a 32-bit machine :)  -2147483648 == -2^31

Not so obvious.  %d expects an int argument, which is 32 bit on amd64,
too.  Perhaps the C varargs calling convention is fixing things up on
amd64, but I'm not sure.
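
As far as I can tell from lstrlib.c and luaconf.h in 5.1, the default build casts the number to `long` (LUA_INTFRM_T) and adds an 'l' length modifier, so what matters is the width of `long` rather than `int`. That would account for the amd64 Linux result, where `long` is 64 bits. Converting a double that does not fit is undefined behaviour in C; on x86 it commonly yields the most negative value, which matches the -2147483648 seen above. A small sketch of a range check that could precede the cast; both helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: nonzero if d can be converted to long without
   overflow. (double)LONG_MAX may round up to 2^(N-1), so the upper
   bound is written as -(double)LONG_MIN, which is exactly 2^(N-1). */
int fits_in_long(double d)
{
    return d >= (double)LONG_MIN && d < -(double)LONG_MIN;
}

/* Hypothetical wrapper: converts only after the range check passed.
   NaN fails both comparisons and is therefore rejected as well. */
long to_long_checked(double d)
{
    assert(fits_in_long(d));
    return (long)d;
}
```

On a platform with a 32-bit `long`, `fits_in_long(2500000000.0)` is 0, which is the case string.format would have to reject or route through '%.0f'.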

Re: problem with string.format %d and very large integers

Lorenzo Donati-2
On 28/07/2011 11.27, Florian Weimer wrote:

> * Xavier Wang:
>
>> 2011/7/28 Florian Weimer<[hidden email]>:
>>> * Norman Ramsey:
>>>
>>>> The first law holds, but if n is sufficiently large, the second does not:
>>>>
>>>>      : nr@labrador 12309 ; lua
>>>>      Lua 5.1.4  Copyright (C) 1994-2008 Lua.org, PUC-Rio
>>>
>>> What kind of architecture do you use?  It works for me as expected with
>>> Lua 5.1.4 on Debian squeeze amd64.
>
>> Obviously he is on a 32-bit machine :)  -2147483648 == -2^31
>
> Not so obvious.  %d expects an int argument, which is 32 bit on amd64,
> too.  Perhaps the C varargs calling convention is fixing things up on
> amd64, but I'm not sure.
>
>
>
It doesn't work for me:

-- Lua 5.1.4 on Windows XP (SP2)
-- CPU: Intel Mobile Core 2 Duo T7500
x = 2^31
print( ("%.0f     %d"):format(x,x) )
--> 2147483648     -2147483648



-- Lorenzo

Re: problem with string.format %d and very large integers

Stuart P. Bentley
In reply to this post by Norman Ramsey
On Wed, 27 Jul 2011 16:57:06 -0400, Norman Ramsey <[hidden email]> wrote:

> There is, however, an ANSI C
> solution for the common case in which the "precision" is left
> implicit: Lua could implement %d by using '%.0f' internally.

I'd like to see this solved with defines in a config, with something like

#define LUA_STRFORMAT_FLOATINGINT sizeof(int) < sizeof(lua_Number)
#if LUA_STRFORMAT_FLOATINGINT
#define LUA_STRFORMAT_INTEGER "%.0f"
#else
#define LUA_STRFORMAT_INTEGER "%d"
#endif

I've actually been impacted by this problem before, with large negative  
numbers (I solved it by changing "%d" to "-%d" and the input from v to -v).


Re: problem with string.format %d and very large integers

Lorenzo Donati-2
In reply to this post by Luiz Henrique de Figueiredo
On 27/07/2011 23.10, Luiz Henrique de Figueiredo wrote:

>>      >  =big
>>      2500000000
>>      >  =string.format('%d', big)
>>      -2147483648
>
> This works fine in 5.2:
>
> Lua 5.2.0  Copyright (C) 1994-2011 Lua.org, PUC-Rio
>> big=2500000000
>> =string.format('%d', big)
> 2500000000
>
>
>
Strangely enough, it doesn't work for me, and neither does this one:

-- Lua 5.2.0 on Windows XP (SP2)
-- CPU: Intel Mobile Core 2 Duo T7500
-- Compiled with TDM-GCC 4.5.2
x = 2^31
print( ("%.0f     %d"):format(x,x) )
--> 2147483648     -2147483648

-- Lorenzo

Re: problem with string.format %d and very large integers

Luiz Henrique de Figueiredo
In reply to this post by Stuart P. Bentley
> I'd like to see this solved with defines in a config, with something like
>
> #define LUA_STRFORMAT_FLOATINGINT sizeof(int) < sizeof(lua_Number)
> #if LUA_STRFORMAT_FLOATINGINT

That cannot work because sizeof is not evaluated at preprocessing time.
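
One preprocessor-friendly alternative is to test the constants from <limits.h>, which are required to be usable in #if. The sketch below is only an illustration under that assumption: the macro and function names are invented here, and the run-time cross-check assumes `long` has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical configuration macro: decided by the preprocessor,
   using LONG_MAX instead of sizeof. */
#if LONG_MAX > 2147483647L
#define LONG_IS_WIDER_THAN_32_BITS 1
#else
#define LONG_IS_WIDER_THAN_32_BITS 0
#endif

/* Picks the conversion. Note that "%ld" expects a long argument while
   "%.0f" expects a double, so the caller must pass the matching type. */
const char *integer_conversion(void)
{
    return LONG_IS_WIDER_THAN_32_BITS ? "%ld" : "%.0f";
}

/* sizeof is fine at run time, so it can confirm the preprocessor's choice. */
void check_selection(void)
{
    assert(LONG_IS_WIDER_THAN_32_BITS == (sizeof(long) * CHAR_BIT > 32));
}
```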

Re: problem with string.format %d and very large integers

Dimiter 'malkia' Stanev
Yup, but one can do it inline:

(sizeof(int)<sizeof(lua_Number)) ? "%0.f" : "%d"

or (but requires some non-header change)

const char* formatters[2] = { "%d", "%0.f" };
formatters[ sizeof(int) < sizeof(lua_Number) ]

Or this stupid trick, without requiring above:

("%d\0\0%0.f" + ((sizeof(int) < sizeof(lua_Number))<<2))

\0 is supposed to be 0 (I'm never sure how to encode this in "C")
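
For what it's worth, "\0" is the right escape, and the offsets in the trick can be checked with a few lines of C. This is just a sanity check of the string layout, with a hypothetical function name; whichever format is selected, the argument passed to sprintf still has to match it (int for "%d", double for "%0.f").

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns "%d" when use_float is 0, "%0.f" otherwise.
   Layout of the literal: '%' 'd' '\0' '\0' '%' '0' '.' 'f' '\0',
   so the second format starts at offset 4, that is, 1 << 2. */
const char *pick_format(int use_float)
{
    static const char formats[] = "%d\0\0%0.f";
    const char *f = &formats[(use_float != 0) << 2];

    assert(strlen(f) == (use_float ? 4u : 2u));
    return f;
}
```

"%0.f" parses as the 0 flag followed by an empty precision, which C treats as precision zero, so it prints the same digits as "%.0f".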

#define LUA_STRFORMAT_FLOATINGINT sizeof(int) < sizeof(lua_Number)
#if LUA_STRFORMAT_FLOATINGINT
#define LUA_STRFORMAT_INTEGER "%.0f"
#else
#define LUA_STRFORMAT_INTEGER "%d"
#endif

On 7/28/2011 11:20 AM, Luiz Henrique de Figueiredo wrote:
>> I'd like to see this solved with defines in a config, with something like
>>
>> #define LUA_STRFORMAT_FLOATINGINT sizeof(int)<  sizeof(lua_Number)
>> #if LUA_STRFORMAT_FLOATINGINT
>
> That cannot work because sizeof is not evaluated at preprocessing time.
>
>

Re: problem with string.format %d and very large integers

Lorenzo Donati-2
In reply to this post by Luiz Henrique de Figueiredo
On 27/07/2011 23.10, Luiz Henrique de Figueiredo wrote:

>>      >  =big
>>      2500000000
>>      >  =string.format('%d', big)
>>      -2147483648
>
> This works fine in 5.2:
>
> Lua 5.2.0  Copyright (C) 1994-2011 Lua.org, PUC-Rio
>> big=2500000000
>> =string.format('%d', big)
> 2500000000
>
>
>
Well, now I'm really puzzled. I tested again:
x = 2^31
print( ("%.0f     %d"):format(x,x) )

on 5.2.0-alpha and all the betas (rc1 through rc7).

and I keep getting the same results:

--> 2147483648     -2147483648

Since the issue should have been fixed in 5.2, either I'm doing
something really silly or my toolchain has something wrong. Could it be
that msvcrt.dll has some issue that mingw cannot work around?

Is there someone else that gets the same results on WinXP (32bit) with
(TDM) mingw?

Any hint appreciated.

-- Lorenzo

Re: problem with string.format %d and very large integers

Dirk Laurie
On Fri, Jul 29, 2011 at 09:22:33AM +0200, Lorenzo Donati wrote:
> Is there someone else that gets the same results on WinXP (32bit) with
> (TDM) mingw?
>
> Any hint appreciated.

Install cygwin and gcc.

Dirk

Re: problem with string.format %d and very large integers

Xavier Wang
In reply to this post by Lorenzo Donati-2
2011/7/29 Lorenzo Donati <[hidden email]>:

> On 27/07/2011 23.10, Luiz Henrique de Figueiredo wrote:
>>>
>>>     >  =big
>>>     2500000000
>>>     >  =string.format('%d', big)
>>>     -2147483648
>>
>> This works fine in 5.2:
>>
>> Lua 5.2.0  Copyright (C) 1994-2011 Lua.org, PUC-Rio
>>>
>>> big=2500000000
>>> =string.format('%d', big)
>>
>> 2500000000
>>
>>
>>
> Well, now I'm really puzzled. I tested again:
> x = 2^31
> print( ("%.0f     %d"):format(x,x) )
>
> on 5.2.0-alpha and all the betas (rc1 through rc7).
>
> and I keep getting the same results:
>
> --> 2147483648     -2147483648
>
> Since the issue should have been fixed in 5.2, either I'm doing something
> really silly or my toolchain has something wrong. Could it be that
> msvcrt.dll has some issue that mingw cannot work around?
>
> Is there someone else that gets the same results on WinXP (32bit) with (TDM)
> mingw?
>
> Any hint appreciated.
>
> -- Lorenzo
>
>

I used Win7 and Lua-5.2.0-beta (not an rc); this is the result:
D:\Work\Source\lua-5.2.0-beta\src>lua.exe
Lua 5.2.0 (beta)  Copyright (C) 1994-2011 Lua.org, PUC-Rio
> x = 2^31
> print(("%.0f %d"):format(x,x))
2147483648 -2147483648
>

Re: problem with string.format %d and very large integers

Lorenzo Donati-2
In reply to this post by Dirk Laurie
On 29/07/2011 10.54, Dirk Laurie wrote:
> On Fri, Jul 29, 2011 at 09:22:33AM +0200, Lorenzo Donati wrote:
>> Is there someone else that gets the same results on WinXP (32bit) with
>> (TDM) mingw?
>>
>> Any hint appreciated.
>
> Install cygwin and gcc.

Thanks, but that is not an option for me. I need native Windows
executables, without compatibility layers (moreover, at the time I last
checked, cygwin.dll had a quite restrictive license).
>
> Dirk
>
>
>
-- Lorenzo

Re: problem with string.format %d and very large integers

Lorenzo Donati-2
In reply to this post by Luiz Henrique de Figueiredo
On 27/07/2011 23.10, Luiz Henrique de Figueiredo wrote:

>>      >  =big
>>      2500000000
>>      >  =string.format('%d', big)
>>      -2147483648
>
> This works fine in 5.2:
>
> Lua 5.2.0  Copyright (C) 1994-2011 Lua.org, PUC-Rio
>> big=2500000000
>> =string.format('%d', big)
> 2500000000
>
>
>

Now this is interesting: I tried to understand lstrlib.c and the code of
str_format. I added some debug prints in the case branch for "%d":

---------------------------------------------------------------------
case 'd':  case 'i':
case 'o':  case 'u':  case 'x':  case 'X': {
  lua_Number n = luaL_checknumber(L, arg);

  /* DEBUG: sizeof yields size_t, so cast before printing with %d */
  printf("long long width: %d bit\n", (int)(sizeof(long long) * CHAR_BIT));
  long long nnn = 2LL << 30; /* 2^31 */
  double ddd = nnn;
  nnn = ddd; /* check round-trip */
  printf( "ddd = %.0f\n", ddd );
  printf( "nnn = %lld\n", nnn );
  /* END DEBUG */

  LUA_INTFRM_T r = (n < 0) ? (LUA_INTFRM_T)n :
                             (LUA_INTFRM_T)(unsigned LUA_INTFRM_T)n;
  addlenmod(form, LUA_INTFRMLEN);
  nb = sprintf(buff, form, r);
  break;
}
---------------------------------------------------------------------

When executing this lua code with the "hacked" interpreter:

x = 2^31; print( ("%.0f     %d"):format(x,x) )

it outputs:

long long width: 64 bit
ddd = 2147483648
nnn = -2147483648
2147483648     -2147483648

so it would seem that %lld conversion support in printf is broken.
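
One plausible reading of that output, offered as an assumption rather than a diagnosis: a printf that does not understand 'll' may consume only the low 32 bits of the argument and print them as a signed value. The sketch below models just that reinterpretation in portable arithmetic; the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical model: the value seen when only the low 32 bits of v
   are kept and read as a two's-complement signed integer. */
long long low32_as_signed(long long v)
{
    const long long two32 = 4294967296LL;   /* 2^32 */
    long long low = v % two32;              /* in (-2^32, 2^32) */

    if (low < 0)
        low += two32;                       /* now 0 .. 2^32-1 */
    if (low >= two32 / 2)
        low -= two32;                       /* top bit set: negative */
    assert(low >= -2147483648LL && low <= 2147483647LL);
    return low;
}
```

For 2^31 this model gives -2147483648, the same digits as the debug print above. It covers only the printf side; a value that was already clipped while converting the double to an integer is a separate effect.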

But then I created a simple test.c file with the same debug prints:

---------------------------------------------------------------------
#include <stdio.h>
#include <limits.h>

int main(void)
{
    long long nnn = 2LL << 30; /* 2^31 */
    double ddd = nnn;

    /* sizeof yields size_t, so cast before printing with %d */
    printf("long long width: %d bit\n", (int)(sizeof(long long) * CHAR_BIT));
    nnn = ddd; /* check round-trip */
    printf( "ddd = %.0f\n", ddd );
    printf( "nnn = %lld\n", nnn );
    return 0;
}
---------------------------------------------------------------------

And compiled it this way:
gcc test.c -c -O2 -Wall -pedantic -std=c90
gcc test.o -o test.exe -O2 -Wall -pedantic -std=c90

Besides some warnings about C90 not supporting "long long ints", all
went ok, and when running test.exe it gives:

long long width: 64 bit
ddd = 2147483648
nnn = 2147483648

It works!

So there must be some strange combination of compiler flags in the
standard makefile that makes GCC do the wrong thing for %lld in printf.

I also checked with DependencyWalker that test.exe uses the same runtime
(only msvcrt.dll). So it seems that it isn't msvcrt's fault either.

So now I'm completely puzzled, and I reached the limits of my C hacking
ability.  :-(

-- Lorenzo







Re: problem with string.format %d and very large integers

Lorenzo Donati-2
In reply to this post by Xavier Wang
On 29/07/2011 10.57, Xavier Wang wrote:
[..]
>
> I used Win7 and Lua-5.2.0-beta (not an rc); this is the result:

Just to be clear, I should have pointed out that 5.2.0-beta-rc7 and
5.2.0-beta are exactly the same; rc7 was frozen and became "the beta".

> D:\Work\Source\lua-5.2.0-beta\src>lua.exe
> Lua 5.2.0 (beta)  Copyright (C) 1994-2011 Lua.org, PUC-Rio
>> x = 2^31
>> print(("%.0f %d"):format(x,x))
> 2147483648 -2147483648
>>
>
>
>

-- Lorenzo

Re: problem with string.format %d and very large integers

KHMan
In reply to this post by Lorenzo Donati-2
On 7/29/2011 3:22 PM, Lorenzo Donati wrote:

> On 27/07/2011 23.10, Luiz Henrique de Figueiredo wrote:
>>> > =big
>>> 2500000000
>>> > =string.format('%d', big)
>>> -2147483648
>>
>> This works fine in 5.2:
>>
>> Lua 5.2.0 Copyright (C) 1994-2011 Lua.org, PUC-Rio
>>> big=2500000000
>>> =string.format('%d', big)
>> 2500000000
>>
>>
>>
> Well, now I'm really puzzled. I tested again:
> x = 2^31
> print( ("%.0f %d"):format(x,x) )
>
> on 5.2.0-alpha and all the betas (rc1 through rc7).
>
> and I keep getting the same results:
>
> --> 2147483648 -2147483648
>
> Since the issue should have been fixed in 5.2, either I'm doing
> something really silly or my toolchain has something wrong. Could
> it be that msvcrt.dll has some issue that mingw cannot work around?
>
> Is there someone else that gets the same results on WinXP (32bit)
> with (TDM) mingw?

WinXP, MinGW gcc 4.5.2 (TDM)

In luaconf.h, LUA_WIN does not define LUA_USE_LONGLONG. So in
lstrlib.c, LUA_INTFRM uses 'l'/long instead of 'll'/long long.

IIRC, 'll' does not work for msvcrt.dll.

Behaviour is the same for official MinGW gcc 4.5.2.

Would be nice to have proper behaviour on MinGW...

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia

Re: problem with string.format %d and very large integers

KHMan
In reply to this post by Lorenzo Donati-2
On 7/29/2011 7:47 PM, Lorenzo Donati wrote:

> On 27/07/2011 23.10, Luiz Henrique de Figueiredo wrote:
>>> > =big
>>> 2500000000
>>> > =string.format('%d', big)
>>> -2147483648
>>[snip]
> ---------------------------------------------------------------------
> #include <stdio.h>
> #include <limits.h>
>
> int main()
> {
>
> printf("long long width: %d bit\n", sizeof(long long) * CHAR_BIT );
> long long nnn = 2LL << 30; /* 2^31 */
> double ddd = nnn;
> nnn = ddd; /* check round-trip */
> printf( "ddd = %.0f\n", ddd );
> printf( "nnn = %lld\n", nnn );
>
> }
> ---------------------------------------------------------------------
>
> And compiled it this way:
> gcc test.c -c -O2 -Wall -pedantic -std=c90
> gcc test.o -o test.exe -O2 -Wall -pedantic -std=c90
> [snip]
> So there must be some strange combination of compiler flags in the
> standard makefile that makes GCC do the wrong thing for %lld in
> printf.

A wild guess, perhaps -std=c90 pulls in the mingwex versions of
*printf. Without the -std=c90, it doesn't work. But putting
-std=c90 and building the Lua DLL does not appear to work...
documentation on mingwex is thin on the Internet.

What does work is using __mingw_sprintf in str_format().

Alternatively, Win32 has the "I64" specifier. I dunno how portable
it is across versions of Win32 >=Win2K.

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia

Re: problem with string.format %d and very large integers

KHMan
In reply to this post by Lorenzo Donati-2
On 7/29/2011 7:47 PM, Lorenzo Donati wrote:

> On 27/07/2011 23.10, Luiz Henrique de Figueiredo wrote:
>>> [snipped all]
> So there must be some strange combination of compiler flags in the
> standard makefile that makes GCC do the wrong thing for %lld in
> printf.
>
> I also checked with DependencyWalker that test.exe uses the same
> runtime (only msvcrt.dll). So it seems that it isn't msvcrt's
> fault either.
> [snip]

Adding -D__USE_MINGW_ANSI_STDIO into SYSCFLAGS in the Makefile
will give the proper 'll' result, after fixing the LUA_WIN defines
in luaconf.h to include LUA_USE_LONGLONG. This pulls in a more
complete *printf from mingwex.

Or _mingw.h says this works too:

#define __MINGW_FEATURES__     __USE_MINGW_ANSI_STDIO

I dunno whether this or "I64" is the better solution.
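
A rough sketch of how the two options could sit side by side in C. The macro FMT_LONGLONG and the helper are made-up names, and the test of __USE_MINGW_ANSI_STDIO assumes it is either undefined or defined to 0 or 1, which may not hold for every MinGW release.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical selection: on Windows without MinGW's replacement
   printf, fall back to the msvcrt-specific "I64" size prefix. */
#if defined(_WIN32) && !__USE_MINGW_ANSI_STDIO
#define FMT_LONGLONG "%I64d"
#else
#define FMT_LONGLONG "%lld"
#endif

/* Hypothetical helper: buf must hold at least 21 bytes
   (sign, up to 19 digits, terminating NUL). */
int format_longlong(char *buf, long long v)
{
    assert(buf != NULL);
    return sprintf(buf, FMT_LONGLONG, v);
}
```

Either branch should print 2147483648 for 2^31; which one is preferable depends on whether pulling in mingwex is acceptable for the build.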

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia

Re: problem with string.format %d and very large integers

Steve Litt
In reply to this post by Lorenzo Donati-2
On Friday, July 29, 2011 07:30:10 AM Lorenzo Donati wrote:

> On 29/07/2011 10.54, Dirk Laurie wrote:
> > On Fri, Jul 29, 2011 at 09:22:33AM +0200, Lorenzo Donati wrote:
> >> Is there someone else that gets the same results on WinXP
> >> (32bit) with (TDM) mingw?
> >>
> >> Any hint appreciated.
> >
> > Install cygwin and gcc.
>
> Thanks, but that is not an option for me. I need native Windows
> executables, without compatibility layers (moreover, at the time I
> last checked, cygwin.dll had a quite restrictive license).

Yeah, when I read the cygwin suggestion it sounded like a workaround
that might not be practical.

I have no idea how often you need correct functionality of %d in
string.format, but maybe, until this unexpected inconsistent behavior
is converted to a consistent behavior, you could write your own
decimal2string(num, width, decimalplaces) function, and then just use
%s in string.format. Crude, ugly, should be unnecessary, but these
things happen, and I've found sometimes the easiest thing is to write
a workaround until the next version fixes the bug.

You could implement this function in C if practical, or otherwise in
Lua if the deployment environment can't compile C. The C version would
obviously be faster and perhaps easier (I think you would just use
itoa(), ltoa(), or sprintf() with a %f or something similar), but if
you don't use this function all over the place or in tight inner loops,
the speed differential would be meaningless.
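
A rough sketch of what such a C helper might look like, restricted to integers for brevity. The name `integer2string`, its signature, and the -1 error return are assumptions for illustration, not the decimal2string(num, width, decimalplaces) interface described above. It builds the digits by hand, so it does not depend on which integer length modifiers the platform's printf understands, though it does assume `long long` is available.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the integral double n in decimal into buf.
   Returns the length written, or -1 if n is fractional, outside
   [-2^53, 2^53], or does not fit in size bytes including the NUL. */
int integer2string(double n, char *buf, size_t size)
{
    const double limit = 9007199254740992.0;   /* 2^53 */
    char digits[20];
    size_t len = 0, out = 0;
    int negative;
    double magnitude;
    unsigned long long u;

    assert(buf != NULL);
    if (!(n >= -limit && n <= limit))          /* also rejects NaN */
        return -1;
    negative = n < 0;
    magnitude = negative ? -n : n;
    u = (unsigned long long)magnitude;
    if ((double)u != magnitude)                /* fractional part present */
        return -1;
    do {                                       /* least significant digit first */
        digits[len++] = (char)('0' + (int)(u % 10));
        u /= 10;
    } while (u != 0);
    if (len + (size_t)negative + 1 > size)
        return -1;
    if (negative)
        buf[out++] = '-';
    while (len > 0)
        buf[out++] = digits[--len];
    buf[out] = '\0';
    return (int)out;
}
```

Wrapped as a Lua C function, the result could then be passed to string.format with %s.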

HTH

SteveT

Steve Litt
Author: The Key to Everyday Excellence
http://www.troubleshooters.com/bookstore/key_excellence.htm
Twitter: http://www.twitter.com/stevelitt

