Hi all,
(I will avoid opinions in the following. Y'all are free to decide whatever y'all wish to do. :-p) Tests were done on generic MinGW binaries of lua-5.4.0-work1. The 32-bit binary has the double-rounding issue due to extended precision that was discussed in the past few days. The 64-bit binary uses the 64-bit FPU datapath and is free of double-rounding. Tests were done on the output of math.random(), which is between 0 and 1. Any zeros are skipped. The first tests are A+B, A-B, A*B and A/B. A and B do not change for the 4 ops. The number of different results between the 32-bit binary and the 64-bit binary is counted. Operation / Op_Count / Diff_Count ================================= A+B 1,000,000,000 0 A-B 1,000,000,000 0 A*B 1,000,000,000 243,584 A/B 1,000,000,000 244,107 For A*B and A/B, this is roughly 1 in 4000. All different results differ by 1 ULP. Nothing more than 1 ULP was seen. Here is a look at one result: Operation : * n1 = 0.18987016622488206 0x1.84daa65360288p-3 n2 = 0.98339548253950804 0x1.f77f9cd91528bp-1 32b = 0.18671746373457448 0x1.7e65b9c2a810ep-3 64b = 0.18671746373457451 0x1.7e65b9c2a810fp-3 Plugging the values into an online calculator that has binary128 [1], we can do a check with a 'better' result, assuming n1 and n2 is precise to many more digits. The comparison is as follows: fpu8087 = 0.18671746373457448 binary128 = 0.1867174637345744950213132394217624 fpu64bit = 0.18671746373457451 So results that differ have intermediate results that are almost exactly in-between, this is why there is a double-rounding issue. A casual check with a few other numbers show the same trend. However, note that those decimal representations of the 64-bit floats are not exactly equal to the exact value of the binary representation -- they just have enough digits for round-tripping. Here are the hex representations of the significands: fpu8087 = 7e65b9c2a810e binary128 = 7E65B9C2A810E5EAB7A33195161D fpu64bit = 7e65b9c2a810f Using this comparison, it appears that the binary128 result is nearer to fpu8087 than to fpu64bit. We can also verify this by getting a more accurate decimal version of the 64-bit binary representation (done by manual fiddling, so not definitive): fpu8087 = 0.1867174637345744847571893387794262 binary128 = 0.1867174637345744950213132394217624 fpu64bit = 0.1867174637345745125127649544083397 Thus it is confirmed that the fpu8087 result is closer to the binary128 result, at least for this example. This ends the first part. In order to generate hits for A+B and A-B, the magnitude of B is changed. B is divided by 1e3, 1e6, 1e9 or 1e12, and then the addition is performed and the comparison made. Operation / Op_Count / Diff_Count ================================= A+(B/1e3) 1,000,000,000 0 A+(B/1e6) 1,000,000,000 0 A+(B/1e9) 1,000,000,000 244,158 A+(B/1e12) 1,000,000,000 243,489 Now we can see the double-rounding differences. For the last two operations, the rate is also roughly 1 in 4000. Again, all different results differ by 1 ULP. Nothing more than 1 ULP was seen. This ends the second part. [1] http://weitz.de/ieee/ -- Cheers, Kein-Hong Man (esq.) Selangor, Malaysia |
On Jun 8, 2018, at 5:47 AM, KHMan <[hidden email]> wrote: Operation / Op_Count / Diff_Count ================================= A+B 1,000,000,000 0 A-B 1,000,000,000 0 A*B 1,000,000,000 243,584 A/B 1,000,000,000 244,107 For A*B and A/B, this is roughly 1 in 4000. All different results differ by 1 ULP. Nothing more than 1 ULP was seen. I found another blog doing the same test, error also about 1:4000 Just some math behind the observation: Double rounding occurs when both rounded in the same direction. Difference between 53-bits and 64-bits are 11-bits. --> chances of same direction rounding is about 50% --> chances of double rounding = 50% / 2^11 = 1 / 4096 With just 1 op, of course the error is at most 1 ULP Operation : * Here are the hex representations of the significands: fpu8087 = 7e65b9c2a810e binary128 = 7E65B9C2A810E5EAB7A33195161D fpu64bit = 7e65b9c2a810f Using this comparison, it appears that the binary128 result is nearer to fpu8087 than to fpu64bit. We can also verify this by getting a more accurate decimal version of the 64-bit binary representation (done by manual fiddling, so not definitive): You do realize we have to work in 53-bits for comparison. Getting more precise result with more bits does not show anything. My guess is you set (n1, n2) as decimal string -> __float128. Had you use the actual hexfloats, you should get this: n1 * n2 = 0x1.84daa65360288p-3 * 0x1.f77f9cd91528bp-1 = 0x1.7e65b9c2a810e 800eb4c1577ec p-3 --> value should round-up, to 0x1.7e65b9c2a810fp-3 --> 53-bits rounding is doing the best it can with 2 doubles. |
Free forum by Nabble | Edit this page |