

In the last week I've been working on a new arbitrary precision integer arithmetic library for Lua. I mainly created it because I needed a library that follows Lua 5.3+ integer semantics but can work with large integers efficiently in pure Lua; as none of the existing Lua libraries fit my requirements, I made this new one. It is fully documented, has a test suite covering all the code and many examples, and is already used in some of my GitHub projects.

Features:
* Small, simple and self-contained
* Efficient (for a pure Lua integer library)
* Works with a fixed-width integer
* Follows Lua 5.3+ integer arithmetic semantics by default
* All integer overflows wrap around
* Can work with large integer widths at reasonable speed (such as 1024-bit integers)
* Implements all Lua arithmetic operators (+, -, *, //, /, %) and bitwise operators (&, |, ~, <<, >>)
* Provides methods for unsigned-only arithmetic
* Supports signed integers by default, using two's-complement rules on top of the unsigned operations
* Allows mixing any operation with Lua numbers, promoting to Lua floats where needed

For more information check the project page: https://github.com/edubart/lua-bint/
And its documentation: https://edubart.github.io/lua-bint/
It is available in luarocks.
 Eduardo Bart


The fact that it uses the same fixed width for all further operations (using a "static" storage from the initialization) limits the usability as a general-purpose library: you cannot reuse the library in projects that need different precisions. Otherwise they will all use the same precision (256 bits by default), and any change (using different "initializations") will destroy the results of existing instances. Ideally, the bigints should each have a member containing their number of bits (and an indicator of whether unsigned arithmetic is in effect), so that each instance carries it. As well, the library should include primitives to support "rotates".


It depends on what you use the library for. When you implement cryptographic primitives (e.g. elliptic curve signing or elliptic curve Diffie-Hellman), you want the operations to take the same amount of time for all possible inputs to prevent timing attacks; a fixed-size big number representation is then more or less required. A typical size would be 256 bits for elliptic curves, or 1024-4096 bits for RSA.
 
Gé


That is a good point, so I've slightly changed the library initialization APIs to avoid conflicts when using different precisions in different parts of a project. This is also good because it allows working with different integer precisions in the same project when needed. Mixing operations between integers of different sizes is still not permitted, though; that was a design choice from the beginning, in favor of efficiency and simplicity. I've also added functions for bitwise rotation.


Counting the number of bits can be improved a lot: instead of using successive shifts by 1 bit, you can compute the number of bits set in a 32-bit integer in very few operations, without any loop.


Counting the number of bits in what context? The library doesn't have this function; I do shift by one bit in the integer pow functions, but not with the intention of counting bits.
What I would like to optimize more is the conversion from base 2 to base 10 in the library; it is used when printing big numbers in base 10 as strings. My `tobase` function calls `udivmod` (unsigned integer division with modulus) too many times for really large integers (8192 bits or so), and this makes tostring very slow for big numbers. If you have any ideas how I could optimize that base conversion (or perhaps udivmod) I would appreciate it.


It was thus said that the Great Eduardo Bart once stated:
> Counting number of bits in what context? The library doesn't have this
> function, I do shift in one bit in integer pow functions but it's not much
> with the intention of counting bits.
>
> What I would like to optimize more is the conversion from base 2 to base 10
> in the library, it is used when printing big numbers in base 10 as string.
> My `tobase` function calls too many times `udivmod` (unsigned integer
> division with modulus) for really large integers (8192 bits or so) and this
> make tostring very slow for big numbers. If you have any ideas how I could
> optimize that base conversion (or perhaps udivmod) I would appreciate it.
You want to do as large a division as possible. On a 32-bit system, you
want to do a "64 bits divided by 32 bits" repeatedly, and on a 64-bit
system, you'll want to do a "128 bits divided by 64 bits" repeatedly [1].
For best results you would need to drop to assembly, but a reasonable C
compiler will probably produce decent results with code like:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <assert.h>

#define MAX 256

unsigned long xremainder(
        unsigned long       dest[MAX],
        unsigned long const src[MAX],
        unsigned long       denom
)
{
  unsigned long long carry = 0;
  unsigned long long quo;
  unsigned long long rem;
  size_t             i;

  for (i = 0 ; i < MAX ; i++)
  {
    carry <<= (sizeof(long) * CHAR_BIT);
    carry  |= src[i];
    quo     = carry / denom;
    rem     = carry % denom;
    dest[i] = quo;
    carry   = rem;
  }

  return rem;
}

int main(void)
{
  unsigned long src[MAX];
  unsigned long dest[MAX];
  unsigned long zero[MAX];
  char          buffer[BUFSIZ]; /* assumes BUFSIZ can hold all the digits */
  char         *p;

  /* requires a 'long long' wider than 'long', e.g. a 32-bit system */
  assert(sizeof(unsigned long long) > sizeof(unsigned long));

  memset(zero,   0,sizeof(zero));
  memset(src,  255,sizeof(src)); /* maximum number */

  p    = &buffer[BUFSIZ];
  *--p = '\0';

  do
  {
    unsigned long rem = xremainder(dest,src,10);
    assert(rem <= 9);
    *--p = rem + '0';
    memcpy(src,dest,sizeof(src));
  } while(memcmp(src,zero,sizeof(zero)));

  puts(p);
  return 0;
}
If you can somehow do "larger" than "bit-at-a-time" division, that should
speed up the code.
spc
[1] All the architectures I'm familiar with that have a DIV instruction
will allow one to divide a number twice the "word size" (32 bits on
a 16-bit machine, 64 bits on a 32-bit machine, 128 bits on a 64-bit
machine) by the word size.
Conversely, the MUL instruction will usually produce a result
twice the normal word size (even the 8-bit 6809, which has an 8-bit
MUL instruction, produces a 16-bit result).


It was thus said that the Great Sean Conner once stated:
>
> You want to as large of a division as possible. On a 32bit system, you
> want to do a "64 bits divided by 32 bits" repeatedly, and on a 64bit
> system, you'll want to do a "128 bits divided by 64 bits" repeatedly [1].
> For best results, you would need to hit assembly, but a reasonable C
> compiler will probably produce decent results with code like:
Please note that to do such divisions, the data needs to be correctly
adjusted for the endianness of the system. In my sample code:
I treat the array as big-endian, but on an x86 system each
unsigned long in the array is stored little-endian. Just be aware of this
when doing the math.
spc (Just thought I should mention this)


Sean Conner, that algorithm worked great, thanks for the enlightenment! I've incorporated it into the library in pure Lua, and now printing big integers is substantially faster.
Now I just wonder if I could have a more efficient unsigned integer division. Currently I am basically doing this algorithm: https://courses.cs.vt.edu/~cs1104/BuildingBlocks/divide.030.html
However, it does not take much advantage of the fact that the integers are composed of multiple 32-bit words; I need to shift by a single bit many times in that algorithm, which for 8192-bit integers means that many shifts for a single division. I wonder if there is a better (but reasonably simple) algorithm that could take advantage of the multiple words.


>>>>> "Eduardo" == Eduardo Bart < [hidden email]> writes:
Eduardo> What I would like to optimize more is the conversion from base
Eduardo> 2 to base 10 in the library, it is used when printing big
Eduardo> numbers in base 10 as string. My `tobase` function calls too
Eduardo> many times `udivmod` (unsigned integer division with modulus)
Eduardo> for really large integers (8192 bits or so) and this make
Eduardo> tostring very slow for big numbers. If you have any ideas how
Eduardo> I could optimize that base conversion (or perhaps udivmod) I
Eduardo> would appreciate it.
There are two standard tricks for this (and you should use both).
First, don't repeatedly divide by the base. Instead, precompute powers
of the base (I'll assume 10 for example purposes): 10^2, 10^4, 10^8,
10^16, etc., up to a value that's close to (but less than) half of the
input bits. That way, the remainder after dividing by that largest size
will always fit in half the bits, and no more than three (often only
two) such full-length divisions are needed to break the input into
convenient chunks, which you then divide into halves, etc.
Second, don't use division at all (except possibly for that first step,
if you don't want to deal with doing an NxN->2N-bit multiply, or
a "high half" multiply, which would suffice). Instead, do division by
way of multiplication by a scaled reciprocal (which you can precompute).

Andrew.


Eduardo...
On Fri, Jul 10, 2020 at 8:21 PM Eduardo Bart < [hidden email]> wrote:
> What I would like to optimize more is the conversion from base 2 to base 10 in the library, it is used when printing big numbers in base 10 as string. My `tobase` function calls too many times `udivmod` (unsigned integer division with modulus) for really large integers (8192 bits or so) and this make tostring very slow for big numbers. If you have any ideas how I could optimize that base conversion (or perhaps udivmod) I would appreciate it.
I'm not so sure I've fully understood your library, but I'm under the
impression you store the numbers as an array of 32-bit words. That's a
number in base B=2^32. But it seems you operate on them as a bit
string.
To work with this kind of thing, what is normally done is, assuming
you have native multiplication and division on the elements (which it
seems you do), to operate in base B natively as much as possible. I
mean, have a function LSudivmod(L num, S denom) => L quot, S rem (I'm
using L for long numbers, S for shorts) which can be done in a single
pass, and, to convert to base 10, use the largest usable denominator
(that would be 10^9 for B=2^32). This way you can extract nine digits
in every pass.
Similarly for long division: it seems to me you are doing bit-at-a-time
division, where you could do it directly in base 2^32, the same
way you do it when you divide by hand, word-aligning, so you can
entirely avoid partial-word shifts. You may have to do a little
"search" on each base-B digit of the quotient, the same as you do
in your head when dividing in base 10, but as you can code a
multiply-subtract without temporary storage it should be much faster.
Another thing: multiprecision libraries normally have several
different representations for the numbers, like a u32 array when they
are used for unsigned arithmetic plus bit ops, but base 10^n (10000 for
machines where you have u16 x u16 = u32, 10_000_000_000 where you have
32*32=64), many times with sign-magnitude representation, among other
things to speed up decimal printing. You may want to consider
having a conversion-to-decimal function but making the default
tostring() format hexadecimal, which would preserve round-trip
value when printing numbers but is much faster.
Francisco Olarte.


On 2020-07-11 10:23 a.m., Francisco Olarte wrote:
> Similarly for long division, it seems to me you are doing bit a time
> division
I agree, to me it looks like bit division for bit-packed long numbers:
https://github.com/edubart/lua-bint/blob/master/bint.lua#L1089
I think the larger the BASE you have, the better. Ideally BASE should
be machine register size.
For division there is a method to guess a quotient digit with error no
more than two units, by dividing the first two digits of the numerator by
the first digit of the denominator. But you should scale both numerator
and denominator before that by BASE / (greatest_denom_digit + 1), and
correct the results afterwards by dividing out this scale factor before
the function returns.
This method is presented in Knuth's Art of Computer Programming,
4.3.1, Algorithm D, Division of nonnegative integers. (But it is quite
hard to understand and badly expressed, IMHO.)
As a side note, your project reminded me of my 15-year-old project in
Delphi; I was in the mood and had a couple of hours to implement simple
division of long numbers in Lua. There a naive algorithm is used for
calculating the next quotient digit, which requires nearly BASE / 2 long
subtractions. But there is a stub function for Knuth's algorithm.
Gist is:
https://gist.github.com/martin-eden/f015c1f08872a59ae8c1294524bb904a
Test run:
$ lua5.3 long_div.lua
BASE 10
long_div_mod(8_7_9_3, 2_7_7) = 3_1, 2_0_6
(I'm not going to hijack this thread, but felt that creating a new
thread with "[ANN]" would be way too much, while still wishing to keep
a mention of the snippet.)
 Martin


Thanks for pointing me to the method in Knuth's Art of Computer Programming and for preparing concept code for it; this method looks like what I was looking for in a division algorithm. The version with just the naive method of finding the quotient looks like it would be less efficient than the bit division I am doing at the moment, because I have to use a large base (2^32) and it would take lots of iterations in the naive function; however, with `get_quotient_digit_advanced` implemented using Knuth's algorithm I see the potential for it to be more efficient. I'm going to research and play with it later.


Precomputing the scaled reciprocal is unfeasible for large words: a lookup table for 32-bit words, for example, would have 2^32 entries (excluding the reciprocal of 0), which would be enormous. However, for small tables this is feasible, so you can build a small table of 256 reciprocals and then handle the division in groups of 4 bits (i.e. work in base 16 and do the traditional digit-by-digit division in that base, given that you'll handle two digits at once for the product). If you can use a larger lookup table (e.g. 1024 entries, i.e. 10 bits), you'll handle the divisions in base 32; with 4096 entries, the division is made in base 64. Above this limit the lookup table is already too large: 16K entries would turn into a too-large module. But there are other ways to perform divisions than the classic Euclidean method.


>>>>> "Philippe" == Philippe Verdy < [hidden email]> writes:
Philippe> Precomputing the scaled reciprocal is unfeasible for large words.
Philippe> Your lookup table for example with 32bit words would have
Philippe> 2^32 entries (excluding the reciprocal of 0). That lookup
Philippe> would be enormous.
You only need to precompute reciprocals for the divisors actually used,
which are only the values B^(2^n) for a small range of n and for whatever
values of B you actually want to optimize (usually only 10, since 8 and
16 are better handled with bit shifts, and using other bases is
vanishingly rare).
Any compiler worth the name these days does this as a matter of course:
look at the code generated for, say, n / 100000000U in C where n is a
uint64_t, and you'll see it does a multiply by 12379400392853802749 and
a 90-bit right shift on the 128-bit product. The multiplier there is
ceil(2^90 / 100000000), with the scaling factor chosen so that the
product requires no more than 128 bits and always truncates to the
correct result (this isn't possible for all divisors, but usually is for
the ones you care about).

Andrew.

