Dear Torbjörn,
> yes, see Section 2.4.2 of Modern Computer Arithmetic, where we call it
> "Montgomery-Svoboda". The quotient selection becomes trivial, which means
> one can reduce the latency between two mpn_addmul_1 calls.
>
> It really becomes a mul_basecase except that the first
Paul Zimmermann writes:
yes, see Section 2.4.2 of Modern Computer Arithmetic, where we call it
"Montgomery-Svoboda". The quotient selection becomes trivial, which means
one can reduce the latency between two mpn_addmul_1 calls.
It really becomes a mul_basecase except that the first round t
Dear Torbjörn,
> IIRC, Svoboda's division trick for N/D is to find a small multiplier m
> such that, for the division mN/(mD) we have mD = 10...XX... with the
> high limb of mD being 1000...0.
>
> This idea works also for 2-adic division. Find m = D^(-1) mod \beta
> where \beta is the
IIRC, Svoboda's division trick for N/D is to find a small multiplier m
such that, for the division mN/(mD) we have mD = 10...XX... with the
high limb of mD being 1000...0.
This idea works also for 2-adic division. Find m = D^(-1) mod \beta
where \beta is the lomb base. Then do mN/(mD) or mN