Some more measurements. The first 3 result columns are for division,
for gradually harder cases; B=0xcafebabedeadbeef. The last columns is
widening multiplication.
cpu type 17/3 B/4711 2^64/3 mul presumed algorithm
skylake 25 25 86 1 SRT-8
broadwell
t...@gmplib.org (Torbjörn Granlund) writes:
Last time I explored this, the division instructions (div/idiv) ran at
varying speed as long as the upper dividend word is zero. When it is,
the timing is rougly linear in either log2(dividend) or log2(quotient).
But the fastst time is still som
ni...@lysator.liu.se (Niels Möller) writes:
paul zimmermann writes:
> page 1: the division instruction is now much faster than before on modern
> processors
According to https://gmplib.org/~tege/x86-timing.pdf, they're still an
order of magnitute slower than multiplication. E.g 8
paul zimmermann writes:
> page 1: the division instruction is now much faster than before on modern
> processors
According to https://gmplib.org/~tege/x86-timing.pdf, they're still an
order of magnitute slower than multiplication. E.g 86 vs 3 cycles on
Intel skylake. And in addition, multip
Dear Niels,
> From: ni...@lysator.liu.se (Niels Möller)
> Date: Sun, 05 Aug 2018 09:59:46 +0200
>
> ni...@lysator.liu.se (Niels Möller) writes:
>
> > static inline mp_limb_t
> > divappr_2 (mp_limb_t n2, mp_limb_t n1,
> >mp_limb_t d1, mp_limb_t d0, mp_limb_t dinv)
> > {
> > mp_li