second one is mul_fft, thanks to the patch named
"mpn/generic/mul_fft.c: Use _bknp1," [...], available at
https://gmplib.org/repo/gmp/rev/f9cbcda05f7e
I attach an image (fakte+eble_18327.png), showing the relative speed of
the
code after/before the patch.
The green line represents the
Ciao Paul,
Il 2022-03-08 16:20 Paul Zimmermann ha scritto:
Uhm, the last line of code just before that ones is:
cc = mpn_sub_1 (r, r, m, cc) + 1;
/* add 1 to cc instead of rd since rd might overflow */
it seems you are right!
Well, I pushed a clean-up for that portion of the code too
Dear Marco,
> Uhm, the last line of code just before that ones is:
>
>cc = mpn_sub_1 (r, r, m, cc) + 1;
>/* add 1 to cc instead of rd since rd might overflow */
>
> So that I'd say that cc is 1 or 2.
> Then the case cc=2, m=n-1, r[m]=0, and rd=GMP_NUMB_MAX seems very very
> unlik
Ciao Paul,
Il 2022-03-08 12:56 Paul Zimmermann ha scritto:
Since you deeply know how this code works, I ask you one more
question.
The last lines of the function mpn_fft_mul_2exp_modF (branch m < n)
contain:
/* now subtract cc and rd from r[m..n] */
r[n] = -mpn_sub_1 (r + m, r + m, n -
Hi Marco,
> Since you deeply know how this code works, I ask you one more question.
> The last lines of the function mpn_fft_mul_2exp_modF (branch m < n)
> contain:
>
>/* now subtract cc and rd from r[m..n] */
>
>r[n] = -mpn_sub_1 (r + m, r + m, n - m, cc);
>r[n] -= mpn_sub_1
Ciao Paul,
Il 2022-03-07 10:28 Paul Zimmermann ha scritto:
Date: Sun, 06 Mar 2022 11:14:49 +0100
From: Marco Bodrato
Specifically I'd focus into a suspicious piece of code, shared by both
our current code and Vivien's.
if (cy >= 0)
cy = mpn_add_1 (tmp, tmp, Kl, cy);
Dear Marco,
> Date: Sun, 06 Mar 2022 11:14:49 +0100
> From: Marco Bodrato
>
> Ciao,
>
> I looked into the code published by Samuel Vivien.
> I realised that in mul_fft there are a lot of _add_1 and _sub_1. At
> least some of them can be easily replac
Ciao,
Il 2022-03-06 12:16 Torbjörn Granlund ha scritto:
Marco Bodrato writes:
The comment before the mpn_mul_fft_decompose function says "We must
have nl <= 2*K*l", this means that we should never have ((dif = nl -
Kl) > Kl), and the code in that branch should never be used.
I notic
Marco Bodrato writes:
This is not really needed to solve a bug.
The comment before the mpn_mul_fft_decompose function says "We must
have nl <= 2*K*l", this means that we should never have ((dif = nl -
Kl) > Kl), and the code in that branch should never be used.
I noticed that at some poi
Ciao,
I looked into the code published by Samuel Vivien.
I realised that in mul_fft there are a lot of _add_1 and _sub_1. At
least some of them can be easily replaced by _INCR_U or _DECR_U...
Specifically I'd focus into a suspicious piece of code, shared by both
our current code and Viv
Torbjörn Granlund writes:
> Unfortunately, x86 is not as easy. The subtract-with-borrow
> instructions clobber all flags, making the new adcx/adox pretty
> pointless. And no, there are no non-clobbering negation (or one's
> complement) instructions.
I revisited the x86 docs for another reason,
ni...@lysator.liu.se (Niels Möller) writes:
I think add_n_sub_n was originally motivated by improved locality (could
apply at different levels of memory hierarcy).
But maybe we could get close to twice the speed using newer instructions
with multiple carry flags (I guess that's what
pow
Paul Zimmermann writes:
> 1) the use of mpn_add_n_sub_n is not activated by default in mul_fft.c.
>It might give a small speedup in some cases.
I think add_n_sub_n was originally motivated by improved locality (could
apply at different levels of memory hierarcy).
But maybe we could get clos
Hi,
about https://gmplib.org/list-archives/gmp-devel/2021-June/005976.html,
the second part of this mail is nonsense, since --enable-old-fft-full
(which was introduced in revision 13152 on Dec 21 2009) has never been
working as expected, since revision 13145 on Dec 20 2009 disabled it
compl
Hi,
with Samuel Vivien (in cc) we noticed two things related to the FFT code:
1) the use of mpn_add_n_sub_n is not activated by default in mul_fft.c.
It might give a small speedup in some cases.
2) the previous code to multiply integers (mpn_mul_fft_full) which did
two multiplicatio
15 matches
Mail list logo