Use on large mul_fft [was: New mulmod_bknp1]

2022-04-16 Thread Marco Bodrato
second one is mul_fft, thanks to the patch named "mpn/generic/mul_fft.c: Use _bknp1," [...], available at https://gmplib.org/repo/gmp/rev/f9cbcda05f7e I attach an image (fakte+eble_18327.png), showing the relative speed of the code after/before the patch. The green line represents the

Re: mul_fft, cleaning up some details of the code

2022-03-08 Thread Marco Bodrato
Ciao Paul, Il 2022-03-08 16:20 Paul Zimmermann ha scritto: Uhm, the last line of code just before that ones is: cc = mpn_sub_1 (r, r, m, cc) + 1; /* add 1 to cc instead of rd since rd might overflow */ it seems you are right! Well, I pushed a clean-up for that portion of the code too

Re: mul_fft, cleaning up some details of the code

2022-03-08 Thread Paul Zimmermann
Dear Marco, > Uhm, the last line of code just before that ones is: > >cc = mpn_sub_1 (r, r, m, cc) + 1; >/* add 1 to cc instead of rd since rd might overflow */ > > So that I'd say that cc is 1 or 2. > Then the case cc=2, m=n-1, r[m]=0, and rd=GMP_NUMB_MAX seems very very > unlik

Re: mul_fft, cleaning up some details of the code

2022-03-08 Thread Marco Bodrato
Ciao Paul, Il 2022-03-08 12:56 Paul Zimmermann ha scritto: Since you deeply know how this code works, I ask you one more question. The last lines of the function mpn_fft_mul_2exp_modF (branch m < n) contain: /* now subtract cc and rd from r[m..n] */ r[n] = -mpn_sub_1 (r + m, r + m, n -

Re: mul_fft, cleaning up some details of the code

2022-03-08 Thread Paul Zimmermann
Hi Marco, > Since you deeply know how this code works, I ask you one more question. > The last lines of the function mpn_fft_mul_2exp_modF (branch m < n) > contain: > >/* now subtract cc and rd from r[m..n] */ > >r[n] = -mpn_sub_1 (r + m, r + m, n - m, cc); >r[n] -= mpn_sub_1

Re: mul_fft, cleaning up some details of the code

2022-03-08 Thread Marco Bodrato
Ciao Paul, Il 2022-03-07 10:28 Paul Zimmermann ha scritto: Date: Sun, 06 Mar 2022 11:14:49 +0100 From: Marco Bodrato Specifically I'd focus into a suspicious piece of code, shared by both our current code and Vivien's. if (cy >= 0) cy = mpn_add_1 (tmp, tmp, Kl, cy);

Re: mul_fft, cleaning up some details of the code

2022-03-07 Thread Paul Zimmermann
Dear Marco, > Date: Sun, 06 Mar 2022 11:14:49 +0100 > From: Marco Bodrato > > Ciao, > > I looked into the code published by Samuel Vivien. > I realised that in mul_fft there are a lot of _add_1 and _sub_1. At > least some of them can be easily replac

Re: mul_fft, cleaning up some details of the code

2022-03-06 Thread Marco Bodrato
Ciao, Il 2022-03-06 12:16 Torbjörn Granlund ha scritto: Marco Bodrato writes: The comment before the mpn_mul_fft_decompose function says "We must have nl <= 2*K*l", this means that we should never have ((dif = nl - Kl) > Kl), and the code in that branch should never be used. I notic

Re: mul_fft, cleaning up some details of the code

2022-03-06 Thread Torbjörn Granlund
Marco Bodrato writes: This is not really needed to solve a bug. The comment before the mpn_mul_fft_decompose function says "We must have nl <= 2*K*l", this means that we should never have ((dif = nl - Kl) > Kl), and the code in that branch should never be used. I noticed that at some poi

mul_fft, cleaning up some details of the code

2022-03-06 Thread Marco Bodrato
Ciao, I looked into the code published by Samuel Vivien. I realised that in mul_fft there are a lot of _add_1 and _sub_1. At least some of them can be easily replaced by _INCR_U or _DECR_U... Specifically I'd focus into a suspicious piece of code, shared by both our current code and Viv

Re: mul_fft

2021-10-06 Thread Niels Möller
Torbjörn Granlund writes: > Unfortunately, x86 is not as easy. The subtract-with-borrow > instructions clobber all flags, making the new adcx/adox pretty > pointless. And no, there are no non-clobbering negation (or one's > complement) instructions. I revisited the x86 docs for another reason,

Re: mul_fft

2021-06-30 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I think add_n_sub_n was originally motivated by improved locality (could apply at different levels of memory hierarcy). But maybe we could get close to twice the speed using newer instructions with multiple carry flags (I guess that's what pow

Re: mul_fft

2021-06-30 Thread Niels Möller
Paul Zimmermann writes: > 1) the use of mpn_add_n_sub_n is not activated by default in mul_fft.c. >It might give a small speedup in some cases. I think add_n_sub_n was originally motivated by improved locality (could apply at different levels of memory hierarcy). But maybe we could get clos

mul_fft

2021-06-30 Thread Paul Zimmermann
Hi, about https://gmplib.org/list-archives/gmp-devel/2021-June/005976.html, the second part of this mail is nonsense, since --enable-old-fft-full (which was introduced in revision 13152 on Dec 21 2009) has never been working as expected, since revision 13145 on Dec 20 2009 disabled it compl

mul_fft

2021-06-16 Thread Paul Zimmermann
Hi, with Samuel Vivien (in cc) we noticed two things related to the FFT code: 1) the use of mpn_add_n_sub_n is not activated by default in mul_fft.c. It might give a small speedup in some cases. 2) the previous code to multiply integers (mpn_mul_fft_full) which did two multiplicatio