Re: mpn_mul is embarrassingly slow

2018-04-24 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > ni...@lysator.liu.se (Niels Möller) writes: > > I would prefer the opposite change for GMP7, to have all multiplication > functions return, but *not* store, the high limb of the product. Which > also should work nicely with tail calls. > > I beli

Re: mpn_mul is embarrassingly slow

2018-04-24 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I would prefer the opposite change for GMP7, to have all multiplication functions return, but *not* store, the high limb of the product. Which also should work nicely with tail calls. I believe this would work nicely for mul_basecase but not for t

Re: mpn_mul is embarrassingly slow

2018-04-24 Thread Victor Shoup
Well, if you do, please change the name of the function for backward compatibility... > On Apr 24, 2018, at 9:49 AM, Niels Möller wrote: > > I would prefer the opposite change for GMP7, to have all multiplication > functions return, but *not* store, the high limb of the product. Which > also sho

Re: mpn_mul is embarrassingly slow

2018-04-24 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > What do you think about this stopgap change? The idea is to speed up > small operands, adding very little overhead to larger operands. Looks reasonable to me. Assuming speedup for small operands is measurable. > (We should really get rid of mpn_mul'

Re: mpn_mul is embarrassingly slow

2018-04-24 Thread Vincent Lefevre
On 2018-04-24 14:11:34 +0200, paul zimmermann wrote: >Dear Torbjörn, > > > What do you think about this stopgap change? > > I would entirely drop all the squaring-related stuff from mpn_mul: > the user/developer should call mpn_sqr instead (see my previous mail). It is not clear that the

Re: mpn_mul is embarrassingly slow

2018-04-24 Thread Torbjörn Granlund
paul zimmermann writes: I would entirely drop all the squaring-related stuff from mpn_mul: the user/developer should call mpn_sqr instead (see my previous mail). That's tempting, but a 30% slowdown on some user code would not be nice. (I believe it is vn and not un that should be compared

Re: MPN_FILL vs MPN_ZERO

2018-04-24 Thread Vincent Lefevre
On 2018-04-24 12:46:27 +0200, paul zimmermann wrote: > I understand MPN_ZERO(p,n) is implemented as "if (n) MPN_FILL(p, n, 0)". > > Then in case MPN_FILL is implemented using memset, since memset also checks > for the case n=0, MPN_ZERO(p,n) will perform two tests for n > 0. Why not > directly cal

Re: mpn_mul is embarrassingly slow

2018-04-24 Thread Victor Shoup
I'm not sure that's a great idea from a backward compatibility point of view. Also: when exactly was mpn_sqr added to the public interface? That's something I'll have to take into account in writing GMP client code. > On Apr 24, 2018, at 8:11 AM, paul zimmermann wrote: > > Dear Torbjörn

Re: mpn_mul is embarrassingly slow

2018-04-24 Thread paul zimmermann
Dear Torbjörn, > What do you think about this stopgap change? I would entirely drop all the squaring-related stuff from mpn_mul: the user/developer should call mpn_sqr instead (see my previous mail). Then the code would become: if (BELOW_THRESHOLD (vn, MUL_TOOM22_THRESHOLD)) {

Re: mpn_mul is embarrassingly slow

2018-04-24 Thread paul zimmermann
> It is surely silly that we don't have any mpn call for when it is known > that the multiplier and multiplicand are distinct. but now that mpn_sqr is in the GMP interface (since GMP 6 I guess), why check for {up,un} = {vp,vn} in mpn_mul? Shouldn't the user or the GMP developer call mpn_sqr direct
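
To make the question above concrete, here is a minimal sketch of the dispatch being debated: a general multiply that first tests whether both operands are the same {pointer, size} pair and, if so, hands off to a squaring routine. Everything in it is hypothetical (the toy_ names, the 32-bit limb type, the call counter) and it is not GMP's code; it only shows that the test costs one pointer comparison and one size comparison per call. The basecase returns the most significant limb, mirroring the documented behaviour of mpn_mul.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_limb_t;   /* hypothetical 32-bit limb, not mp_limb_t */

/* Schoolbook product {rp, un+vn} = {up, un} * {vp, vn}.
   Requires un >= vn >= 1 and rp not overlapping the inputs.
   Returns the most significant limb of the result (which is also stored). */
static toy_limb_t toy_mul_basecase(toy_limb_t *rp,
                                   const toy_limb_t *up, size_t un,
                                   const toy_limb_t *vp, size_t vn)
{
    assert(un >= vn && vn >= 1);
    for (size_t i = 0; i < un + vn; i++)
        rp[i] = 0;
    for (size_t j = 0; j < vn; j++) {
        uint64_t carry = 0;
        for (size_t i = 0; i < un; i++) {
            /* Cannot overflow 64 bits: (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
            uint64_t t = (uint64_t)up[i] * vp[j] + rp[i + j] + carry;
            rp[i + j] = (toy_limb_t)t;
            carry = t >> 32;
        }
        rp[j + un] = (toy_limb_t)carry;
    }
    return rp[un + vn - 1];
}

static int toy_sqr_calls;      /* counts dispatches, for illustration only */

/* Placeholder squaring: a real routine would exploit the symmetry of the
   cross products; this one just reuses the basecase. */
static void toy_sqr(toy_limb_t *rp, const toy_limb_t *up, size_t n)
{
    toy_sqr_calls++;
    (void)toy_mul_basecase(rp, up, n, up, n);
}

/* General multiply with the identical-operand test discussed in the thread. */
static toy_limb_t toy_mul(toy_limb_t *rp,
                          const toy_limb_t *up, size_t un,
                          const toy_limb_t *vp, size_t vn)
{
    if (up == vp && un == vn) {          /* same operand: square instead */
        toy_sqr(rp, up, un);
        return rp[2 * un - 1];
    }
    return toy_mul_basecase(rp, up, un, vp, vn);
}
```

Dropping the test, as proposed in the thread, would mean toy_mul always takes the general path and callers who know the operands coincide call toy_sqr themselves.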

Re: mpn_mul is embarrassingly slow

2018-04-24 Thread Torbjörn Granlund
What do you think about this stopgap change? The idea is to speed up small operands, adding very little overhead to larger operands. (We should really get rid of mpn_mul's return value for the slightly incompatible GMP 7; that will allow cheap tail calls here.) *** /tmp/extdiff.rFSg_D/gmp-main.0

MPN_FILL vs MPN_ZERO

2018-04-24 Thread paul zimmermann
Hi, I just discovered the internal macro MPN_FILL. I understand MPN_ZERO(p,n) is implemented as "if (n) MPN_FILL(p, n, 0)". Then in case MPN_FILL is implemented using memset, since memset also checks for the case n=0, MPN_ZERO(p,n) will perform two tests for n > 0. Why not directly call M
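
The double test described above can be sketched as follows. This is only an illustration under stated assumptions: the sketch_ names and the limb typedef are hypothetical, and the assumption that the fill primitive requires n > 0 (which is what would make the guard necessary for a loop-based fill) is mine, not something stated in the message. The guarded form follows the shape quoted in the message, "if (n) MPN_FILL(p, n, 0)"; the memset form shows why the guard becomes redundant once the fill is a memset call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long sketch_limb_t;   /* hypothetical stand-in for a limb type */

/* Loop-based fill of n limbs with value f.
   Assumption: n must be at least 1, because the do-while body always runs once. */
static void sketch_fill(sketch_limb_t *p, size_t n, sketch_limb_t f)
{
    assert(n > 0);
    do {
        *p++ = f;
    } while (--n != 0);
}

/* Zeroing in the guarded shape quoted in the message: the n == 0 case is
   tested here, before the fill primitive is reached. */
static void sketch_zero_guarded(sketch_limb_t *p, size_t n)
{
    if (n != 0)
        sketch_fill(p, n, 0);
}

/* Zeroing through memset: a zero byte count is already a no-op, so no extra
   guard is needed as long as p is a valid pointer. */
static void sketch_zero_memset(sketch_limb_t *p, size_t n)
{
    memset(p, 0, n * sizeof *p);
}
```

Note that memset writes bytes, so it can stand in for a limb fill only when the fill value is zero (or another byte-repeating pattern); that is why the sketch uses it for zeroing only.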