Re: dead code in div_q.c?

2018-04-20 Thread Niels Möller
paul zimmermann writes: > together with Raphaël Rieu-Helft (in cc), we believe we have found some dead > code in > mpn/generic/div_q.c around lines 173-182: > > else if (UNLIKELY (qh != 0)) > { > /* This happens only when the quotient is close to B^n and

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Marc Glisse
On Fri, 20 Apr 2018, Marc Glisse wrote: On Fri, 20 Apr 2018, Marc Glisse wrote: On Fri, 20 Apr 2018, Marc Glisse wrote: On Fri, 20 Apr 2018, Vincent Lefevre wrote: On 2018-04-20 04:14:15 +0200, Fredrik Johansson wrote: For operands with 1-4 limbs, that is; on my machine, mpn_mul takes up

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Niels Möller
"Marco Bodrato" writes: > On the generic (or no-asm) side, we could at least swap the first branches > in mpn_mul. Currently we have: > > if (un == vn) > { > if (up == vp) > mpn_sqr (prodp, up, un); > else > mpn_mul_n (prodp, up, vp, un); > } > else if (vn < MU

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Victor Shoup
Interesting. I see that this paper compares to NTL as well. I spent the morning seeing what I could do to improve the situation for NTL, whose mul routine has essentially the same functionality as mpz_mul (takes care of memory allocation and signs). I reduced some overheads and for small inputs ju

Re: Lazy mpz allocation

2018-04-20 Thread Vincent Lefevre
On 2018-04-20 20:26:05 +0200, Vincent Lefevre wrote: > Yes, but this just means that the user must not call these > functions in such a case. But he can do some work before > calling these functions. In particular, mpq_numref and > mpq_denref should work. In particular, the user can write wrappers

Re: Lazy mpz allocation

2018-04-20 Thread Vincent Lefevre
On 2018-04-20 18:29:55 +0200, Trevor Spiteri wrote: > >>> Only 0 can have lazy allocation, and I think we document that it isn't > >>> legal to put 0 on the denominator. > >> where is this documented? > > That was in a "I think" sentence. Now that I looked a bit more, I don't > > find it... Well,

Re: Lazy mpz allocation

2018-04-20 Thread Trevor Spiteri
>>> Only 0 can have lazy allocation, and I think we document that it isn't >>> legal to put 0 on the denominator. >> where is this documented? > That was in a "I think" sentence. Now that I looked a bit more, I don't > find it... Well, you can't call any mpq function that reads that mpq_t, > but

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Marco Bodrato
Ciao, Il Ven, 20 Aprile 2018 7:36 pm, Marc Glisse ha scritto: > On Fri, 20 Apr 2018, Marco Bodrato wrote: >> Il Ven, 20 Aprile 2018 12:39 pm, Marc Glisse ha scritto: >>> there is, the timings are: >>> >>> mpn_mul: .56 >>> mpn_mul_n: .36 >>> mpn_mul_basecase: .16 >> >> Did you try also the document

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Marco Bodrato
Ciao, Il Ven, 20 Aprile 2018 1:38 pm, Torbjörn Granlund ha scritto: > ni...@lysator.liu.se (Niels Möller) writes: > Fredrik Johansson writes: > > > It would be possible to have mpn_mul itself assembly-coded to do > something > > like this: > > > > case 1x1: ... > > case 2x1: ... > >

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Marc Glisse
On Fri, 20 Apr 2018, Marco Bodrato wrote: Il Ven, 20 Aprile 2018 12:39 pm, Marc Glisse ha scritto: I just tried (LTO+PGO) on a trivial testcase, and gcc didn't manage to do anything clever with it. Doing it by hand to see how much potential gain there is, the timings are: mpn_mul: .56 mpn_mul_

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Marco Bodrato
Ciao, Il Ven, 20 Aprile 2018 12:39 pm, Marc Glisse ha scritto: > I just tried (LTO+PGO) on a trivial testcase, and gcc didn't manage to do > anything clever with it. Doing it by hand to see how much potential gain > there is, the timings are: > > mpn_mul: .56 > mpn_mul_n: .36 > mpn_mul_basecase: .

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Marc Glisse
On Fri, 20 Apr 2018, Marc Glisse wrote: On Fri, 20 Apr 2018, Marc Glisse wrote: On Fri, 20 Apr 2018, Vincent Lefevre wrote: On 2018-04-20 04:14:15 +0200, Fredrik Johansson wrote: For operands with 1-4 limbs, that is; on my machine, mpn_mul takes up to twice as long as mpn_mul_basecase, and

dead code in div_q.c?

2018-04-20 Thread paul zimmermann
Hi, together with Raphaël Rieu-Helft (in cc), we believe we have found some dead code in mpn/generic/div_q.c around lines 173-182: else if (UNLIKELY (qh != 0)) { /* This happens only when the quotient is close to B^n and mpn_*_d

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Fredrik Johansson writes: > It would be possible to have mpn_mul itself assembly-coded to do something > like this: > > case 1x1: ... > case 2x1: ... > case 2x2: ... > generic case, small n: (basecase loop) > generic case, large n: (f

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Niels Möller
paul zimmermann writes: >Niels, > >> Such an assembly routine would need access to the threshold between >> basecase and generic, which in the case of fat builds isn't a compile >> time constant. > > but you could determine at compile time a lower bound for fat builds, no? If it's too i

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread paul zimmermann
Niels, > Such an assembly routine would need access to the threshold between > basecase and generic, which in the case of fat builds isn't a compile > time constant. but you could determine at compile time a lower bound for fat builds, no? Paul

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Niels Möller
Fredrik Johansson writes: > It would be possible to have mpn_mul itself assembly-coded to do something > like this: > > case 1x1: ... > case 2x1: ... > case 2x2: ... > generic case, small n: (basecase loop) > generic case, large n: (fall back to calling an mpn_mul_generic function > that selects

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Marc Glisse
On Fri, 20 Apr 2018, Marc Glisse wrote: On Fri, 20 Apr 2018, Vincent Lefevre wrote: On 2018-04-20 04:14:15 +0200, Fredrik Johansson wrote: For operands with 1-4 limbs, that is; on my machine, mpn_mul takes up to twice as long as mpn_mul_basecase, and inline assembly for 1x1, 2x1 or 2x2 multip

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Marc Glisse
On Fri, 20 Apr 2018, Vincent Lefevre wrote: On 2018-04-20 04:14:15 +0200, Fredrik Johansson wrote: For operands with 1-4 limbs, that is; on my machine, mpn_mul takes up to twice as long as mpn_mul_basecase, and inline assembly for 1x1, 2x1 or 2x2 multiplication is even faster. The problem is th

Re: mpn_mul is embarrassingly slow

2018-04-20 Thread Vincent Lefevre
On 2018-04-20 04:14:15 +0200, Fredrik Johansson wrote: > For operands with 1-4 limbs, that is; on my machine, mpn_mul takes up to > twice as long as mpn_mul_basecase, and inline assembly for 1x1, 2x1 or 2x2 > multiplication is even faster. The problem is that there are three function > calls (mpn_m

Re: Lazy mpz allocation

2018-04-20 Thread Marc Glisse
On Fri, 20 Apr 2018, paul zimmermann wrote: Only 0 can have lazy allocation, and I think we document that it isn't legal to put 0 on the denominator. where is this documented? That was in a "I think" sentence. Now that I looked a bit more, I don't find it... Well, you can't call any mpq fun

Re: Lazy mpz allocation

2018-04-20 Thread paul zimmermann
> Only 0 can have lazy allocation, and I think we document that it isn't > legal to put 0 on the denominator. where is this documented? In mpfr_set_q we use the fact that the user can set q to 1/0 for example to represent +Inf. Paul

mpn_mul is embarrassingly slow

2018-04-20 Thread Fredrik Johansson
For operands with 1-4 limbs, that is; on my machine, mpn_mul takes up to twice as long as mpn_mul_basecase, and inline assembly for 1x1, 2x1 or 2x2 multiplication is even faster. The problem is that there are three function calls (mpn_mul -> mpn_mul_n -> mpn_mul_basecase) + branches between the use
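For readers skimming the thread, here is a rough sketch of the idea discussed above: handle the tiny sizes inline, then fall through to a schoolbook loop, so that small operands avoid the chain of calls. This is not GMP code. The names limb_t, dlimb_t, mul_basecase_sketch and mul_small_sketch are hypothetical, and it assumes 64-bit limbs and a compiler that provides unsigned __int128 (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;            /* hypothetical stand-in for mp_limb_t */
typedef unsigned __int128 dlimb_t;  /* double limb; GCC/Clang extension */

/* Schoolbook product: rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1].
   rp must not overlap the inputs. */
static void mul_basecase_sketch(limb_t *rp, const limb_t *up, size_t un,
                                const limb_t *vp, size_t vn)
{
    for (size_t i = 0; i < un + vn; i++)
        rp[i] = 0;
    for (size_t j = 0; j < vn; j++) {
        limb_t carry = 0;
        for (size_t i = 0; i < un; i++) {
            /* Cannot overflow: (B-1)^2 + 2(B-1) = B^2 - 1 with B = 2^64. */
            dlimb_t t = (dlimb_t)up[i] * vp[j] + rp[i + j] + carry;
            rp[i + j] = (limb_t)t;
            carry = (limb_t)(t >> 64);
        }
        rp[j + un] = carry;
    }
}

/* Dispatcher: 1x1 and 2x1 are done inline, everything else goes to the
   schoolbook loop. Requires un >= vn >= 1, as mpn_mul does. */
static void mul_small_sketch(limb_t *rp, const limb_t *up, size_t un,
                             const limb_t *vp, size_t vn)
{
    assert(un >= vn && vn >= 1);
    if (un == 1) {                       /* 1x1: one widening multiply */
        dlimb_t p = (dlimb_t)up[0] * vp[0];
        rp[0] = (limb_t)p;
        rp[1] = (limb_t)(p >> 64);
    } else if (un == 2 && vn == 1) {     /* 2x1: two multiplies, one carry */
        dlimb_t p0 = (dlimb_t)up[0] * vp[0];
        dlimb_t p1 = (dlimb_t)up[1] * vp[0] + (limb_t)(p0 >> 64);
        rp[0] = (limb_t)p0;
        rp[1] = (limb_t)p1;
        rp[2] = (limb_t)(p1 >> 64);
    } else {                             /* generic small case */
        mul_basecase_sketch(rp, up, un, vp, vn);
    }
}
```

A caller would pass a result buffer of un + vn limbs, e.g. mul_small_sketch(r, a, 2, b, 2) with limb_t r[4]. Whether such a dispatcher actually beats the existing call chain depends on the machine and on how well the branches predict, so it would need measuring.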

Re: Lazy mpz allocation

2018-04-20 Thread Marc Glisse
On Fri, 20 Apr 2018, Marco Bodrato wrote: Ciao, Il Gio, 19 Aprile 2018 4:37 pm, Marc Glisse ha scritto: I finally pushed it. It seemed unsafe to keep mpq unaware of lazy allocation, in case people start swapping the numerator of a rational with a lazy 0 integer or something like that. If we