Re: div_qr_1 interface

2013-10-20 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: I think x86-64, x86-32, arm32, arm64, powerpc-64, sparc-64 matter. Unfortunately, powerpc-64 (and -32) return these types on the stack via an implicit pointer. Ok, I think I'll stick to

Re: div_qr_1 interface

2013-10-20 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: I'm about to push the first step, with C implementations of mpn_div_qr_1 and mpn_div_qr_1n_pi1. Done now, including some tuning code. It would be interesting to have DIV_QR_1N_PI1_METHOD, DIV_QR_1_NORM_THRESHOLD, DIV_QR_1_UNNORM_THRESHOLD added

Re: div_qr_1 interface

2013-10-20 Thread Niels Möller
Torbjorn Granlund t...@gmplib.org writes: Which tail call? In the normalized case, the checked in mpn_div_qr_1 does something like *qh = ...; ... return mpn_div_qr_1n_pi1(...); Which is a nice tail call. With the struct-returning version one gets instead res.qh = ...; ... res.r =
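
The tail-call point above can be illustrated with a minimal C sketch. This is not the GMP code: limb_t, toy_div_loop, toy_div_qr_ptr, toy_div_qr_struct and toy_qr_t are hypothetical names invented for the illustration, the limb is 32 bits so that plain uint64_t division can stand in for the real inner loop, and no normalization or precomputed inverse is involved. The struct variant is only a guess at the shape the truncated snippet describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* toy limb type, an assumption of this sketch */

/* Hypothetical stand-in for the inner division loop: divides {np, n}
   (least significant limb first) by d, starting from a remainder r < d.
   Stores n quotient limbs at qp and returns the final remainder. */
limb_t toy_div_loop(limb_t *qp, const limb_t *np, size_t n, limb_t r, limb_t d)
{
  while (n-- > 0) {
    uint64_t t = ((uint64_t) r << 32) | np[n];
    qp[n] = (limb_t) (t / d);   /* fits in a limb because r < d */
    r = (limb_t) (t % d);
  }
  return r;
}

/* Pointer interface: the high quotient limb goes out through *qh, and the
   remainder is whatever the inner function returns. The call is the last
   thing the function does, so a compiler may turn it into a jump. */
limb_t toy_div_qr_ptr(limb_t *qp, limb_t *qh, const limb_t *np, size_t n, limb_t d)
{
  assert(n >= 1 && d != 0);
  *qh = np[n - 1] / d;
  return toy_div_loop(qp, np, n - 1, np[n - 1] % d, d);
}

/* Struct interface: both results come back by value. */
typedef struct { limb_t qh; limb_t r; } toy_qr_t;

toy_qr_t toy_div_qr_struct(limb_t *qp, const limb_t *np, size_t n, limb_t d)
{
  toy_qr_t res;
  assert(n >= 1 && d != 0);
  res.qh = np[n - 1] / d;
  /* The callee's result still has to be stored into res after it returns,
     so this call is not in tail position. */
  res.r = toy_div_loop(qp, np, n - 1, np[n - 1] % d, d);
  return res;
}
```

Both variants compute the same quotient and remainder; the difference is only in whether the caller's frame must survive the inner call. Whether a compiler actually emits a jump for the pointer variant depends on the optimization level and the ABI.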

Re: div_qr_1 interface

2013-10-20 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: I'll try to get the x86_64 assembly for mpn_div_qr_1n_pi1 in soon. Pushed first working version now, see http://gmplib.org:8000/gmp/file/tip/mpn/x86_64/div_qr_1n_pi1.asm On my core2 laptop: $ ./speed -s 2-10,100,500 -C

Re: div_qr_1 interface

2013-10-20 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: On my core2 laptop: $ ./speed -s 2-10,100,500 -C mpn_divrem_1.0x mpn_div_qr_1.0x overhead 6.13 cycles, precision 1 units of 8.33e-10 secs, CPU freq 1200.00 MHz mpn_divrem_1.0x