C++ long long

2021-06-06 Thread Marc Glisse
Hello, I am tempted to go with something like the attached patch to support long long in gmpxx.h. (the patch is not quite ready) Essentially, it adds a way to build mpz_class from long long, and for all other operations, long long is converted to long if it fits and to mpz_class otherwise. So
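The "convert to long if it fits" rule described in this message can be sketched as follows. This is only an illustration and is not taken from the attached patch: the names `fits_in_long`, `OperandPath`, `choose_path` and `self_check` are hypothetical, and the sketch stops at choosing a path rather than constructing an `mpz_class`, so that it has no dependency on GMP.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Hypothetical helper (not from the patch): true when a long long value
// can be converted to long without changing its value.
constexpr bool fits_in_long(long long v) {
    return v >= std::numeric_limits<long>::min() &&
           v <= std::numeric_limits<long>::max();
}

// Hypothetical tag naming the code path an operand would take.
enum class OperandPath { AsLong, AsBigInteger };

// Small values reuse the existing long overloads; anything else would
// first be turned into an arbitrary-precision integer.
constexpr OperandPath choose_path(long long v) {
    return fits_in_long(v) ? OperandPath::AsLong : OperandPath::AsBigInteger;
}

// Usage example: values within the range of long always take the long path.
inline void self_check() {
    assert(choose_path(42) == OperandPath::AsLong);
    assert(choose_path(LONG_MAX) == OperandPath::AsLong);
}
```

On platforms where long and long long have the same width, every value fits and the big-integer path is never taken; the distinction only matters where long is narrower, such as 64-bit Windows.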

Re: div_qr_1n_pi1

2021-06-06 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: And I don't quite trust these cycle numbers, they should probably be twice as large, on the order of 10 cycles/limb for all variants. Less than 5 cycles is too good to be true, right? Yes. "Turbo" messes things up. The TSC cycle counter stays it

Re: div_qr_1n_pi1

2021-06-06 Thread Torbjörn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Maybe we should have some macrology for that? Or do all relevant processors and compilers support efficient cmov these days? I'm sticking to masking expressions for now. Let's not trust results from compiler generated code for these things. The mi
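For readers unfamiliar with the term, a "masking expression" replaces a conditional with arithmetic on an all-ones or all-zeros word. The following is a generic sketch of that idiom, not code from the div_qr_1n_pi1 patches; the function names `add_if` and `add_if_examples` are hypothetical, and a 64-bit limb is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Returns x + d when cond is true and x otherwise, written without an
// explicit branch. The mask is all ones (cond true) or all zeros.
inline std::uint64_t add_if(std::uint64_t x, std::uint64_t d, bool cond) {
    const std::uint64_t mask = std::uint64_t{0} - static_cast<std::uint64_t>(cond);
    return x + (d & mask);  // unsigned arithmetic wraps modulo 2^64
}

// Usage example.
inline void add_if_examples() {
    assert(add_if(5, 3, true) == 8);
    assert(add_if(5, 3, false) == 5);
}
```

Whether a compiler turns this into straight-line code, a cmov, or even a branch depends on the compiler and target, which is the reason given above for not relying on timings of compiler-generated code.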

Re: div_qr_1n_pi1

2021-06-06 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > $ ./speed -p 100 -s 2-20 -C mpn_div_qr_1n_pi1.0x8765432108765432 > mpn_div_qr_1n_pi1_1.0x8765432108765432 mpn_div_qr_1n_pi1_2.0x8765432108765432 > mpn_div_qr_1n_pi1_3.0x8765432108765432 mpn_div_qr_1n_pi1_4.0x8765432108765432 > overhead 2.63 cycle

Re: div_qr_1n_pi1

2021-06-06 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > Your idea of conditionally adding the invariant d * B2 at the right > place is also interesting, I've tried it out. Works nicely, but no speedup on my machine. I'm attaching another patch. There are then 4 methods: method 1: Old loop around udiv_qrn

Re: div_qr_1n_pi1

2021-06-06 Thread Niels Möller
Marco Bodrato writes: > Using masks does not always give the fastest code. I tried the > following variation on Niels' code, and, on my laptop with "g++-10 -O2 > -mtune=icelake-client -march=icelake-client", the resulting code is > comparable (faster?) with the current asm. Maybe we should have