Re: How to calculate cycles/limb in assembly routines

2024-04-04 Thread Niels Möller
by multiplier throughput (O(N^2)); it should be possible to arrange the order you add up the N^2 terms so that your carry chain corresponds to the size of the product (O(N)). Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email

Re: Options for x86_64 "versions" for system types

2024-03-07 Thread Niels Möller
pu-vendor-os. I'd expect GMP to support any kind of cpu names supported and encouraged by autoconf, plus the more specific cpu model names recognized by gmp. Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is

Re: Has there been historical work done to investigate small integer optimization?

2024-02-11 Thread Niels Möller
the tag bits (which is fine, because pointers usually require some alignment anyway). To reason about potential gains, you'd need to keep in mind the pretty high cost of a mis-predicted branch, compared to the cost of a memory access, for relevant assumptions on cachedness. Regards, /Ni

Re: Tuneup using size_ratio

2023-12-27 Thread Niels Möller
change in a few days. Have a nice Christmas! Pushed now. > May it be useful to add also the "/r" option for mpn_nussbaumer_mul? Makes sense, but not yet done. > And now, have a happy new year! Happy new year! Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1

Re: Tuneup using size_ratio

2023-12-27 Thread Niels Möller
PTIONAL }, >   { "mpn_toom54_mul",    speed_mpn_toom54_mul, FLAG_SR_OPTIONAL }, Good catch, adding those. I intend to push this change in a few days. Have a nice Christmas! Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF3

Tuneup using size_ratio (was: Re: What's a reasonable size ratio for toom32?)

2023-12-19 Thread Niels Möller
L_TOOM42_TO_TOOM53_THRESHOLD"; param.min_size = MPN_TOOM53_MUL_MINSIZE * 20 / 11; one (&thres, ¶m); mul_toom42_to_toom53_threshold = thres * 11 / 20; print_define ("MUL_TOOM42_TO_TOOM53_THRESHOLD", mul_toom42_to_toom53_threshold); + s.size_ratio = 0.5; param.func

Re: What's a reasonable size ratio for toom32?

2023-11-15 Thread Niels Möller
\ + if (size1 < 0) size1 = -size1 - s->size;\ + } \ SPEED_RESTRICT_COND (size1 >= 1); \ SPEED_RESTRICT_C

Re: What's a reasonable size ratio for toom32?

2023-10-15 Thread Niels Möller
Torbjörn Granlund writes: > Niels Möller writes: > > Pushed now. I've done some benchmarks on shell, on tip-of-tree GMP (no > local changes). See numbers at the end of this message, comparing > mpn_mul_basecase, mpn_toom22_mul and mpn_toom32_mul. > > Every sin

Re: What's a reasonable size ratio for toom32?

2023-10-11 Thread Niels Möller
Niels Möller writes: > See patch below. Makes both toom22 and toom32 recurse only to themselves > (or to schoolbook), and use provided scratch space, without any > additional allocations. Passes make check (with asserts enabled) on my > machine, but I probably don't hit the c

Re: What's a reasonable size ratio for toom32?

2023-09-27 Thread Niels Möller
Niels Möller writes: > Niels Möller writes: > >> And now I found that speed's handling of toom22, toom32 and friends >> always use fixed ratios, so I have to extend that to be able to get any >> interesting benchmarks when varying the ratio. > > Below patch

Re: What's a reasonable size ratio for toom32?

2023-09-27 Thread Niels Möller
Niels Möller writes: > And now I found that speed's handling of toom22, toom32 and friends > always use fixed ratios, so I have to extend that to be able to get any > interesting benchmarks when varying the ratio. Below patch seems to work. Most subtle part was to specify re

Re: What's a reasonable size ratio for toom32?

2023-09-27 Thread Niels Möller
Niels Möller writes: > Paul Zimmermann writes: > >>Dear Niels, >> >> ./speed -p 100 -c -s 10-200 -f1.1 mpn_mul.0.6 would be more readable, >> although the change in speed.h would be larger. > > See below patch, to support mpn_mul/0.6. Example ou

Re: What's a reasonable size ratio for toom32?

2023-09-18 Thread Niels Möller
\ +if (s->size_ratio > 0.0) \ + size1 = s->size_ratio * s->size; \ +else \ + { \ + size1 = (s->r == 0 ? s->size : s->r); \ + if (size1 < 0) size1 = -size1 - s->size;\ + } \ SPEED_RESTRICT_COND (size1 >= 1); \ SPEED_RESTRICT_COND (s->size >= size1);\ \ -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel

Re: What's a reasonable size ratio for toom32?

2023-09-14 Thread Niels Möller
Niels Möller writes: > I'm having some difficulty with the intuition, > perhaps it would help to draw the diagrams of ratio for unbalanced sub > product as a function of the input ratio, for both toom22 and toom32 I've done a bit of the math. If input ratio is 1/2 <= r

Re: What's a reasonable size ratio for toom32?

2023-09-14 Thread Niels Möller
Niels Möller writes: > Nice speedup for some sices, in particular 15. Some notable regressions, > in particular size 30x24 and 42x21, if I read the table correctly. > > To understand what's going on, one might need to separate the rather > small changes to algorithm selec

Re: What's a reasonable size ratio for toom32?

2023-08-24 Thread Niels Möller
f I read the table correctly. To understand what's going on, one might need to separate the rather small changes to algorithm selection, to the reduced toom32 overhead. But overall, rather small changes. Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.

Re: What's a reasonable size ratio for toom32?

2023-08-23 Thread Niels Möller
1 947193.20 103 8372.21 113 9886.82 124 11556.88 136 13422.05 149 16084.65 163 17925.33 179 20106.22 196 23446.72 Does that make sense? Or can you suggest some other interface to specify a fixed ratio? Regards, /Niels

Re: [PATCH] Revert "Move popcount and hamdist back from z14 to z13 after needed edits."

2023-08-03 Thread Niels Möller
me pseudo op that can be used to tell the assembler that the file targets z13, and get errors at compile time for unavailable instructions? Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale govern

Re: What's a reasonable size ratio for toom32?

2023-07-25 Thread Niels Möller
Niels Möller writes: > Bottom line: If mpn_mul is fixed to not call toom32 in the above range > (toom22 could be a reasonable replacement), and stricter ranges for > toom22 and toom32 are properly documented, then we can change toom32 to > not recurse using the top-level mpn_mul. A

Re: Why different runtimes for constant exponent and (huge) modulus for mpz_powm()?

2023-07-25 Thread Niels Möller
etimes worse. Maybe it would be helpful to monitor performance related measurements such as cpu frequency (or cycle counter), temperature, and count of cache misses as computation progresses? Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is s

What's a reasonable size ratio for toom32?

2023-07-21 Thread Niels Möller
a reasonable replacement), and stricter ranges for toom22 and toom32 are properly documented, then we can change toom32 to not recurse using the top-level mpn_mul. And it should also get a lot easier to adapt toom22 and toom32 to the itch/scratch convention, if we want to go that way. Regards

Re: Requests from Microsoft IP Addresses

2023-06-18 Thread Niels Möller
as indeed the root cause. (And I would hope github scripts run behind some kind of http cache; if appropriate environment variables are set, wget would use that by default). Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to w

Re: Requests from Microsoft IP Addresses

2023-06-17 Thread Niels Möller
equire any CPU compression work per request)? Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel

Re: Status of latest release for macos and arm64?

2023-05-10 Thread Niels Möller
Torbjörn Granlund writes: > I think we should perhaps releae a 6.3 instead. Makes a lot of sense too. /nisse -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance. ___

Status of latest release for macos and arm64?

2023-05-10 Thread Niels Möller
7;m thinking that the x18 issue is one possible explanation). Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmpl

Paper with an improved bound for half-gcd matrix sizes

2023-02-08 Thread Niels Möller
much smaller than the expected M->n + M1->n. */ and it clarifies some of the subtler details. Thanks to Paul for helping out in getting the paper published, and suggesting publishing on HAL. Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Intern

Re: Fast constant-time gcd computation and modular inversion

2022-09-29 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > I've now implemented inverse too. And now I've tried a different variant, using redc for the cofactor updates. Cofactors are now canonically reduced (which seems unexpectedly cheap). In the case that m is not normalized, so that 2m fi

Re: Fast constant-time gcd computation and modular inversion

2022-08-31 Thread Niels Möller
addition chain) as a side effect of implementing square root, since in some cases they can share much of the addition chain, and that work touched field prime arithmetic only. Sorry we're getting a bit off topic, we should take nettle discussion elsewhere. Regards, /Niels -- Ni

Re: Fast constant-time gcd computation and modular inversion

2022-08-31 Thread Niels Möller
mod the group order, as needed by ecdsa. Except for the somewhat obscure gost curves (Russian standards), where sec_invert is used for all inverses, and A/B for gost-gc512a is almost 30%. Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF

Re: Fast constant-time gcd computation and modular inversion

2022-08-30 Thread Niels Möller
Torbjörn Granlund writes: > ni...@lysator.liu.se (Niels Möller) writes: > > count = (49 * bits + 57) / 17; > > Odd. For sure. This isn't based on local progress of the algorithm (there ain't no guaranteed progress for a short sequence of reduction steps), bu

Re: Fast constant-time gcd computation and modular inversion

2022-08-24 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > If we require the largest matrix element, 2^k, to fit in a 64-bit > signed, we can have at most k = 62. I've implemented a first version doing GMP_LIMB_BITS-2 bits at a time. For a start, gcd only, not cofactors/inverse. Code appended b

Re: Fast constant-time gcd computation and modular inversion

2022-08-23 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > Then 96 bits seems to be the maximum number of low-end bits that can be > eliminated, under the constraint that elements of the corresponding > transformation matrix should fit in 64-bit (signed) limbs. I had another look into this (tha

Re: Fast constant-time gcd computation and modular inversion

2022-06-06 Thread Niels Möller
Torbjörn Granlund writes: > ni...@lysator.liu.se (Niels Möller) writes: > > Extract least significant 96 bits of each number. > > Is that 3 32-bit limbs or 1.5 64-bit limbs? I was thinking about 64-bit archs. Then 96 bits seems to be the maximum number of low-end bits that ca

Re: Fast constant-time gcd computation and modular inversion

2022-06-06 Thread Niels Möller
e eliminated with some final processing (or maybe treated specially, in case numbers are in redc form). Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance. ___

Re: Fast constant-time gcd computation and modular inversion

2022-05-29 Thread Niels Möller
side-channel silent extraction of top bits, floor (a / 2^{n-k-1}) a bit tricky, since the way these bits straddles a boundaries becomes data dependent. And the need for the conditional negations (lines 18 -- 21) seems related to rounding errors from ignored middle bits. Regar

Re: Fast constant-time gcd computation and modular inversion

2022-05-24 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > The new algorithm in the above paper uses a small auxillary state > variable d (or \delta in the paper) to decide the update in the case > both a, b are odd. If I write the iteration step using the same notation > as above, it'

Fast constant-time gcd computation and modular inversion

2022-05-24 Thread Niels Möller
ess is not clear to me (I haven't tried to read all the proofs), but I guess d should be understood as an approximation of bitsize(b) - bitsize(a) As I read the claimed bounds, if initial a and b are \ell bits, then there's a bound m, close to 2.88 \ell, guaran

Re: stronglucas.c

2022-05-17 Thread Niels Möller
agree your change improves readability a bit. > It's also possible lines 122-124 should also be indented Indentation looks right to me, at the current place. But perhaps more consistent with nearby code if comments are moved below the "else if" line (and then indented accordingly

Bignums anno 1864

2022-04-24 Thread Niels Möller
"Supposing the velocity of the moving parts of the engine be not greater then forty feet per minute" (0.2 m/s) was one second per addition, one minute per multiplication, all operating on 50-digit values. Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Inte

Re: Use on small mulmod_bnm1 [was: New mulmod_bknp1]

2022-04-20 Thread Niels Möller
hird split in mulmod_bnm1 seems to pay off measurably. So just rounding up to a multiple of 24 = 2^3 * 3 might be a reasonable initial strategy (above some threshold of a likely few hundred limbs). Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet ema

Re: Use on small mulmod_bnm1 [was: New mulmod_bknp1]

2022-04-19 Thread Niels Möller
, but not sure if that's possible. Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel

Re: New mulmod_bknp1

2022-02-23 Thread Niels Möller
xceeds one limb. Mod by constant single limb is pretty fast in GMP. Regards, /Niels -- Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp

Re: New mulmod_bknp1

2022-02-22 Thread Niels Möller
(at least not on a B boundary). If we go up to B^72-1, this would split as (B^36+1)(B^18+1)(B^9+1)(B^9-1), where B^9-1 can't be easily split, but the other factors are of the requied form B^{3n}+1. And we also have mpn_fft_next_size, but I'm not familiar

Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-08 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: >> dnl AMD64 mpn_addsubmul_1msb0, R = Au - Bv, u,v < 2^63. > > This comment obviously wrong ;-) > > But that function could be implemented by adding two "not %rdx" in the > right places of the loop, plus small adju

Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-08 Thread Niels Möller
l_1 on my machine. I'd prefer to next have a look at addsubmul_1msb0, before trying to optimize this loop further. diff -r 3dee30523768 ChangeLog --- a/ChangeLog Fri Oct 08 16:05:05 2021 +0200 +++ b/ChangeLog Fri Oct 08 16:41:04 2021 +0200 @@ -1,5 +1,8 @@ 2021-10-08 Niels Möller + * mpn/x

Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-07 Thread Niels Möller
adoxc0, l0 adoxhi, c1 adc l0, l1 mov l1, (rp, n, 8) inc n mov (ap, n, 8), %rdx mulxu0, l0, hi mov (bp, n, 8), %rdx mulxv0, l1, c0 adoxc1, l0 adoxhi, c0 adc l0, l1

Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-06 Thread Niels Möller
\ + } /* For mpn_mul, mpn_mul_basecase, xsize=r, ysize=s->size. */ #define SPEED_ROUTINE_MPN_MUL(function) \ -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale govern

Re: mul_fft

2021-10-06 Thread Niels Möller
t clobbering flags, one would have to combine not and cmov. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org ht

Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-06 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > If we have adox/adcx, use same strategy as suggested for > addaddmul_1msb0, but subtract rather than add in the chain with long > lived carry. Here's a sketch of a loop, that should work for both addaddmul_1msb0 and addsubmul

Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-06 Thread Niels Möller
e to 1 c/l. That's the fun thing with GMP, there's always ways to improve the code you wrote some years ago ;-) Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _

Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-06 Thread Niels Möller
Marco Bodrato writes: > I agree. Let's choose between the two last versions, and I'll push it. Nice with a few instructions less, feel free to push the version you like best. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email

Re: Risc V greatly underperforms

2021-10-06 Thread Niels Möller
s a lot more complexity than super-scalarity. And on top of that, there's also a severe penalty on carry *input*. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. __

Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-05 Thread Niels Möller
> EPILOGUE() So I think your version is an improvement as is, and perhaps not worth the effort to try to eliminate a few more instructions if this rather obscure function. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesal

Carry propagation (was: Re: Risc V greatly underperforms)

2021-10-04 Thread Niels Möller
Seems quite unlikely to be a win in software, but might be useful if it's possible to compute lots of (s, g, p) items in a single cycle, using very wide SIMD registers and/or very wide super-scalar architecture. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 36

Re: Please update addaddmul_1msb0.asm to support ABI in mingw64

2021-10-04 Thread Niels Möller
ll part of the complete gcd computation). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel

Re: Risc V greatly underperforms

2021-09-20 Thread Niels Möller
nstructions. (My toy architecture does have a single condition flag, which can be used as carry, see https://git.lysator.liu.se/nisse/instr16). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to

Polynomials over Z_2?

2021-09-09 Thread Niels Möller
lear to me what applications there are for arithmetic on *large* polynomials over Z_2, though. The applications I'm aware of, in crypto and coding, all use a modulo polynomial of quite modest size, to representing GF(2^n) for n at most a few hundred. Regards, /Niels -- Niels Möller. PGP-enc

Re: div_qr_1n_pi1

2021-07-21 Thread Niels Möller
Torbjörn Granlund writes: > ni...@lysator.liu.se (Niels Möller) writes: > > Same as in the current (from 2013) version. Delaying the write is a bit > tricky, since we already use all registers. But it would be better to > update the quotient limbs in memory only in the unl

Re: div_qr_1n_pi1

2021-07-09 Thread Niels Möller
between runs on this machine, than they were on my previous laptop. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@

Re: div_qr_1n_pi1

2021-07-08 Thread Niels Möller
le-limb number, dnl using a single-limb inverse. dnl Contributed to the GNU project by Niels Möller dnl Copyright 2013, 2021 Free Software Foundation, Inc. dnl This file is part of the GNU MP Library. dnl dnl The GNU MP Library is free software; you can redistribute it and/or modify dnl

Re: div_qr_1n_pi1

2021-07-05 Thread Niels Möller
. :-) That's a more subtle comparison. Would also be good to annotate which C method (if any) the assembly routine is based on. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance.

Re: div_qr_1n_pi1

2021-07-03 Thread Niels Möller
wins on a few machines (https://gmplib.org/devel/thres/DIV_QR_1N_PI1_METHOD), with all of method 1, 2 and 4 appearing as runner up on some machine. /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale governmen

Re: div_qr_1n_pi1

2021-07-01 Thread Niels Möller
Torbjörn Granlund writes: > Syntax errors included. The nightbuild report page is bright red. Danger of "easy" last minute changes... Fix pushed now. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesa

Re: div_qr_1n_pi1

2021-07-01 Thread Niels Möller
Torbjörn Granlund writes: > ni...@lysator.liu.se (Niels Möller) writes: > > I'm tempted to commit this code. I.e., new variants (not enabled) + > tuneup changes. To see which variants are favorites on the various test > machines. Should give some guidance as to what&

Re: div_qr_1n_pi1

2021-06-30 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > ni...@lysator.liu.se (Niels Möller) writes: > >> You're idea of conditonally adding the invariant d * B2 at the right >> place is also interesting, > > I've tried it out. Works nicely, but no speedup on my machine. I&

Re: mul_fft

2021-06-30 Thread Niels Möller
d be fairly efficient). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel

Re: pointers vs arrays

2021-06-20 Thread Niels Möller
ntax for emphasis and cross references (or delete, if you deem it unnecessary). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list

Re: div_qr_1n_pi1

2021-06-06 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > $ ./speed -p 100 -s 2-20 -C mpn_div_qr_1n_pi1.0x8765432108765432 > mpn_div_qr_1n_pi1_1.0x8765432108765432 mpn_div_qr_1n_pi1_2.0x8765432108765432 > mpn_div_qr_1n_pi1_3.0x8765432108765432 mpn_div_qr_1n_pi1_4.0x8765432108765432 > o

Re: div_qr_1n_pi1

2021-06-06 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > You're idea of conditonally adding the invariant d * B2 at the right > place is also interesting, I've tried it out. Works nicely, but no speedup on my machine. I'm attaching another patch. There are then 4 methods: meth

Re: div_qr_1n_pi1

2021-06-06 Thread Niels Möller
f MPN_INCR_U is a data dependent loop anyway. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel

Re: div_qr_1n_pi1

2021-06-03 Thread Niels Möller
asm. Cool! For assembly, it looks like we currently only have assembly for x86_64/ and x86_64/k8/. I think it's possibly to do something more clever on more recent processors with mulx, e.g, it will get pretty neat to keep the u1 recurrency variable in the special %rdx register. Regard

Re: div_qr_1n_pi1

2021-06-03 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > The current (METHOD == 2) code some somethign similar, it conditionally > adds in B2 at the right place in the next iteration. It uses the > name u2 for the carry that lives unti the next iteration, > > umul_ppmm

Re: div_qr_1n_pi1

2021-06-03 Thread Niels Möller
Torbjörn Granlund writes: > ni...@lysator.liu.se (Niels Möller) writes: > > The critical path, via the u1 variable, is > > umul_ppmm (p1, p0, u1, B2); > add_mss (cy, u1, u0, u0, up[j], p1, p0); > u1 -= cy & d; > > Nice! > > T

div_qr_1n_pi1

2021-06-02 Thread Niels Möller
flow. */ add_ss (q2, q1, q2, q1, CNST_LIMB(0), t1 - cy); q0 = t0; /* Final q update */ qp[j+1] = q1; MPN_INCR_U (qp+j+2, n-j-2, q2); } q1 = (u1 >= d); u1 -= (-q1) & d; udiv_qrnnd_preinv (t, u0, u1, u0, d, dinv); add_ss (q1, q0, q1, q0, C

Re: strange install directory

2021-02-26 Thread Niels Möller
ut the convention, except if we're cross # compiling. We use lib${ABI} if /usr/lib${ABI} exists and # appears to not be a symlink to a different name. (Comment being 10 years old, and maybe not completely accurate). Regards, /Niels -- Niels Möller. PGP-en

Re: strange install directory

2021-02-26 Thread Niels Möller
ng on the api built for, OS flavor, current structure under /usr/lib*, etc. Does it work to override it by explicitly passing --libdir=$PREFIX/lib? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. __

Re: gcd_11 without abs

2020-11-21 Thread Niels Möller
s makes the critical path longer. To avoid that, one would need to compute both ctz(a-b) and ctz(a+b). Or even shift two candidate values, |a-b| and (a+b) in parallel. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government

Re: gcd_11 without abs

2020-11-21 Thread Niels Möller
cond sub instruction in parallell with the first would be a start. Back to the no-abs version. That seems to require cmovs on the critical path *after* the shift, and that would tend to make the critical path longer. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid

gcd_11 without abs

2020-11-19 Thread Niels Möller
n't use carry from the first addition anyway, we can use lea for that and save a mov. But then we still need a mov + add to get the carry later on; here negation works against us, since that makes the compare instruction useless. Maybe there's some other architecture where negation i

Re: mpz_prevprime

2020-10-16 Thread Niels Möller
t "negative_mod_ui" or "distance_ui" to replace "nextmod_func" > and "increment_ui" to replace "nextseq_fun" > Also potentially adding the comment > > - m = nextmod_fuct(p, prime) > + /* Distance to next multiple of prime */ > +

mpz_add_si (was: Re: mpz_prevprime)

2020-10-16 Thread Niels Möller
"Marco Bodrato" writes: >>> mpz_add_si (p, p, diff * (odds_in_composite_sieve-1)); > > We don't have such a function :-) Ooops. Maybe we should have? We do have both _ui and _si versions of other basic mpz functions, including mpz_set, mpz_mul, and mpz_cm

Re: mpz_prevprime

2020-10-15 Thread Niels Möller
inters is fine. I'd prefer different names though, mod_ui instead of nextmod_func, and something similar for the other one. Or at least drop the _func suffix, which might be descriptive of a typedef name, but not as the name of a argument or a local variable. Regards, /Niels -- Niels Möller.

Re: mpz_prevprime

2020-10-15 Thread Niels Möller
mpz_srcptr n) Interface looks good to me. And if we later add a function to find the first prime in an arithmetic progression, I think that should fit well. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government survei

Re: mpn_set_str_bits

2020-09-30 Thread Niels Möller
nd mp_limb_t == unsigned char. */ > limb = (unsigned int) sp[sn] >> (bits - shift); > } > } > if (limb != 0) > rp[rn++] = limb; > else > rn = mpn_normalized_size (rp, rn); > return rn; > } > > It seems simple enough. Any

Re: mpn_set_str_bits

2020-09-30 Thread Niels Möller
is smaller, correct? I think so, but it's a bit subtle, so I'd prefer the explicit cast. Maybe worth a comment mentioning the problem case: mp_limb_t == unsigned char, bits == 8, shift == 0 Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Inter

Re: mpn_set_str_bits

2020-09-30 Thread Niels Möller
else > limb = 0; Do we really need to support bits == GMP_LIMB_BITS here? If not, the above 5 lines could be simplified to just limb = sp[sn] >> (bits - shift); Hmm, at this point, we always have shift < bits, right? So if we write it as limb = (sp[sn] &g

Re: State of PRNG code in GMP

2020-06-09 Thread Niels Möller
t...@gmplib.org (Torbjörn Granlund) writes: > ni...@lysator.liu.se (Niels Möller) writes: > > Does generic code really need to copy the algorithm specific data? > > It seems nicer to have generic code do as much as possible. And it > saves having pointers in the struct fo

Re: State of PRNG code in GMP

2020-06-03 Thread Niels Möller
nd the application that wants that will likely not get the enum value out of the blue, but have a list of known/supported values mapped to command line arguments or the like. So then it's not much extra work for the application to have call the right initializer from it's own switch stat

Re: [PATCH] mini-gmp: pass correct old_size to custom reallocate function

2020-05-27 Thread Niels Möller
_check (p); > - block = (size_t *) realloc (block, sizeof(size_t) + new_size + > sizeof(block_end)); > + size_t *old_block = block_check (p), *block; Declare block on a line of its own, size_t *block; Thanks! /Niels Möller -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel

Re: mini-gmp warnings

2020-04-26 Thread Niels Möller
Marco Bodrato writes: > The following line works in my environment: > > make CPPFLAGS="-DMINI_GMP_LIMB_TYPE=char" check-mini-gmp Thanks, seems to work fine (and make clean-mini-gmp before changing limb size). Regards, /Niels -- Niels Möller. PGP-encrypted email is prefer

Re: mini-gmp warnings

2020-04-26 Thread Niels Möller
7;ll do that, then. Is there an easy way to run mini-gmp tests with small limb size? Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing

Re: mini-gmp warnings

2020-04-26 Thread Niels Möller
ni...@lysator.liu.se (Niels Möller) writes: > There are a couple of other uses of LOCAL_SHIFT_BITS, > LOCAL_GMP_LIMB_BITS, LOCAL_CHAR_BIT, added as part of the support for > testing with small limb sizes. Is it for some reason important to refer > to these constants via int variabl

mini-gmp warnings

2020-04-26 Thread Niels Möller
ff with something matching LOCAL_SHIFT_BITS, whatever its value is. There are a couple of other uses of LOCAL_SHIFT_BITS, LOCAL_GMP_LIMB_BITS, LOCAL_CHAR_BIT, added as part of the support for testing with small limb sizes. Is it for some reason important to refer to these constants via int variables?

Re: mini-gmp "bug" missing mpz_fits_sint_p / mpz_fits_uint_p

2020-04-20 Thread Niels Möller
Marco Bodrato writes: > Please Niels, choose the implementation you prefer and change also > mpz_fits_slong_p accordingly. Pushed, including the new tests from your patch. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subj

Re: mini-gmp "bug" missing mpz_fits_sint_p / mpz_fits_uint_p

2020-04-19 Thread Niels Möller
that could be used to test it. Some C interpreter, not targeting any particular hardware, would do fine for this purpose. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. __

Re: mini-gmp "bug" missing mpz_fits_sint_p / mpz_fits_uint_p

2020-04-19 Thread Niels Möller
e simpler using mpz_cmp_si, int mpz_fits_sint_p (const mpz_t u) { return mpz_cmp_si (u, INT_MAX) <= 0 && mpz_cmp_si (i, INT_MIN) >= 0; } BTW, do we have any C implementation where INT_MAX + INT_MIN == 0, i.e., not using two's complement? Regards, /Niels -- Niels Mölle

Re: Cleanup mpz_out_str in tests

2020-03-28 Thread Niels Möller
o mini-gmp/testsuite/). Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel

Re: tune/speed is up to date

2020-03-26 Thread Niels Möller
S, not sure that's really correct). And if you want to make changes to configure and Makefile, make sure to configure with --enable-maintainer-mode. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveill

Re: Moving LOOP_ON_SIEVE_* macros to gmp-impl.h

2020-03-16 Thread Niels Möller
I think. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. ___ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel

Re: [PATCH] mini-gmp: pass correct old_size to custom reallocate function

2020-03-12 Thread Niels Möller
mp_default_alloc, with no indirection. The xrealloc function is analogous (but maybe not as widely used as xalloc). On the other hand, I don't know of any traditional function named xfree. So maybe the least confusing naming is to strike "x" from all the names (six in all, including the

  1   2   3   4   5   6   7   8   9   10   >