Re: mpn_mulmod_bnm1

2014-04-02 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: No immediate plans. To me, it seems stable enough, if documented together with mpn_mulmod_bnm1_itch and mpn_mulmod_bnm1_next_size. We should integrate the small primes FFT code,

Re: GMP 6.0.0 released

2014-03-26 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: It's usually defined in gmp-mparam.h for your machine, with a fallback definition in gmp-impl.h. But that definition isn't picked up by assembly files, so it should also be defined in config.m4, generated by configure. Not sure how configure

Re: GMP 6.0.0 released

2014-03-26 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: And the value you use (30) is different from the default in gmp-impl.h (10). I take it you think the larger value is more appropriate for powerpc? I looked at the measured values for the powerpc64 hardware we run on, and 30 seemed to be in the

Re: Small-base powm

2014-03-25 Thread Torbjorn Granlund
I think what you suggest is very close to the pseudo code at https://gmplib.org/devel/, under the header mpz_powm and mpz_powm_ui. But you suggest several additional refinements. I wasn't considering the case when the base is just a single limb, but any time 2 * log(b) = log(m). These

Re: mini-gmp: mpz_congruent_p and mpz_probab_prime_p

2014-03-12 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: if (reps 25 mpz_cmpabs_ui (n, 5000*5000+5000 + 41) 0) reps = 25; else if (reps 5000) reps = 5000; I didn't follow this thread too closely, but that code seems to suggest that an argument of 5000 makes sense. Even the most

Re: mpz_probab_prime_p and negative inputs

2014-03-03 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: It seems mpz_probab_prime_p considers negated primes to also be prime. E.g., for n == -29 it returns 2, meaning definitely prime. Mathematically, I think -29 is usually considered neither prime, nor composite (its prime factorization is -1 * 29

Re: mini-gmp

2014-02-24 Thread Torbjorn Granlund
bodr...@mail.dm.unipi.it writes: On Fri, 17 January 2014 1:10 pm, Vincent Lefevre wrote: you may also have optimizations based on the fact that some variable cannot be zero. But you have no types that don't include zero. The right solution is to make sure that the compiler knows

Re: mini-gmp

2014-02-23 Thread Torbjorn Granlund
Perhaps we should add a simple one-level version to mini-gmp? #define GMP_MINI_VERSION 17 It does not need to be bumped with GMP release if mini-gmp did not change. Perhaps it should be bumped at each checkin?

Re: Fat Binary - Haswell Detection Bug (and fix)

2014-02-20 Thread Torbjorn Granlund
John Sully j...@csquare.ca writes: While testing the latest development code we've discovered that it fails on specific Pentium D Haswell CPUs. These CPUs are odd in that they don't have the BMI2 instruction set. Because of this GMP will crash when it attempts to execute a MULX. The

Re: Bug found in nightbuilds

2014-02-18 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I'd suggest doing the below (also undoing Marco's previous fix). To fix the actual failure, one would also need to edit the two gmp-mparam.h files which set DIV_QR_1_NORM_THRESHOLD to zero. I checked that in yesterday. But I'm a bit

Re: Bug found in nightbuilds

2014-02-16 Thread Torbjorn Granlund
bodr...@mail.dm.unipi.it writes: Code says: if (d & GMP_NUMB_HIGHBIT) { /* Normalized case */ uh = up[--n]; /* Here n goes to 0 */ ... if (BELOW_THRESHOLD (n, DIV_QR_1_NORM_THRESHOLD)) { while (n > 0) udiv_qrnnd (...);

Bug found in nightbuilds

2014-02-15 Thread Torbjorn Granlund
We currently have many spurious failures flagged in red at https://gmplib.org/devel/tm-date.html, mainly due to hardware errors with the system `biko'. But the `hark' failure looks real: hark$ cd /var/tmp/gmp-obj/hark-stat-64 hark$ GMP_CHECK_RANDOMIZE=3526906869 tests/mpn/t-div

Re: Bug found in nightbuilds

2014-02-15 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Hmm. Looks like it's the returning of the high limb via the separate *qp which is broken? And there's no (non-inline) assembly involved, it's generic/mpn_div_qr_1.c. I traced it to a nn=0 call to the underlying pi1 call. Dunno if that's

Re: mpn_sec_powm

2014-02-12 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Given the current implementation, it's natural. But we could document that it is required that any left over bits in the top limb must be zero. Would that be better? My take on this is that asking users to keep that zero isn't a requirement

Re: mpn_sec_powm

2014-02-11 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: Please use something else than ebits, since that sounds like the argument contains bits with individual meaning. IIRC enb would follow conventions used elsewhere in the manual. Naming

Re: mpz_limbs interface

2014-02-07 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Why isn't __gmp_extract_double's style OK for mpn_set_d? Are its conventions not neat enough, or are there efficiency reasons? I found the conventions of __gmp_extract_double hard to understand. And I think returning a base 2 exponent is

Re: mpz_limbs interface

2014-02-06 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Below is a patch to do this (and return value is long, not mp_bitcnt_t, since it needs to be signed). What do you think? I'm too busy to make an educated analysis. Why isn't __gmp_extract_double's style OK for mpn_set_d? Are its conventions

Re: sec_invert performance

2014-02-04 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Any idea what's going on? A quick guess is that the exponent is fixed for powm, not a function of the input size.

Re: SSE, XMM and GMP's configure

2014-02-03 Thread Torbjorn Granlund
For what it is worth, there are now 32-bit and 64-bit fbsd5 and fbsd56 systems in the test array: https://gmplib.org/devel/testsystems.html Both seem to allow XMM access. The problem might be limited to fbsd4. At some point, it would be nice to clean up the broken logic for this in GMP's

Re: SSE, XMM and GMP's configure

2014-02-02 Thread Torbjorn Granlund
On Sat, 25 January 2014 7:15 pm, Torbjorn Granlund wrote: operating system support. Now, we suppress use of (some) gcc sse-related options which trigger bad behaviour (via the acinclude.m4 GMP_GCC_PENTIUM4_SSE2) and in that context check if the OS handles XMM (via

Re: mpn_sec_minvert name

2014-01-27 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I think the conclusion was that volatile was not very useful for this function, but if we add that later, does it make any sense to have const volatile *ap? The intended meaning would be that writes are invalid, and that no reads should be

SSE, XMM and GMP's configure

2014-01-25 Thread Torbjorn Granlund
Our configure logic for excluding XMM register use is flawed. We should keep SSE2 availability and XMM availability apart, since a CPU which supports SSE2 will always handle SSE2+MMX, while XMM requires operating system support. Now, we suppress use of (some) gcc sse-related options which

Re: speed of mpn_sqrtrem vs mpn_rootrem

2014-01-22 Thread Torbjorn Granlund
Zimmermann Paul paul.zimmerm...@inria.fr writes: the issue reported in September 2010 is still present: Such things happen sometimes. It means that we volunteers did not have enough spare time for making the GMP gift better in that specific respect.

Re: mpz_limbs interface

2014-01-22 Thread Torbjorn Granlund
bodr...@mail.dm.unipi.it writes: Maybe our printf/repl-vsnprintf.c is not tested enough? Oddly enough it is not even listed at e.g., https://gmplib.org/devel/lcov/hannahnbsd32v61/gmp/printf/index.html. Of the existing 21 functions in printf/ only 17 are there. Great coverage analysis! :-(

Re: Problem with __gmp_expr

2014-01-22 Thread Torbjorn Granlund
Marc Glisse marc.gli...@inria.fr writes: By the way, do we have a policy about breaking binary compatibility? In this case, mixing old and new objects could result in crashes (almost certainly at -O0, seldom at -O3). It should be possible to prevent this issue by renaming __gmp_unary_expr

Re: mpz_limbs interface

2014-01-22 Thread Torbjorn Granlund
bodr...@mail.dm.unipi.it writes: Well, it is wrapped with #if ! HAVE_VSNPRINTF /* only need this file if we don't have vsnprintf */ [...] #endif /* ! HAVE_VSNPRINTF */ so, on many systems it is not compiled at all... (and that's a reason why it is less tested than other chunks

Re: mpz_limbs interface

2014-01-21 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I see. In this particular case, I think the right gmp interface change is to add mpn_urandomb and mpn_rrandomb (similar to current mpn_random and mpn_random2, but with a randstate argument). If I understand this correctly, the main obstacle is

New mpn random generators (Was: Re: mpz_limbs interface)

2014-01-21 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: ni...@lysator.liu.se (Niels Möller) writes: I see. In this particular case, I think the right gmp interface change is to add mpn_urandomb and mpn_rrandomb (similar to current mpn_random

Re: mpz_limbs interface

2014-01-21 Thread Torbjorn Granlund
Marc Glisse marc.gli...@inria.fr writes: We already have function mpz_array_init which encourages thinking of I removed its docs the other day.

Re: mpz_limbs interface

2014-01-21 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: This assumes that C++ allows initializers with arbitrary non-constant expressions (does it?), and that we implement mpn_set_d. The top-level file extract-dbl.c kind-of does that already.

Cleaning out varargs

2014-01-19 Thread Torbjorn Granlund
I noticed that we still have tests for traditional C varargs.h versus ISO C90 stdarg.h everywhere a varying number of arguments is used. Since we cleaned out K&R stuff a few years back, we could require stdarg.h without causing additional portability problems, right?

Cleanups

2014-01-19 Thread Torbjorn Granlund
I did a lot of cleanup changes today: 1. All LGPL copyright headers should now have the same layout, except for file format mandated line prefixes. 2. Old K&R varargs config checks and conditional code are now gone. 3. mpq_t now used everywhere in place of the old MP_RAT. 4. The old

Re: TODO for 5.2 v3

2014-01-16 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Question is, when is it useful for our purposes? First example, mpn_sec_add_1: mp_limb_t mpn_sec_add_1 (mp_limb_t *rp, mp_limb_t *ap, mp_size_t n, mp_limb_t b, mp_ptr scratch) { scratch[0] = b; MPN_ZERO

Re: TODO for 5.2 v3

2014-01-07 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: * Make some other sec functions from Niels' list public? Here's a first patch adding a couple of other functions. Benchmarking and testing is missing (except that the sec_minvert tests still pass

Re: Should we declare _itch functions __GMP_NOTHROW __GMP_ATTRIBUTE_PURE ?

2014-01-05 Thread Torbjorn Granlund
bodr...@mail.dm.unipi.it writes: Indeed. I pushed a fix. Any comment about marking them also with __GMP_NOTHROW ? Perhaps that too. I suppose __GMP_ATTRIBUTE_PURE should really be the stronger ATTRIBUTE_CONST, except that we don't yet have any name space clean way of doing that for

Re: TODO for 5.2

2013-12-30 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: I notice you make this non-public. Is it premature to make it part of the public interface? Pushed now, with declarations moved to gmp-h.in. And now some 450 nightly builds have run

Re: TODO for 5.2

2013-12-29 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: * Finalise and commit mpn_sec_minvert. Here's a new version, including tests. Seems to work. I'll try to get this committed fairly soon. Nice! I notice you make this non-public. Is it premature to make it part of the public interface?

Re: TODO for 5.2

2013-12-29 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I just put the declarations together with the other mpn_sec_* functions. I think it makes sense to make mpn_sec_div_*, mpn_sec_minvert and mpn_sec_powm public together. Does mpn_sec_powm need more work (besides the rename) before being made public?

TODO for 5.2

2013-12-28 Thread Torbjorn Granlund
This is what I want to fix before the 5.2 release. Please remind me if I have forgotten something. * Strongly consider making mpn_sec_div_qr return high quotient limb and write just nn-dn quotient limbs to qp area. * Finalise and commit mpn_sec_minvert. * Add some other sec functions from

Re: Side-channel silent modular inverse

2013-12-27 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: As you can see, it depends on a couple of other functions, mpn_sec_add_1, mpn_cnd_neg, mpn_cnd_swap, mpn_sec_eq_ui, which would probably have to be written in assembly to ensure that they avoid operations with branches or data-dependent timing.

Interface of mpn_sec_ and mpn_cnd_ functions

2013-12-27 Thread Torbjorn Granlund
I think the mpn_sec_ and mpn_cnd_ functions should never allocate any memory. Instead, callers should pass all scratch areas to allow the use of secure memory. Or is this pointless? The stack frames may get sensitive data, as determined by the compiler used. When we allocate (small) scratch
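The no-allocation convention proposed in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `sec_add_1_itch` and `sec_add_1`, the `limb_t` type and the 32-bit limb size are hypothetical stand-ins modelled on the `_itch` naming used elsewhere in these threads, not GMP's actual code. The caller asks how much scratch is needed, provides it from memory it controls, and the function itself never allocates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb, not GMP's mp_limb_t */

/* Number of scratch limbs sec_add_1 needs for an n-limb operand. */
size_t sec_add_1_itch(size_t n)
{
    return n;
}

/* rp = ap + b over n limbs; returns the carry out (0 or 1).
   All temporary storage lives in the caller-supplied scratch area,
   so the caller decides where sensitive intermediate data ends up. */
limb_t sec_add_1(limb_t *rp, const limb_t *ap, size_t n, limb_t b,
                 limb_t *scratch)
{
    assert(n >= 1);

    /* Build the n-limb vector {b, 0, ..., 0} in scratch. */
    scratch[0] = b;
    for (size_t i = 1; i < n; i++)
        scratch[i] = 0;

    /* Plain n-limb addition; the loop always runs over all n limbs. */
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = ap[i] + scratch[i];
        limb_t c1 = s < ap[i];      /* carry from ap[i] + scratch[i] */
        limb_t r = s + carry;
        limb_t c2 = r < s;          /* carry from adding the carry in */
        rp[i] = r;
        carry = c1 | c2;
    }
    return carry;
}
```

A caller would typically size the scratch area with `sec_add_1_itch(n)`, take it from locked or otherwise protected memory, and wipe it after the call. Whether a compiler keeps such C code free of data-dependent branches is not guaranteed, and the message itself notes that stack frames may still receive sensitive data.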

Re: Side-channel silent modular inverse

2013-12-27 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Create zero vector, invoke mpn_sub_n. That doesn't make it conditional. And I see no obvious way to do conditional negation on top of mpn_cnd_sub_n. Oops. Compute T = 2 x A using mpn_add_n or mpn_lshift. Use mpn_cnd_sub_n with A, T as
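The trick described here (compute T = 2A, then conditionally subtract T from A) can be sketched in plain C. This is a hedged illustration, not GMP code: `cnd_sub_n`, `lshift1`, `cnd_neg` and the 32-bit `limb_t` are hypothetical names chosen for the example. It relies on A - 2A = -A (mod B^n), so the result is A when the condition is 0 and -A when it is 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb, not GMP's mp_limb_t */

/* rp = ap - (cnd ? bp : 0) over n limbs; returns the borrow out.
   cnd must be 0 or 1. The subtrahend is masked instead of branching. */
limb_t cnd_sub_n(limb_t cnd, limb_t *rp, const limb_t *ap,
                 const limb_t *bp, size_t n)
{
    assert(cnd <= 1);
    limb_t mask = (limb_t)0 - cnd;      /* all ones if cnd, else zero */
    limb_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t b = bp[i] & mask;
        limb_t d = ap[i] - b;
        limb_t b1 = ap[i] < b;          /* borrow from ap[i] - b */
        limb_t r = d - borrow;
        limb_t b2 = d < borrow;         /* borrow from the borrow in */
        rp[i] = r;
        borrow = b1 | b2;
    }
    return borrow;
}

/* rp = 2 * ap mod B^n; returns the bit shifted out at the top. */
limb_t lshift1(limb_t *rp, const limb_t *ap, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = ap[i];
        rp[i] = (a << 1) | carry;
        carry = a >> 31;
    }
    return carry;
}

/* rp = cnd ? -ap : ap (mod B^n), using tp as n limbs of scratch. */
void cnd_neg(limb_t cnd, limb_t *rp, const limb_t *ap, limb_t *tp, size_t n)
{
    lshift1(tp, ap, n);                 /* T = 2A mod B^n */
    cnd_sub_n(cnd, rp, ap, tp, n);      /* A - T = -A when cnd is set */
}
```

Since T is reduced mod B^n here, the borrow returned by the subtraction is not the carry out of a true negation; the sketch simply discards it.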

Re: Side-channel silent modular inverse

2013-12-27 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Should work (except if T is computed mod B^n, one doesn't get the correct carry out, but that isn't needed here). But it's a bit awkward, I realise one needs some (straightforward) handling of carry out. and this is a performance critical

Re: Side-channel silent modular inverse

2013-12-26 Thread Torbjorn Granlund
I suppose I already suggested that one computes a^{-1} mod b as a^{b-2} mod b (for prime b), using a plain old modexp. I realise that this will be asymptotically slower, in this setting O(n^3) vs O(n^2), but it ought to have a much lower constant factor.
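As a small single-word illustration of inversion by modular exponentiation: for a prime modulus p, Fermat's little theorem gives a^{p-1} = 1 (mod p), hence a^{p-2} is the inverse of a. The sketch below is an assumption-laden toy, not the GMP approach: the function names are hypothetical, the modulus must fit in 32 bits so that products fit in 64 bits, and the exponentiation loop branches on exponent bits, which a side-channel silent version would have to avoid.

```c
#include <assert.h>
#include <stdint.h>

/* (a * b) mod m, valid for m < 2^32 so the product fits in 64 bits. */
uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (a % m) * (b % m) % m;
}

/* a^e mod m by binary (square-and-multiply) exponentiation. */
uint64_t powmod(uint64_t a, uint64_t e, uint64_t m)
{
    uint64_t r = 1 % m;
    a %= m;
    while (e > 0) {
        if (e & 1)
            r = mulmod(r, a, m);
        a = mulmod(a, a, m);
        e >>= 1;
    }
    return r;
}

/* Inverse of a modulo a prime p < 2^32, with a not divisible by p. */
uint64_t inv_mod_prime(uint64_t a, uint64_t p)
{
    assert(p >= 2 && p < ((uint64_t)1 << 32) && a % p != 0);
    return powmod(a, p - 2, p);
}
```

With n-limb operands the exponent has about n limbs of bits, and each step costs a modular multiplication that is quadratic in n for schoolbook arithmetic, which is where the O(n^3) estimate comes from.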

Re: Including limits.h in gmp-impl.h

2013-12-25 Thread Torbjorn Granlund
Vincent Lefevre vinc...@vinc17.net writes: On 2013-12-25 12:13:39 +0100, Marc Glisse wrote: Oups, looks like I already asked about that: https://gmplib.org/list-archives/gmp-bugs/2011-November/002443.html and the reply was to try including tests.h before gmp-impl.h. I'd say

Re: Including limits.h in gmp-impl.h

2013-12-25 Thread Torbjorn Granlund
Vincent Lefevre vinc...@vinc17.net writes: I've tried to find something about that on Google, but couldn't find anything. Any reference? Perhaps ChangeLog has some history.

Re: [PATCH v2] Support powerpc64le-linux platform

2013-12-13 Thread Torbjorn Granlund
Your powerpc64le patch is now in the main GMP repo, https://gmplib.org/repo/gmp/. Thanks for your contribution! (We're still waiting for any reaction from the FSF staff, but we have decided to time out after a reasonable time. Should some problem arise, we'll address it appropriately.)

Re: [PATCH] Support powerpc64le-linux platform

2013-12-08 Thread Torbjorn Granlund
Ulrich Weigand ulrich.weig...@de.ibm.com writes: Testing cpp symbols for ABI version makes me a bit nervous. Such things can easily get out-of-synch. It might be more resilient to check a generated object. Well, the _CALL_ELF check is what we use for all other packages that

Re: SSE2 basecase multiplication

2013-12-07 Thread Torbjorn Granlund
Vasili Burdo vasili.bu...@gmail.com writes: I implemented basecase multiplication and squaring for x86 using SSE2 instructions and Comba column-wise multiplication method. On Ivy Bridge (Intel Core i7 3517U) multiplication is 10-20% faster than present GMP basecase MMX multiplication.

Re: [PATCH] Support powerpc64le-linux platform

2013-12-06 Thread Torbjorn Granlund
Ulrich Weigand uweig...@de.ibm.com writes: this patch updates GMP to support the little-endian PowerPC64 platform (powerpc64le-linux). This requires two changes: - Update configfsf.guess/sub to current upstream versions. I think Niels volunteered to do that... - Change

Re: divrem_1 and mod_1

2013-11-22 Thread Torbjorn Granlund
Just to make sure I start from the right spot: You're talking about Hensel norm division here, right? When Paul posted results in March, I thought your work was on plain old Euclidean norm. We (mainly I and Niels, I suppose) have spent much more time on Euclidean norm division/mod than on the

Curious slowdown in Toom-3

2013-11-12 Thread Torbjorn Granlund
GMP 4.3: shell$ ./speed -p1 -s100-1 -f10 mpn_toom3_mul_n overhead 0.2 secs, precision 1 units of 3.13e-10 secs, CPU freq 3200.00 MHz mpn_toom3_mul_n 100 0.05181 1000 0.000169392 1 0.005313959 100.159352000 GMP repo: shell$ ./speed

Re: Curious slowdown in Toom-3

2013-11-12 Thread Torbjorn Granlund
I think I understand this issue now. In the various toom functions, we suppress tests for recursive calls which cannot happen when each function is invoked for the intended range. These things are controlled by the relative TOOM threshold. This makes tune/speed measurements look bad, but

Re: squaring vs multiply

2013-11-08 Thread Torbjorn Granlund
Zimmermann Paul paul.zimmerm...@inria.fr writes: Moreover GMP is using Schönhage-Strassen's algorithms, where the pointwise multiplications are not negligible, thus we should have a ratio well above 2/3. However in GMP 5.1.3 the ratio is around 2/3, and sometimes even below: Any

Re: Amd64 relocation R_X86_64_32S in a static lib

2013-11-06 Thread Torbjorn Granlund
Exact decision for the change? I'm not sure what you mean by 'decision' there. If you're wondering about the _reason_ for the change (why we did it), the answer is so that ASLR is applied not just to the code in shared libraries but also the code in executables. If you're wondering

Re: Amd64 relocation R_X86_64_32S in a static lib

2013-11-05 Thread Torbjorn Granlund
Philip Guenther guent...@gmail.com writes: Ah, but you are, sorta. In OpenBSD 5.3, platforms where the compiler and toolchain support were robust enough for it were switched to build PIE objects and executables by default. So yes, that object _is_ expected to be position independent.

Amd64 relocation R_X86_64_32S in a static lib

2013-11-04 Thread Torbjorn Granlund
I am working on getting the GMP bignum library to work better on OpenBSD. With current GMP sources (GMP 5.0.x, 5.1.x, and development head) a 'fat' build will not work on amd64 under OpenBSD 5.3 and 5.4. With older versions of OpenBSD (I've tested 4.9, 5.0, 5.2) things work as expected. The

Re: div_qr_1 interface

2013-10-26 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: A long time ago, we chose an interface for sbpi1_div_qr which does *not* store the most significant limb; instead it returns it. I think it was the intention that a new top-level mpn_div_qr should follow that convention, and not store the top
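The convention of handing back the most significant quotient limb separately, rather than storing it with the rest, can be illustrated for the single-limb divisor case. This is a rough sketch under assumptions: `div_qr_1_norm` and the 32-bit `limb_t` are hypothetical, the divisor is required to be normalised (high bit set), and a 64-bit division stands in for the precomputed-inverse arithmetic a real implementation would use. With a normalised divisor the top quotient limb can only be 0 or 1, so the remaining quotient occupies exactly n-1 limbs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* hypothetical 32-bit limb, not GMP's mp_limb_t */

/* Divide the n-limb number {up, n} by the normalised single limb d.
   Stores the n-1 low quotient limbs at qp, the top quotient limb
   (0 or 1) at *qh, and returns the remainder. */
limb_t div_qr_1_norm(limb_t *qp, limb_t *qh, const limb_t *up, size_t n,
                     limb_t d)
{
    assert(n >= 1 && (d >> 31) == 1);

    limb_t r = up[n - 1];
    limb_t q = r >= d;                  /* top quotient limb: 0 or 1 */
    *qh = q;
    r -= ((limb_t)0 - q) & d;           /* subtract d once if r >= d */

    /* Schoolbook division of the remaining limbs; r < d holds on entry
       to every iteration, so each partial quotient fits in one limb. */
    for (size_t i = n - 1; i-- > 0; ) {
        uint64_t num = ((uint64_t)r << 32) | up[i];
        qp[i] = (limb_t)(num / d);
        r = (limb_t)(num % d);
    }
    return r;
}
```

For example, dividing the two-limb value 0xFFFFFFFF00000007 by 0x80000000 yields the quotient 0x1FFFFFFFE, which this sketch reports as a top limb of 1, one stored limb 0xFFFFFFFE, and remainder 7.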

Re: div_qr_1 interface

2013-10-25 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: The interesting thing is that the next higher function, mpn_div_qr_1, should return the high quotient limb separately. I am not sure I agree. Please explain. You're saying that an n-limb consecutive dividend should yield an (n-1)-limb consecutive

Re: div_qr_1 interface

2013-10-24 Thread Torbjorn Granlund
I pushed initial C versions of these functions: mpn_div_qr_1n_pi2 mpn_div_qr_1u_pi2 I have had these for a long time, judging from the file time stamps. These accept n-limb dividends in a single consecutive operand and generate n-limb quotients also in a consecutive operand. I now

Re: A contribution to GMP

2013-10-23 Thread Torbjorn Granlund
Marc Glisse marc.gli...@inria.fr writes: On the homepage gmplib.org: Externally supported: High-level floating-point accurately rounding arithmetic functions (mpfr). See the mpfr site for more information. Starting with GMP 4.2, mpfr is released separately from GMP. (New projects should

Massive test failures for haswell-freebsd9

2013-10-23 Thread Torbjorn Granlund
Regarding http://gmplib.org/devel/tm-date.html. Some of you might have spotted build errors for solaris, with -fat. This is due to a static allocation of their m4. I've worked around it, so next build round should pass. A much worse error happens with Intel Haswell under FreeBSD 8 and 9; here

Re: div_qr_1 interface

2013-10-22 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: * The code is no win for AMD k10/k8 (although close to 10 c/l might well be possible) I tried replacing one masking op by cmov, as you suggested. We then get down to 11.25 c/l on K10. I put

Re: div_qr_1 interface

2013-10-22 Thread Torbjorn Granlund
It turned out the code was a bit slower on k8. This patch changes that. With it applied, things take 11 c/l on both pipelines. This is also a 2 c/l improvement for piledriver. I have not tested that this is correct. If you like the patch, please consider putting the result in the k8 subdir.

Re: div_qr_1 interface

2013-10-22 Thread Torbjorn Granlund
I played more with the code, now trying to break the add-adc-sbb-cmov chain, for the benefit of most Intel processors. But I lack unit testing code for the function, making hacking quite cumbersome. I don't feel safe hacking *any* GMP assembly code without tests/devel/try.c's function and access

Re: div_qr_1 interface

2013-10-22 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: ni...@lysator.liu.se (Niels Möller) writes: But sure, support also in try.c would be good. Added now. Please have a look if the changes are sane. I use the second source for the uh input, and I added a DATA_DIV_QR_1 to get it in the

Re: div_qr_1 interface

2013-10-22 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: And sure enough, it detects some bugs in the new assembly code. For size n==1, there's a missing mov. I'll add that shortly. Then there's another problem with n==2, which needs a bit more debugging. Good. So now you have debugged the new try.c

Re: div_qr_1 interface

2013-10-22 Thread Torbjorn Granlund
I added data for the new code at http://gmplib.org/devel/asm.html. There is a line for div_qr_1u_pi1 as well, since that will also be needed. It might actually be more common that the divisor is not normalised. I should try to wrap up div_qr_1n_pi2 and div_qr_1u_pi2 as well, and then add

Re: div_qr_1 interface

2013-10-21 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Will try that. I think one could also try to delay the quotient store one iteration, keeping Q1 in a register until the next iteration. Then one gets rid of the adc Q2,8(QP, UN, 8) in the loop, using only a single store per

Re: div_qr_1 interface

2013-10-21 Thread Torbjorn Granlund
I looked at the logic following this: sbb U2, U2 C 7 13 You negate the U2 copy in Q2. It seems that three adc by sbb could avoid the neg. It might also be possible to replace the early loop and stuff by cmov. Note that the carry flag survives dec, although that causes a

Re: div_qr_1 interface

2013-10-20 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: I think x86-64, x86-32, arm32, arm64, powerpc-64, sparc-64 matter. Unfortunately, powerpc-64 (and -32) return these types onto the stack via an implicit pointer. Ok, I think I'll stick

Re: div_qr_1 interface

2013-10-20 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: On my core2 laptop: $ ./speed -s 2-10,100,500 -C mpn_divrem_1.0x mpn_div_qr_1.0x overhead 6.13 cycles, precision 1 units of 8.33e-10 secs, CPU freq 1200.00 MHz mpn_divrem_1.0x

Re: div_qr_1 interface

2013-10-18 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: ni...@lysator.liu.se (Niels Möller) writes: (about using a small struct as return value) If the caller is going to store the returned value directly in memory anyway, there's little difference. And if the caller is going to operate on

Re: div_qr_1 interface

2013-10-17 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: ni...@lysator.liu.se (Niels Möller) writes: To get going, I've written C implementations of mpn_div_qr_1n_pi1 and mpn_divf_qr1n_pi1, and made divrem_1 call them. Below, also an mpn_div_qr_1, using these primitives (and with some

Re: mpn_divexact_1 comments

2013-10-16 Thread Torbjorn Granlund
I agree with Niels (don't understand + not appropriate for low-level...). We should replace mpn_divexact_1 with code that: (1) Uses Jebelean's trick with a Euclidean division working left-to-right and a simultaneous Hensel division working right-to-left. This is faster in the

Re: division-free binary-to-decimal conversion

2013-10-07 Thread Torbjorn Granlund
Zimmermann Paul paul.zimmerm...@inria.fr writes: we mean faster than GMP's conversion functions, but still using GMP for the low-level operations. Then please say so in the paper. not only. For large operands we believe there is still room to improve our code. In particular an

Re: Basecase assembly optimisation project

2013-10-03 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Another feature, which I looked into a while ago without getting very far with the loopmixer, is to make it understand associativity. I.e, try reordering certain instructions with the same destination register, like xor %r8, %rax xor %r9,

Re: Basecase assembly optimisation project

2013-10-03 Thread Torbjorn Granlund
Ondřej Bílka nel...@seznam.cz writes: It is a possible enhancement, but I am not yet at the stage of calculating register dependencies on jumps. That's something we do, but we only handle a simple jump-back for the loop. (That branch limitation is a slight problem for some division loops, which

Re: Basecase assembly optimisation project

2013-10-02 Thread Torbjorn Granlund
Ondřej Bílka nel...@seznam.cz writes: I am writing a tool that might be useful, a simple optimizer of assembly routines. You need to write a benchmark that measures performance and prints elapsed time and assembly file. Currently it has two optimization patterns, first is enclosing block

Basecase assembly optimisation project

2013-09-26 Thread Torbjorn Granlund
For the last few months, I have been working on writing and rewriting basecase code for x86-64 processors. The result is now in the mainline GMP repo. The basecase code I have focused on is: mul_basecase, sqr_basecase, mullo_basecase, and Hensel remainder via redc_1. At the start of this

Re: mpn_mul_fft type overflow issue

2013-09-18 Thread Torbjorn Granlund
Mark Sofroniou ma...@wolfram.com writes: Thanks. I wasn't completely sure what the right type was in all cases. Most of the changes are to use mp_size_t instead of int - these are the important ones. There are a couple (related to the variables K2 and K3) that change unsigned int to

Re: [PATCH v2] Fix common typos.

2013-07-21 Thread Torbjorn Granlund
I fixed some typos using a program I had, plus some of the typos you found.

Re: [RFC] Fix leading and trailing spaces.

2013-07-09 Thread Torbjorn Granlund
Ondřej Bílka nel...@seznam.cz writes: Are leading ws all spaces or tabs followed by less than 8 spaces? That's probably what we usually do. Are there some form-feeds and are they useful? I don't understand this question. I understand that some hackers find whitespace consistency to be

Haswell system, anyone?

2013-06-19 Thread Torbjorn Granlund
I'd like to test GMP on an Intel Haswell CPU. Could you perhaps offer a guest account for GMP use? These CPUs have a new bignum-oriented instruction, MULX, which avoids overwriting the carry flag. That should help GMP a bit, perhaps significantly. -- Torbjörn

Re: caching of transforms used for large multiplications

2013-06-14 Thread Torbjorn Granlund
Daniel Lichtblau d...@wolfram.com writes: If you do not manage to locate them I can scan and send a pdf. (Least I can do for someone who shared a room for two months with that Torbjörn fellow..) I started to write a reply, but decided against sending it after I read this unprovoked

Re: caching of transforms used for large multiplications

2013-06-14 Thread Torbjorn Granlund
Daniel Lichtblau d...@wolfram.com writes: I simply have no idea why you would choose to take such offense. If it serves any purpose, it is one I quite fail to see. That said, I'll not trouble you with further communication. If you cannot assume a professional attitude on the GMP lists,

Re: caching of transforms used for large multiplications

2013-06-11 Thread Torbjorn Granlund
Hello Daniel! We don't yet have any transform-only interface in GMP, but this will probably change at some point. The current FFT code uses coefficient rings mod 2^m+1, as per the Schönhage-Strassen algorithm. In this algorithm, m = O(sqrt(n)) where n = O(log(a) + log(b)) for multiplication of

Re: speed of unbalanced division

2013-06-02 Thread Torbjorn Granlund
Zimmermann Paul paul.zimmerm...@inria.fr writes: thank you for the feedback. Yes the new curve is not everywhere optimal, but the important thing is that it is much more regular, which is critical for algorithms assuming that when we cut both numerator and divisor (for a fixed-size

GMP testing shortcomings

2013-05-23 Thread Torbjorn Granlund
The ia64 mpn_divrem_2 bug reported today (and fixed yesterday...) highlights some shortcomings of GMP testing. For x86, x86_64 and (since GMP 5.1.2) arm32 we have calling conventions checking code via tests/*call.asm and tests/*check.c, but this is then invoked from tests/devel/try. Flaws: 1.

Re: Changes to mini-gmp and 5.1.2

2013-05-18 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: Should we move any of the mini-gmp changes to 5.1.2? I think the following would make sense to include: 2013-02-25 Niels Möller ni...@lysator.liu.se * mini-gmp/tests/t-double.c

Changes to mini-gmp and 5.1.2

2013-05-17 Thread Torbjorn Granlund
Should we move any of the mini-gmp changes to 5.1.2?

Preparing GMP 5.1.2

2013-05-13 Thread Torbjorn Granlund
I think it is time for a 5.1.2 release, since we've found and fixed a couple of bugs since the last release. I am redirecting the nightly build scripts to use the 5.1 repo. The main repository will thus be untested for a while. Unless I hear protests, I'll make the new release towards the end of

Re: Preparing GMP 5.1.2

2013-05-13 Thread Torbjorn Granlund
Marc Glisse marc.gli...@inria.fr writes: I need to backport a couple changes that I made soon after 5.1 branched, I'll try to do that soon... Sorry, I had missed that. I will not make the release until you have the time to address this. -- Torbjörn

Re: _basecase or _sec? [

2013-05-03 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I think Newton analogues exist only when b is a power, not in general. And the most important case is prime b. I think it exists also when b can be factorised into prime powers... I am not familiar with the Jebelean (or Möller!) criteria for

Re: _basecase or _sec? [

2013-05-03 Thread Torbjorn Granlund
[I fixed the grammar in my self-quotations, hopefully not against some netiquette] We don't need to insist on keeping operands positive. Hmm. In general, one needs to replace the largest number, to make progress. But I guess in the case of many high bits being equal, it might not

Re: _basecase or _sec? [

2013-05-02 Thread Torbjorn Granlund
I started a web page on this: gmplib.org/devel/sec.html Feel free to make changes as usual.

Re: neon logops

2013-04-26 Thread Torbjorn Granlund
Richard Henderson r...@twiddle.net writes: Building on the copyi that tege committed the other day, use neon for the logical operations too. I committed the 128 bit version to arm/neon, making it become used for all Neon capable processors. I put it there since it is a speedup for A9 as

Re: ARM Neon multiplication (GNU and improved!)

2013-04-21 Thread Torbjorn Granlund
I've been busy improving addmul_1 and submul_1 for Cortex-A15 lately. It turned out to be possible to reach 2 c/l for addmul_1 using plain (non-SIMD) operations; such code is in the repo since a few days. The trick was to move the recurrency path away from multiply-accumulate instructions, and

Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-16 Thread Torbjorn Granlund
David Miller da...@davemloft.net writes: From: Torbjorn Granlund t...@gmplib.org Date: Tue, 16 Apr 2013 14:43:58 +0200 If we cannot make a configure test, we need to know if there is a release where the assembler can be trusted. After some discussions with my Oracle contact, I

Re: [PATCH] Improve and consolidate sparc PIC assembler.

2013-04-15 Thread Torbjorn Granlund
Where to go from here? If we want to clean up some old SPARC code, then we have learnt that we need to test the result on several key platforms. We also don't want to create slower code, unless the old code is clearly broken (in more than a hypothetical way). For the 64bit case, it is safe to assume
