More cleaning

2012-02-26 Thread Torbjorn Granlund
The KR support was removed a year or two ago. But we still have some clutter remaining from it. I suppose __GMP_PROTO is an example. -- Torbjörn ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel

Re: Division call in mpn_gcd

2012-02-25 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Perhaps this is the reason for keeping redc separate? IIRC, bdiv functions return a borrow, meaning that the remainder corresponding to the computed quotient is negative, while red returns a carry which means that the computed remainder is a

Re: _mp_alloc vs ALLOC

2012-02-24 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: Surely a plain TMP_ALLOC adds red zones? If not, that is something we ought to fix. But tp = TMP_ALLOC_LIMBS (2*n); xp = tp + n; does not add any between T and X (intended

Re: _mp_alloc vs ALLOC

2012-02-24 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: What about the test in #define TMP_ALLOC(n) \ (LIKELY ((n) 65536) ? TMP_SALLOC(n) : TMP_BALLOC(n)) That test will cost a cycle or two for each TMP_ALLOC call (with non-constant n), regardless of size, won't it? I think my previous

Division call in mpn_gcd

2012-02-24 Thread Torbjorn Granlund
Inspired by Marc{,o}'s cleanup commits, I decided to look for TMP_ALLOC* calls that should be made into TMP_SALLOC* or TMP_BALLOC*. Then I spotted this unrelated thing: tp = TMP_ALLOC_LIMBS(talloc); if (usize n) { mpn_tdiv_qr (tp, up, 0, up, usize, vp, n); if (mpn_zero_p

Re: Division call in mpn_gcd

2012-02-24 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: Except that redc does not accept independent sizes. Therefore we need to use bdiv_qr. I think it should be easy to generalize at least redc_1 and redc_2 to accept independent sizes. Might

Re: Sandybridge addmul_N challenge

2012-02-23 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Here's a sketch of an adddmul_2 iteration using Karatsuba. I assume we have vl, vh, vd = |vl - vh| and an appropriate sign vmask in registers before the loop. Carry input in c0, c1, carry out in r2, r3. mov (up), %rax mov

Re: Sandybridge addmul_N challenge

2012-02-23 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: One can decrease it a bit by adding c0, c1 earlier (do you think recurrency can be a problem if we add c0, c1 to the first product?) and doing an in-place add to (rp) and 8(rp) at the end. I could get it down to 30 instructions with a deep

Re: Sandybridge addmul_N challenge

2012-02-22 Thread Torbjorn Granlund
I doubt we can make addmul_1 run faster on sandybridge. But I'd like mul_basecase to run much faster than 3 c/l. Then sqr_basecase and redc_1, redc_2 should be fixed. An addmul_2 running better at 3 c/l or better would be great. That means we need to handle a tick in it using = 17 insns,

Re: _mp_alloc vs ALLOC

2012-02-22 Thread Torbjorn Granlund
Marc Glisse marc.gli...@inria.fr writes: is there any objection if I replace most uses of -_mp_alloc by calls to the ALLOC macro in mp[zqf] (and similarly for _mp_size, etc)? It helps when experimenting... I am also considering moving the NUM and DEN macros from test/mpq/t-cmp* to

Re: _mp_alloc vs ALLOC

2012-02-22 Thread Torbjorn Granlund
bodr...@mail.dm.unipi.it writes: Unrelated :-) We might define more macros like TMP_ALLOC_LIMBS_2 . I mean _3 and _4. So that they can be used to reduce the number of allocations. Do you agree? (I just touched mpz/gcdext.c, and _4 should be used there). I'd vote for killing

Re: _mp_alloc vs ALLOC

2012-02-22 Thread Torbjorn Granlund
Marc Glisse marc.gli...@inria.fr writes: That's for the alloca case. Without alloca, one call to malloc is better than two (although that usually also means the numbers are big and any gmp operation will dwarf allocation). Also, the threshold between alloca and malloc is quite high, and

Re: g++-3.4 bug

2012-02-20 Thread Torbjorn Granlund
Richard Guenther rguent...@suse.de writes: ... has have 4.2.x and 4.3.x. But it seems freebsd is stuck with 4.2.2, the last release with GPLv2. I suppose for freebsd testing should focus on LLVM. I think differently. I am developing GMP on FreeBSD systems, and use gcc there. I don't

Re: Status update: mini-gmp

2012-02-16 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: After copying the mini-gmp directory, I can build gmp with the attached patch. For the functions in dumbmp.c: Nice! * Most aren't needed any more. * Functions used in only one file are copied to that file. * The remaining few

Re: fixed-size mpn_mul_n for small n?

2012-02-13 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I have started to look a little into elliptic curve cryptography, and there the sizes are pretty small. E.g., Using the standard curve over a 256-bit prime means that numbers are just four limbs on a 64-bit machine. So in this case, I'd expect a

Re: Double factorial and primorial

2011-12-22 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: 95% of the time is spent in heap_remove, unsurprisingly. And to decrease the heap work, one could keep the small primes in an array outside of the heap (as you suggested). Or use a larger sieving table. Or use a more compact representation

Re: Divide and conquer binomial

2011-12-22 Thread Torbjorn Granlund
bodr...@mail.dm.unipi.it writes: I wrote some code to generalize the DC approach to binomial I tested it with bin(12345678901234567890123,k), here are the timings: k old-cycles new-cycles sizeinbase(res,2) 01630 1440 1 11830

Re: Double factorial and primorial

2011-12-20 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I don't know how important that is. Long-term, I think gmp should at least export some sieving functionality which us suitable as a building block for that. I agree we want a shared prime sieve, at least for internal use. It would be nice not

Re: Double factorial and primorial

2011-12-20 Thread Torbjorn Granlund
I'd suggest that we add both primorial functions: 1. product of all primes = n 2. product of the smalles n primes I have no idea about which of them is most common, or what they should be named. -- Torbjörn ___ gmp-devel mailing list

Re: Binomial improvements

2011-12-11 Thread Torbjorn Granlund
bodr...@mail.dm.unipi.it writes: On shell, for binomial(1234,k) the faster algorithm is: - with k 14, the current GMP 5.0.1 basecase; - with 14 k 540, your_bdiv_uiui code; - with 540 k (620), the code using prime sieving. Nice, more improvement that I had anticipated. It is

Re: Binomial improvements

2011-12-10 Thread Torbjorn Granlund
bodr...@mail.dm.unipi.it writes: I tested the speed of your code with the many implementations I tested last year (all available on my web page, in the same file as fac_ui). On shell, for binomial(1234,k) the faster algorithm is: - with k 35, the current GMP 5.0.1 basecase (or my

Re: Binomial improvements

2011-12-10 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: For the smallest n (where the result is known to fit in one limb), maybe one could do the simple thing and just compute n (n-1) ...(n-k+1) k^{-1} (k-1)^{-1} ... (mod B) one factor at a time (and with a table of the needed inverses)?

Re: Binomial improvements

2011-12-10 Thread Torbjorn Granlund
bodr...@mail.dm.unipi.it writes: I tested the speed of your code with the many implementations I tested last year (all available on my web page, in the same file as fac_ui). On shell, for binomial(1234,k) the faster algorithm is: - with k 35, the current GMP 5.0.1 basecase (or my slight

Re: Binomial improvements

2011-12-09 Thread Torbjorn Granlund
I rewrote the new code using the same basic algorithm. This is now probably close to correct. It passes lots of tests. The code is about 2.5 times faster than the old code, at least for arguments of the type 2n over n. (This code as well as the old code is still slow for large operands; we

Re: fac_ui rewrote.

2011-12-08 Thread Torbjorn Granlund
Torbjorn Granlund t...@gmplib.org writes: Using configure --enable-alloca=debug --enable-assert I tracked down this problem. I pushed a fix that I feel confident to be correct. Please review it. There ars still problems with kolga.bibsys.no, a POWER6 machine. It looks like a compiler

Re: Rethinking GMP's configure

2011-12-06 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: This may get in the way, but that should not be a real problem. An exported function like mpn_add_n should point to a wrapper which jumps via a table. User code should never see any pointer to the internal functions which are pointed to in that

Re: Support for w*ndows

2011-12-02 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: I put on my TODO list after you sent your initial message to setup nightly testing using wine. Does user-level instructions run natively on the CPU, or is there qemu-like translation going

Re: Support for w*ndows

2011-12-01 Thread Torbjorn Granlund
Torbjorn Granlund t...@gmplib.org writes: Many files are now converted to support Windoze-64. At least on one machine, the resulting performance is promising; it is just a few percent slower than on ELF systems. Now, --enable-fat works too on Windoze-64. -- Torbjörn

Re: Support for w*ndows

2011-11-26 Thread Torbjorn Granlund
How exactly does W64_ENTRY look like? (I suppose W64_EXIT is tiny.) While this idea is appealing, I worry that the overhead might hurt small bignum performance, compared to even the C code. Have you considered I more sophisticated approach with renaming registers? Something along these lines:

Re: New thresholds in table

2011-11-17 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I've checked a few machines, and there the very small thresholds seem bogus. Some day ago I changed min_size up again, from 10 to 50. But I still got some suspiciously small thresholds on several machines, including on shell. Now I've checked

Re: New thresholds in table

2011-11-14 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: I have tweaked the tuneup parameters a bit. I removed min_size for HGCD_APPR_THRESHOLD (was 30, now uses the default of 10). This threshold is definitely lower than I had expected. On http://gmplib.org/devel/HGCD_APPR_THRESHOLD.html, it's 10 on

New thresholds in table

2011-11-12 Thread Torbjorn Granlund
I have added HGCD_APPR_THRESHOLD and HGCD_REDUCE_THRESHOLD to http://gmplib.org/devel/thresholds.html. -- Torbjörn ___ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel

Re: Cancellation with hgcd / unbalanced mulmod_bnm1

2011-11-12 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Any other tricks? I think you forgot the save-the-transform trick. Since b is implicitly invariant over several a_i pieces, evaluating b once will be a slight win. -- Torbjörn ___ gmp-devel mailing list

Re: Cancellation with hgcd / unbalanced mulmod_bnm1

2011-11-12 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: I think you forgot the save-the-transform trick. Since b is implicitly invariant over several a_i pieces, evaluating b once will be a slight win. That's a bit difficult to do with current

Re: [PATCH] Improve System z support and add some tuning

2011-10-07 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Torbjorn Granlund t...@gmplib.org writes: (Perhaps we ought to stop calling GMP's choice of ABI+limbsize for ABI. We're abusing the term.) Well, the *GMP* ABI is defined by the system ABI + choice of limbsize and nails, right? Yes

<    1   2   3   4