The KR support was removed a year or two ago. But we still have some
clutter remaining from it. I suppose __GMP_PROTO is an example.
--
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
ni...@lysator.liu.se (Niels Möller) writes:
Perhaps this is the reason for keeping redc separate?
IIRC, bdiv functions return a borrow, meaning that the remainder
corresponding to the computed quotient is negative, while red returns a
carry which means that the computed remainder is a
ni...@lysator.liu.se (Niels Möller) writes:
Torbjorn Granlund t...@gmplib.org writes:
Surely a plain TMP_ALLOC adds red zones? If not, that is something we
ought to fix.
But
tp = TMP_ALLOC_LIMBS (2*n);
xp = tp + n;
does not add any between T and X (intended
ni...@lysator.liu.se (Niels Möller) writes:
What about the test in
#define TMP_ALLOC(n) \
(LIKELY ((n) 65536) ? TMP_SALLOC(n) : TMP_BALLOC(n))
That test will cost a cycle or two for each TMP_ALLOC call (with
non-constant n), regardless of size, won't it?
I think my previous
Inspired by Marc{,o}'s cleanup commits, I decided to look for TMP_ALLOC*
calls that should be made into TMP_SALLOC* or TMP_BALLOC*. Then I
spotted this unrelated thing:
tp = TMP_ALLOC_LIMBS(talloc);
if (usize n)
{
mpn_tdiv_qr (tp, up, 0, up, usize, vp, n);
if (mpn_zero_p
ni...@lysator.liu.se (Niels Möller) writes:
Torbjorn Granlund t...@gmplib.org writes:
Except that redc does not accept independent sizes. Therefore we need
to use bdiv_qr.
I think it should be easy to generalize at least redc_1 and redc_2 to
accept independent sizes. Might
ni...@lysator.liu.se (Niels Möller) writes:
Here's a sketch of an adddmul_2 iteration using Karatsuba. I assume we
have vl, vh, vd = |vl - vh| and an appropriate sign vmask in registers
before the loop. Carry input in c0, c1, carry out in r2, r3.
mov (up), %rax
mov
ni...@lysator.liu.se (Niels Möller) writes:
One can decrease it a bit by adding c0, c1 earlier (do you think
recurrency can be a problem if we add c0, c1 to the first product?) and
doing an in-place add to (rp) and 8(rp) at the end.
I could get it down to 30 instructions with a deep
I doubt we can make addmul_1 run faster on sandybridge.
But I'd like mul_basecase to run much faster than 3 c/l. Then
sqr_basecase and redc_1, redc_2 should be fixed.
An addmul_2 running better at 3 c/l or better would be great. That
means we need to handle a tick in it using = 17 insns,
Marc Glisse marc.gli...@inria.fr writes:
is there any objection if I replace most uses of -_mp_alloc by calls
to the ALLOC macro in mp[zqf] (and similarly for _mp_size, etc)? It
helps when experimenting... I am also considering moving the NUM and
DEN macros from test/mpq/t-cmp* to
bodr...@mail.dm.unipi.it writes:
Unrelated :-) We might define more macros like TMP_ALLOC_LIMBS_2 . I mean
_3 and _4. So that they can be used to reduce the number of allocations.
Do you agree? (I just touched mpz/gcdext.c, and _4 should be used there).
I'd vote for killing
Marc Glisse marc.gli...@inria.fr writes:
That's for the alloca case. Without alloca, one call to malloc is
better than two (although that usually also means the numbers are big
and any gmp operation will dwarf allocation). Also, the threshold
between alloca and malloc is quite high, and
Richard Guenther rguent...@suse.de writes:
... has have 4.2.x and 4.3.x. But it seems freebsd is stuck with 4.2.2,
the last release with GPLv2. I suppose for freebsd testing should focus
on LLVM.
I think differently. I am developing GMP on FreeBSD systems, and use
gcc there. I don't
ni...@lysator.liu.se (Niels Möller) writes:
After copying the mini-gmp directory, I can build gmp with the attached
patch. For the functions in dumbmp.c:
Nice!
* Most aren't needed any more.
* Functions used in only one file are copied to that file.
* The remaining few
ni...@lysator.liu.se (Niels Möller) writes:
I have started to look a little into elliptic curve cryptography, and
there the sizes are pretty small. E.g., Using the standard curve over a
256-bit prime means that numbers are just four limbs on a 64-bit
machine. So in this case, I'd expect a
ni...@lysator.liu.se (Niels Möller) writes:
95% of the time is spent in heap_remove, unsurprisingly.
And to decrease the heap work, one could keep the small primes in an
array outside of the heap (as you suggested). Or use a larger sieving
table. Or use a more compact representation
bodr...@mail.dm.unipi.it writes:
I wrote some code to generalize the DC approach to binomial
I tested it with bin(12345678901234567890123,k), here are the timings:
k old-cycles new-cycles sizeinbase(res,2)
01630 1440 1
11830
ni...@lysator.liu.se (Niels Möller) writes:
I don't know how important that is. Long-term, I think gmp should at
least export some sieving functionality which us suitable as a building
block for that.
I agree we want a shared prime sieve, at least for internal use. It
would be nice not
I'd suggest that we add both primorial functions:
1. product of all primes = n
2. product of the smalles n primes
I have no idea about which of them is most common, or what they should
be named.
--
Torbjörn
___
gmp-devel mailing list
bodr...@mail.dm.unipi.it writes:
On shell, for binomial(1234,k) the faster algorithm is:
- with k 14, the current GMP 5.0.1 basecase;
- with 14 k 540, your_bdiv_uiui code;
- with 540 k (620), the code using prime sieving.
Nice, more improvement that I had anticipated. It is
bodr...@mail.dm.unipi.it writes:
I tested the speed of your code with the many implementations I tested
last year (all available on my web page, in the same file as fac_ui). On
shell, for binomial(1234,k) the faster algorithm is:
- with k 35, the current GMP 5.0.1 basecase (or my
ni...@lysator.liu.se (Niels Möller) writes:
For the smallest n (where the result is known to fit in one limb), maybe
one could do the simple thing and just compute
n (n-1) ...(n-k+1) k^{-1} (k-1)^{-1} ... (mod B)
one factor at a time (and with a table of the needed inverses)?
bodr...@mail.dm.unipi.it writes:
I tested the speed of your code with the many implementations I tested
last year (all available on my web page, in the same file as fac_ui). On
shell, for binomial(1234,k) the faster algorithm is:
- with k 35, the current GMP 5.0.1 basecase (or my slight
I rewrote the new code using the same basic algorithm. This is now
probably close to correct. It passes lots of tests.
The code is about 2.5 times faster than the old code, at least for
arguments of the type 2n over n. (This code as well as the old code is
still slow for large operands; we
Torbjorn Granlund t...@gmplib.org writes:
Using configure --enable-alloca=debug --enable-assert I tracked down
this problem. I pushed a fix that I feel confident to be correct.
Please review it.
There ars still problems with kolga.bibsys.no, a POWER6 machine.
It looks like a compiler
ni...@lysator.liu.se (Niels Möller) writes:
This may get in the way, but that should not be a real problem. An
exported function like mpn_add_n should point to a wrapper which jumps
via a table. User code should never see any pointer to the internal
functions which are pointed to in that
ni...@lysator.liu.se (Niels Möller) writes:
Torbjorn Granlund t...@gmplib.org writes:
I put on my TODO list after you sent your initial message to setup
nightly testing using wine. Does user-level instructions run natively
on the CPU, or is there qemu-like translation going
Torbjorn Granlund t...@gmplib.org writes:
Many files are now converted to support Windoze-64. At least on one
machine, the resulting performance is promising; it is just a few
percent slower than on ELF systems.
Now, --enable-fat works too on Windoze-64.
--
Torbjörn
How exactly does W64_ENTRY look like? (I suppose W64_EXIT is tiny.)
While this idea is appealing, I worry that the overhead might hurt
small bignum performance, compared to even the C code.
Have you considered I more sophisticated approach with renaming
registers?
Something along these lines:
ni...@lysator.liu.se (Niels Möller) writes:
I've checked a few machines, and there the very small thresholds seem
bogus. Some day ago I changed min_size up again, from 10 to 50. But I
still got some suspiciously small thresholds on several machines,
including on shell. Now I've checked
ni...@lysator.liu.se (Niels Möller) writes:
I have tweaked the tuneup parameters a bit. I removed min_size for
HGCD_APPR_THRESHOLD (was 30, now uses the default of 10). This threshold
is definitely lower than I had expected. On
http://gmplib.org/devel/HGCD_APPR_THRESHOLD.html, it's 10 on
I have added HGCD_APPR_THRESHOLD and HGCD_REDUCE_THRESHOLD to
http://gmplib.org/devel/thresholds.html.
--
Torbjörn
___
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
ni...@lysator.liu.se (Niels Möller) writes:
Any other tricks?
I think you forgot the save-the-transform trick.
Since b is implicitly invariant over several a_i pieces,
evaluating b once will be a slight win.
--
Torbjörn
___
gmp-devel mailing list
ni...@lysator.liu.se (Niels Möller) writes:
Torbjorn Granlund t...@gmplib.org writes:
I think you forgot the save-the-transform trick.
Since b is implicitly invariant over several a_i pieces,
evaluating b once will be a slight win.
That's a bit difficult to do with current
ni...@lysator.liu.se (Niels Möller) writes:
Torbjorn Granlund t...@gmplib.org writes:
(Perhaps we ought to stop calling GMP's choice of ABI+limbsize for ABI.
We're abusing the term.)
Well, the *GMP* ABI is defined by the system ABI + choice of limbsize
and nails, right?
Yes
301 - 335 of 335 matches
Mail list logo