Your suggestion to instead use powm in GMP does indeed make things faster.
I used the powm function which has been in GMP for a long time (this
time using GMP 4.3.2 because powm itself has reportedly been improved
in GMP 5):
8 => 36422
10 => 1518
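
For readers unfamiliar with powm: it computes base^exp mod m without ever forming the full power. The idea can be sketched in a few lines of Python using plain square-and-multiply. This is only an illustration of what is being computed; the name `powm_sketch` is made up here, and GMP's actual routine is considerably more sophisticated.

```python
def powm_sketch(base, exp, mod):
    """Right-to-left square-and-multiply: base**exp % mod for exp >= 0."""
    if mod <= 0:
        raise ValueError("modulus must be positive")
    if exp < 0:
        raise ValueError("negative exponents are not handled in this sketch")
    result = 1 % mod              # gives 0 when mod == 1
    base %= mod
    while exp:
        if exp & 1:               # current exponent bit is set
            result = (result * base) % mod
        base = (base * base) % mod  # square for the next bit
        exp >>= 1
    return result
```

Every intermediate value stays below the square of the modulus, which is why a dedicated powm is so much faster than exponentiating first and reducing afterwards.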
Also, the reason we haven't optimised things for your machine is that
at present none of our build farms contains such a machine, none of
the developers owns one, none of our friends own one. It is just very
hard to get hold of one.
It is *absolutely impossible* for us to optimise the assembly
fun
Yes, it comes across the same way in English! You got your point across.
We don't document functions that are not available in GMP because when
they implement the same functions (which they have now done) they
invariably choose a different name (which they did), which means we
have to change the n
Hi cheater,
> Some of the program benchmarks that we have in our full benchmark
> suite tell a completely different story, putting MPIR well ahead for
> those sorts of things. They show that in an overall program, we do
> quite well.
Those programs are FAKE! Unfortunately for you I'm able to read
I had an even better idea. Your p6/mmx supports, from what I can
tell, sse2. Therefore you can probably try:
./configure --build=pentium4-unknown-linux-gnu ABI=32
This may actually speed things up for you! (No guarantees, but there
is a good chance it will work).
If you are not using Linux, jus
Brilliant!!
2010/1/10 Case Vanhorsen :
> On Sun, Jan 10, 2010 at 2:02 PM, Bill Hart
> wrote:
>> Thanks Case,
>>
>> those are very useful timings. I see more of what I expected to see
>> for unbalanced multiplication, especially here:
>>
>> 5NxN mpz multiplication: MPIR 1.3.0 GMP 5.0
On Sun, Jan 10, 2010 at 2:02 PM, Bill Hart wrote:
> Thanks Case,
>
> those are very useful timings. I see more of what I expected to see
> for unbalanced multiplication, especially here:
>
> 5NxN mpz multiplication: MPIR 1.3.0 GMP 5.0.0
> 1000 digits: 0.2064 sec 0.000
Thanks! I think it is more likely to be the tuning than the compiler
in this case. But that is only a guess, based on what I know of the
way the FFT code works.
I see that all the new FFT TABLE2 tuning values are missing entirely
(not your fault, but ours). What you might like to try, when you fin
Yes, of course.
$ls -lrt gmp-mparam.h
lrwxrwxrwx 1 giangian27 2010-01-10 17:19 gmp-mparam.h -> mpn/x86/p6/mmx/gmp-mparam.h
$ cat gmp-mparam.h
/* Intel P6/mmx gmp-mparam.h -- Compiler/machine parameter header file.
Copyright 1991, 1993, 1994, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006
Actually, here are some really old timings I have which illustrate
this point. They are for a K8 Opteron system and compare GMP 4.2.1 (a
very old version) with MPIR 0.9.0 (our first public version). About the
*only* difference was the assembly code. See how the change in speed
of that affected nearl
Sure, but as I mentioned, right at the start, the speed of
multiplication is critical for almost everything. That is why it is
the most important benchmark.
And the speed of multiplication is critically dependent on the speed
of the basecase assembly code, which, on your machine, is slower in
MPIR
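
To make the dependence on the basecase concrete, here is a small Python sketch of Karatsuba multiplication that counts how often it falls through to its basecase. It is an illustration under stated assumptions, not MPIR's code: the function names and the 64-bit cutoff are hypothetical, and Python's built-in `*` stands in for the assembly basecase routine.

```python
def _karatsuba(x, y, basecase_bits, counter):
    """Recursive Karatsuba on non-negative ints; counter[0] counts basecase calls."""
    n = max(x.bit_length(), y.bit_length())
    if n <= basecase_bits:
        counter[0] += 1
        return x * y                      # stand-in for the assembly basecase
    half = n // 2
    mask = (1 << half) - 1
    x1, x0 = x >> half, x & mask          # split x = x1 * 2**half + x0
    y1, y0 = y >> half, y & mask
    z2 = _karatsuba(x1, y1, basecase_bits, counter)
    z0 = _karatsuba(x0, y0, basecase_bits, counter)
    z1 = _karatsuba(x1 + x0, y1 + y0, basecase_bits, counter) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0


def multiply_counting_basecase(x, y, basecase_bits=64):
    """Return (x * y, number of basecase multiplications performed)."""
    if x < 0 or y < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if basecase_bits < 4:
        raise ValueError("basecase_bits must be at least 4 for the recursion to shrink")
    counter = [0]
    product = _karatsuba(x, y, basecase_bits, counter)
    return product, counter[0]
```

Calling `multiply_counting_basecase(3**2000, 7**1500)` returns the product together with the number of basecase calls. All of the actual multiplying happens in those calls, so a slow basecase slows down every larger multiplication built on top of it.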
Let's leave GMP 5 aside for a while.
On my CPU, GMP 4.3.2 gets these values:
Category base=> 1546, 1104
Program rsa (weight 1.00) => 398, 284
Program pi (weight 1.00) => 4.46, 3.18
Program bpsw (weight 1.00) => 2.31, 1.65
Program wagstaff (weight 1.00) => 10.7, 7.62
Program mersenne
Thanks Case,
those are very useful timings. I see more of what I expected to see
for unbalanced multiplication, especially here:
5NxN mpz multiplication: MPIR 1.3.0 GMP 5.0.0
1000 digits: 0.2064 sec 0.1863 sec
5000 digits: 0.00023417 sec 0.0002170
On Sun, Jan 10, 2010 at 1:18 PM, Bill Hart wrote:
> You are of course welcome to choose whichever package best meets your
> needs. And indeed on your particular system, it seems GMP may well do
> that for you at present.
>
> One thing you should bear in mind however. Here are some times as they
>
Just to illustrate the last point, here is the list of all assembly
files in MPIR that have been added or improved since MPIR 1.2.0, i.e.
since the last release:
32 bit x86 assembly code
A /mpir/trunk/mpn/x86/applenopic/aorsmul_1.asm
A /mpir/trunk/mpn/x86/applenopic/c
You are of course welcome to choose whichever package best meets your
needs. And indeed on your particular system, it seems GMP may well do
that for you at present.
One thing you should bear in mind however. Here are some times as they
have changed over the past year and a half:
K8
Multiplicatio
Indeed the p6/mmx uses the default values for those missing tuning
values, which are in gmp-impl.h, for example MUL_TOOM4_THRESHOLD is
set to 400, which is reasonable.
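
As a rough picture of what such a threshold does, the sketch below picks a multiplication algorithm from the operand size in limbs. Only the value 400 for MUL_TOOM4_THRESHOLD comes from the text above. The other crossover values, the table layout, and the function name are hypothetical placeholders, and the assumption that an algorithm takes over at sizes greater than or equal to its threshold is mine.

```python
# Crossover points in limbs, in increasing order.
# Only the toom4 entry (400) is taken from the message above;
# the other numbers are hypothetical placeholders.
THRESHOLDS = [
    ("basecase", 0),
    ("karatsuba", 20),    # hypothetical value
    ("toom3", 100),       # hypothetical value
    ("toom4", 400),       # default MUL_TOOM4_THRESHOLD quoted above
]


def pick_algorithm(limbs, thresholds=THRESHOLDS):
    """Return the name of the last algorithm whose threshold is <= limbs."""
    if limbs < 0:
        raise ValueError("operand size cannot be negative")
    chosen = thresholds[0][0]
    for name, start in thresholds:
        if limbs >= start:
            chosen = name
    return chosen
```

With this table, `pick_algorithm(399)` selects `"toom3"` and `pick_algorithm(400)` selects `"toom4"`. A missing tuned value simply means the default crossover is used, which is rarely optimal for a particular CPU but is not wrong.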
It's curious that the FFT still triggers the assert with the last 6
lines replaced with the original ones that we provided.
Thank
It seems that on your platform too (is yours 32-bit as well?) MPIR is
faster for only one thing: multiplication (or squaring) above 10 digits,
by up to 30%.
It is slower almost everywhere else, in some places by 100% or more.
This strengthens my decision...
Gian.
On 10 Jan, 18:47, Case Vanhorsen wrote:
> I'll t
OK, I'm trying this too: selectively taking some of the output of make
tune (everything except the last 6 lines, FFT_MUL and SQR_MUL) and
copying it into gmp-mparam.h (to try to work around the code
instability that tuning alone triggers)...
--MPIR-1.3.0-rc4--
$ mpir_bench_two/bench_two
Running MPIR benchmark
GenuineIn
That's very interesting. It clearly shows that the further from
"balanced" that the division is, the worse our (ancient) code
performs.
I'm certainly looking forward to sorting out our division code.
Bill.
2010/1/10 Case Vanhorsen :
> I'll toss in my benchmark results. :-)
>
>
Actually, if you take the tuning values that are missing at the end of
the tuning file and replace them with the values we supplied
(MUL_FFT_TABLE, SQR_FFT_TABLE, etc) you will probably find the
assertion goes away. The assertion was left there on purpose to
indicate that the FFT is most definitely
You'll find that all the very large stuff will be markedly worse after
tuning. One of the reasons we are replacing the FFT is that at present
the only people that can tune it are us, and it takes hours!
But thanks again for the comparisons.
Bill.
2010/1/10 Gianrico Fini :
> I see the comparison
I see the comparison is more balanced on your machine. The striking
imbalance is in the numbers for the Mersenne and Fermat tests.
By the way, I tested it all again with gcc-4.4 and with the result
from (cd tune;make tune) inserted in gmp-mparam.h... for all three
libraries.
Results for GMP-5.
I'll toss in my benchmark results. :-)
GMPY performance benchmark
Decimal string to mpz: MPIR 1.3.0 GMP 5.0.0
10 digits: 0.0021 sec 0.0022 sec
100 digits: 0.0063 sec 0.0066 sec
500 digits: 0.
Sure, it's interesting to compare.
On my 64 bit machine (Selmer):
K10-2:
Squaring: MPIR 1.3.0 GMP 5.0.0
===
128 x 128 : 56715728 55997671
512 x 512 : 11350749 13487276
8192 x 8192 : 149696 151687
131072 x 131072 :
Sorry, I don't have a 64-bit processor... I usually work on somewhat
old hardware.
Anyway, I think there is another problem with my measurements: I did
not "tune".
I was trying to update the compiler to gcc-4.4...
Then I'll recompile the three libraries, retune them, recompile again,
and test