Re: [PATCH 2/2] Optimize 64-bit mpn_add_N and mpn_sub_N for sparc T3 and later.

2013-03-07 Thread Niels Möller
Torbjorn Granlund t...@gmplib.org writes: I optimised submul_1.asm, and then edited both addmul_1 and submul_1 to use as similar operand order as possible. So remaining differences are necessary? I don't remember much sparc assembly, but it seems carry handling is done slightly differently.
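For readers who, like Niels, do not have the sparc code in front of them: the difference in carry handling can be illustrated with a portable C sketch. This is not the T3 assembly under discussion and not GMP's actual implementation; the names ref_addmul_1 and ref_submul_1 and the 32-bit limb_t are hypothetical, chosen so that a product fits in uint64_t. The add variant can fold the carry into one wide sum, while the subtract variant has to track a borrow from a separate comparison.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb so products fit in uint64_t */

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* product + old limb + carry never exceeds 2^64 - 1 */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* rp[0..n-1] -= up[0..n-1] * v; returns the borrow-out limb. */
static limb_t ref_submul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        /* fold the incoming borrow into the product; this cannot overflow */
        uint64_t p = (uint64_t)up[i] * v + borrow;
        limb_t lo = (limb_t)p;
        limb_t hi = (limb_t)(p >> 32);
        limb_t r = rp[i];
        rp[i] = r - lo;            /* wraps modulo 2^32 */
        borrow = hi + (r < lo);    /* the wrap contributes one extra borrow */
    }
    return borrow;
}
```

Applying ref_addmul_1 and then ref_submul_1 with the same operands restores the original rp, and both calls return the same high limb.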

Re: mpn_cnd_add_n

2013-03-07 Thread Torbjorn Granlund
Here's a patch that reorders the arguments for mpn_addcnd_n and mpn_subcnd_n (I think it's best to keep this change separate from the renaming, since the potential problems are quite different). It's tested on x86_64, arm, and with --disable-assembly. I've run a regular make check and

T3/T3 mul_2 and addmul_2

2013-03-07 Thread Torbjorn Granlund
I wrote 4-way unrolled mul_2 and addmul_2 for T3/T4. The FAKE_T3 stuff includes missing.m4, which implements some instructions missing from my old systems around here. I might retain that stuff for a while to allow local regression testing, even if it is a bit ugly. Could you please run time

Re: GMP and CUMP

2013-03-07 Thread Torbjorn Granlund
romes p romes_12...@yahoo.com writes: Hello developers I noticed that there is also a CUMP site http:/www.hpcs.cs.tsukuba.ac.jp/~nakayama/cump/ Sheesh, the guy has copied and edited the GMP webpages and now claims the default all rights reserved with himself as owner. Not a serious

Re: T3/T3 mul_2 and addmul_2

2013-03-07 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org Date: Thu, 07 Mar 2013 20:58:51 +0100 I wrote 4-way unrolled mul_2 and addmul_2 for T3/T4. The FAKE_T3 stuff includes missing.m4, which implements some instructions missing from my old systems around here. I might retain that stuff for a while to

Re: T3/T3 mul_2 and addmul_2

2013-03-07 Thread Torbjorn Granlund
I only now spotted FPMADDXHI and FPMADDX. No Sun/Oracle SPARC has been a floating-point demon, and these integer multiply instructions are performed in the FPU. Multiply-accumulate instructions are tricky, since one may easily put the accumulation on a carry recurrency path, and thereby kill

Re: T3/T3 mul_2 and addmul_2

2013-03-07 Thread David Miller
From: Torbjorn Granlund t...@gmplib.org Date: Thu, 07 Mar 2013 20:58:51 +0100 I'm reasonably sure this is correct. Needs some work still: davem@patience:~/src/GMP/HG/build-sparc64-ultrasparct4/tests/devel$ ./try -s1-10 mpn_addmul_2 pagesize is 0x2000 bytes s[0] 0xf80100048000 to