Hi,

Some timings for bobcat in svn, size=1000 limbs:
./benchmpn cpu bobcat

add_n                     2538
sub_n                     2528
mul_1                     5064   (this is the 1-way unrolled core2 code)
addmul_1                  5308   (again this is core2 code)
submul_1                  5292
mul_2
addmul_2
submul_2
addadd_n                  3715
addsub_n                  3696
subadd_n                  3719
lshift                    3784
rshift                    3546
lshift2                   2536
rshift2                   2776
lshift1                   1912
rshift1                   3546
addlsh1_n                 3293
sublsh1_n                 3541
addlsh_n                  4913
sublsh_n                  4918
inclsh_n                  4926
declsh_n                  4915
rsh1add_n                 3027
rsh1sub_n                 3033
sumdiff_n                 4715
store                     1157
copyi                     1786
copyd                     1659
rsblsh1_n
addlsh2_n
rsblsh2_n
popcount                  5876   (nehalem code)
hamdist                   6784   (k10 code)
com                       1789
not                       1118
and_n                     2531
xor_n                     2530
ior_n                     2542
nand_n                    2537
nior_n                    2544
xnor_n                    2546
andn_n                    2533
iorn_n                    2534
lshiftc                   4543
divexact_byff             2039
divexact_byfobm1          6261
divexact_by3              6246
divexact_1                15062
modexact_1c_odd           15084
add_err1_n                4810
sub_err1_n                4802
divrem_euclidean_qr_1     19548
divrem_euclidean_qr_2     40166
divrem_euclidean_r_1      8261
divrem_1                  21478
divrem_2                  28162
divrem_hensel_qr_1        12568
divrem_hensel_qr_1_1      14104
divrem_hensel_qr_1_2      12606
divrem_hensel_r_1         14064
divrem_hensel_rsh_qr_1    16603
rsh_divrem_hensel_qr_1    13170
rsh_divrem_hensel_qr_1_1  14112
rsh_divrem_hensel_qr_1_2  13176
mod_1_1                   12003
mod_1_2                   7556
mod_1_3                   8062   (C code was faster than asm, but slower than the asm mod_1_2!)
mod_1_4
mod_34lsub1               3045

mul_basecase runs at about 6c/l, so clearly from the above we can do better.

A quick test gives us 6c for rax, 7c for rdx, and a throughput of 5c for mul, although these are maximum values and the real figures could be better than this. I've got some more on this...

Jason
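
For comparison with the 6c/l figure, the timings above can be turned into cycles per limb by dividing by the size. This is a minimal Python sketch, assuming the benchmpn numbers are total cycle counts for one call on 1000 limbs (the post does not state this explicitly, though it is consistent with the 5c mul throughput). The names `timings` and `cycles_per_limb` are illustrative only and are not part of benchmpn.

```python
# Assumption: each benchmpn figure is the total cycle count for one call
# on SIZE_LIMBS limbs. Values are copied from the timings listed above.
SIZE_LIMBS = 1000

timings = {
    "add_n": 2538,
    "mul_1": 5064,
    "addmul_1": 5308,
    "submul_1": 5292,
    "mod_1_2": 7556,
}


def cycles_per_limb(cycles, limbs=SIZE_LIMBS):
    """Convert a total cycle count into cycles per limb (c/l)."""
    return cycles / limbs


if __name__ == "__main__":
    for name, cycles in timings.items():
        print(f"{name:10s} {cycles_per_limb(cycles):.2f} c/l")
```

Under that assumption addmul_1 comes out at roughly 5.3 c/l, which is below the 6c/l quoted for mul_basecase.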