ni...@lysator.liu.se (Niels Möller) writes:
So it should be doable with the addmul_1 loop and two additional not
instructions per limb, off the recurrency chain, and then maybe some
extra logic for the return value. One could aim for 4.25 c/l, I guess.
The below seems to give correct results. But
ni...@lysator.liu.se (Niels Möller) writes:
1. I guess one can expect submul_1 to always be a bit slower than
addmul_1, since submul_1 needs additional arithmetic besides the
umaal? One could perhaps do some negations on the fly,
a - b*C = -((-a) + b*C), maybe that
ni...@lysator.liu.se (Niels Möller) writes:
For large operands, it's strictly between add_n and addmul_1, which I
guess is as expected. For small sizes, I had a look at the loop setup
for add_n, which checks bit 0 and 1 of n separately. If that's faster,
maybe one could borrow that logic.
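
A rough sketch of that kind of loop setup, in Python rather than assembly. It assumes a 4-way unrolled main loop and 64-bit limbs; the names add_limbs and add_n_sketch are made up for illustration and this is not the actual add_n code. Bit 0 of n selects one leftover limb, bit 1 selects two more, and what remains is a multiple of 4.

```python
LIMB_BITS = 64                  # assumed limb size
MASK = (1 << LIMB_BITS) - 1

def add_limbs(rp, ap, bp, i, count, carry):
    """Add `count` limbs starting at index i, propagating the carry."""
    for k in range(i, i + count):
        s = ap[k] + bp[k] + carry
        rp[k] = s & MASK
        carry = s >> LIMB_BITS
    return carry

def add_n_sketch(ap, bp):
    """Return (rp, carry) for two equal-length little-endian limb lists."""
    n = len(ap)
    rp = [0] * n
    carry, i = 0, 0
    if n & 1:                   # bit 0 of n: one leftover limb
        carry = add_limbs(rp, ap, bp, i, 1, carry)
        i += 1
    if n & 2:                   # bit 1 of n: two leftover limbs
        carry = add_limbs(rp, ap, bp, i, 2, carry)
        i += 2
    while i < n:                # remaining count is a multiple of 4
        carry = add_limbs(rp, ap, bp, i, 4, carry)
        i += 4
    return rp, carry
```

Testing the two bits independently avoids a multi-way branch on n mod 4, at the cost of up to two short prologue blocks before the main loop.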
ni...@lysator.liu.se (Niels Möller) writes:
So it should be doable with the addmul_1 loop and two additional not
instructions per limb, off the recurrency chain, and then maybe some
extra logic for the return value. One could aim for 4.25 c/l, I
Torbjorn Granlund t...@gmplib.org writes:
Have you considered complementing C instead?
Not until now. Actually looks nice:
A - b C = A + b (~C) + b - b B^n
So this saves one not instruction, and we have to add and subtract the
scalar b from incoming and outgoing carry.
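
The identity can be checked with plain integers. The following is a small sketch, not GMP code: it assumes 64-bit limbs, models A and C as n-limb integers below B^n, and uses made-up function names. The addmul_1 step is modelled by ordinary integer arithmetic, with b as the incoming carry and b minus the outgoing carry as the returned borrow.

```python
LIMB_BITS = 64          # assumed limb size
B = 1 << LIMB_BITS      # limb base

def submul_1_direct(A, C, b, n):
    """Reference: (R, borrow) with R = A - b*C + borrow * B**n, 0 <= R < B**n."""
    Bn = B ** n
    diff = A - b * C
    return diff % Bn, -(diff // Bn)

def submul_1_via_complement(A, C, b, n):
    """Same result via A + b*(~C) + b, i.e. an addmul_1 on the complement of C."""
    Bn = B ** n
    not_C = Bn - 1 - C              # complement of every limb of C
    T = A + b * not_C + b           # addmul_1 with incoming carry b
    R, carry = T % Bn, T // Bn      # low n limbs and outgoing carry
    return R, b - carry             # subtracting b*B**n turns carry into borrow
```

Since T is at most B^n - 1 + b*B^n, the outgoing carry never exceeds b, so the borrow b - carry is non-negative and fits in one limb.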
Regards,
/Niels
--
David,
First mul_1, renamed again, now encoding the load scheduling. Only the
6c variant is new. Please time it. If it doesn't run at 3 c/l, then
there are 2 simple things to try, indicated in a comment.
sparct34-mul_1-3c.asm
sparct34-mul_1-6c.asm
David Miller da...@davemloft.net writes:
First mul_1, renamed again, now encoding the load scheduling. Only the
6c variant is new. Please time it. If it doesn't run at 3 c/l, then
there are 2 simple things to try, indicated in a comment.
Looks exciting, I'll play around with this
From: Torbjorn Granlund t...@gmplib.org
Date: Thu, 04 Apr 2013 02:40:58 +0200
David Miller da...@davemloft.net writes:
First mul_1, renamed again, now encoding the load scheduling. Only the
6c variant is new. Please time it. If it doesn't run at 3 c/l, then
there are 2 simple
David Miller da...@davemloft.net writes:
Please don't do this, you checked in code that doesn't even compile
again.
Easy to fix. Please pull again.
I was just starting to work on getting the information for you
so this is very disappointing. :-/
Well, bugs happen.
--
Torbjörn
David Miller da...@davemloft.net writes:
I can tell by looking at the commit that it's still broken, can you
please stop jumping the gun and simply be patient enough for me to
test things out?
Since I am wrapping up, I wanted to push things and clean out unfinished
things.
Why is is