Torbjorn Granlund t...@gmplib.org writes:
* The code is no win for AMD k10/k8 (although close to 10 c/l might well be
possible)
I tried replacing one masking op by cmov, as you suggested. We then get
down to 11.25 c/l on K10. I put this modified version in the k10
subdirectory, since it was
ni...@lysator.liu.se (Niels Möller) writes:
Torbjorn Granlund t...@gmplib.org writes:
* The code is no win for AMD k10/k8 (although close to 10 c/l might well be
possible)
I tried replacing one masking op by cmov, as you suggested. We then get
down to 11.25 c/l on K10. I put
It turned out the code was a bit slower on k8.
This patch changes that. With it applied, things take 11 c/l on both
pipelines. This is also a 2 c/l improvement for piledriver.
I have not tested that this is correct. If you like the patch, please
consider putting the result in the k8 subdir.
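
For readers following the masking-versus-cmov discussion above, here is a minimal C sketch of the two styles applied to a conditional subtraction. It is not the GMP assembly code; the function names cond_sub_mask and cond_sub_select are hypothetical, and a 64-bit limb is assumed. Whether a compiler actually emits cmov for the second form depends on the compiler and its flags.

```c
#include <stdint.h>

/* Masking style: build an all-ones or all-zeros mask from the
   comparison, then subtract (d & mask). Hypothetical helper, not
   taken from GMP. */
static uint64_t cond_sub_mask(uint64_t r, uint64_t d)
{
    uint64_t mask = -(uint64_t)(r >= d);  /* all ones if r >= d, else 0 */
    return r - (d & mask);
}

/* Select style: compute the difference unconditionally and pick one
   of the two values. Compilers commonly, but not necessarily, turn
   this into a cmov on x86-64. */
static uint64_t cond_sub_select(uint64_t r, uint64_t d)
{
    uint64_t t = r - d;                   /* wraps when r < d; unused then */
    return (r >= d) ? t : r;
}
```

Both functions return the same value for every input; the difference is only in the dependency chain the processor sees, which is what the c/l figures in this thread measure.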
Torbjorn Granlund t...@gmplib.org writes:
It turned out the code was a bit slower on k8.
This patch changes that. With it applied, things take 11 c/l on both
pipelines. This is also a 2 c/l improvement for piledriver.
Cool.
I have not tested that this is correct. If you like the patch,
I played more with the code, now trying to break the add-adc-sbb-cmov
chain, for the benefit of most Intel processors.
But I lack unit testing code for the function, making hacking quite
cumbersome. I don't feel safe hacking *any* GMP assembly code without
tests/devel/try.c's function and access
Torbjorn Granlund t...@gmplib.org writes:
But I lack unit testing code for the function, making hacking quite
cumbersome. I don't feel safe hacking *any* GMP assembly code without
tests/devel/try.c's function and access checks.
tests/mpn/t-div.c includes tests for mpn_div_qr_1, including
ni...@lysator.liu.se (Niels Möller) writes:
ni...@lysator.liu.se (Niels Möller) writes:
But sure, support also in try.c would be good.
Added now. Please have a look if the changes are sane. I use the
second source for the uh input, and I added a DATA_DIV_QR_1 to get it in
the
ni...@lysator.liu.se (Niels Möller) writes:
ni...@lysator.liu.se (Niels Möller) writes:
But sure, support also in try.c would be good.
Added now.
And sure enough, it detects some bugs in the new assembly code. For size
n==1, there's a missing mov. I'll add that shortly. Then there's another
ni...@lysator.liu.se (Niels Möller) writes:
And sure enough, it detects some bugs in the new assembly code. For size
n==1, there's a missing mov. I'll add that shortly. Then there's another
problem with n==2, which needs a bit more debugging.
Good. So now you have debugged the new try.c
I added data for the new code at http://gmplib.org/devel/asm.html.
There is a line for div_qr_1u_pi1 as well, since that will also be
needed. It might actually be more common that the divisor is not
normalised.
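
As an illustration of what "normalised" means here: a divisor limb is normalised when its most significant bit is set. The sketch below assumes a 64-bit limb; the helper names leading_zeros64, is_normalised and normalise are hypothetical and not GMP functions.

```c
#include <stdint.h>

/* Count leading zero bits of d (returns 64 for d == 0). */
static unsigned leading_zeros64(uint64_t d)
{
    unsigned cnt = 0;
    while (cnt < 64 && (d & (UINT64_C(1) << 63)) == 0) {
        d <<= 1;
        cnt++;
    }
    return cnt;
}

/* A limb is normalised when its top bit is set. */
static int is_normalised(uint64_t d)
{
    return (d >> 63) != 0;
}

/* Shift d left until its top bit is set. The caller must pass d != 0,
   since shifting by 64 is undefined in C. */
static uint64_t normalise(uint64_t d)
{
    return d << leading_zeros64(d);
}
```

A routine that accepts unnormalised divisors has to apply the same shift count to the dividend and shift the remainder back afterwards, which is presumably why a separate 1u variant is needed alongside the 1n one.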
I should try to wrap up div_qr_1n_pi2 and div_qr_1u_pi2 as well, and
then add