Re: Basecase assembly optimisation project

2013-10-03 Thread Torbjorn Granlund
ni...@lysator.liu.se (Niels Möller) writes: Another feature, which I looked into a while ago without getting very far with the loopmixer, is to make it understand associativity. I.e., try reordering certain instructions with the same destination register, like xor %r8, %rax xor %r9,
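
The reordering idea quoted above relies on xor being associative and commutative: a chain of xor instructions into the same destination register leaves the same final value whatever order they run in. A minimal sketch of that property, modelling registers as 64-bit integers in Python. The register values, the extra register r10, and the helper names are assumptions made for illustration; none of this is taken from the loopmixer.

```python
from itertools import permutations

MASK64 = (1 << 64) - 1  # model 64-bit registers


def run_xor_chain(regs, sources, dest="rax"):
    """Apply `xor %src, %dest` for each src in order; return the final dest value."""
    acc = regs[dest]
    for src in sources:
        acc = (acc ^ regs[src]) & MASK64
    return acc


def all_orders_agree(regs, sources, dest="rax"):
    """Return True if every ordering of the xor chain leaves the same value in dest."""
    results = {run_xor_chain(regs, order, dest) for order in permutations(sources)}
    return len(results) == 1


# Hypothetical register contents, chosen only for the demonstration.
regs = {
    "rax": 0x0123456789ABCDEF,
    "r8": 0xFFFF0000FFFF0000,
    "r9": 0x00000000DEADBEEF,
    "r10": 0x1,
}

print(all_orders_agree(regs, ["r8", "r9", "r10"]))
```

This only checks the final register value. A scheduler that reorders such a chain would also have to respect any instruction that reads the destination between two of the xors.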

Re: Basecase assembly optimisation project

2013-10-03 Thread Torbjorn Granlund
Ondřej Bílka nel...@seznam.cz writes: It is a possible enhancement, but I am not yet at the stage of calculating register dependencies on jumps. That's something we do, but we only handle a simple jump-back for the loop. (That branch limitation is a slight problem for some division loops, which

Re: Basecase assembly optimisation project

2013-10-02 Thread Torbjorn Granlund
Ondřej Bílka nel...@seznam.cz writes: I am writing a tool that might be useful, a simple optimizer of assembly routines. You need to write a benchmark that measures performance and prints elapsed time and assembly file. Currently it has two optimization patterns, first is enclosing block

Re: Basecase assembly optimisation project

2013-10-02 Thread Ondřej Bílka
On Wed, Oct 02, 2013 at 08:19:29PM +0200, Niels Möller wrote: Torbjorn Granlund t...@gmplib.org writes: We don't handle alternatives currently, except with a loop around the loopmixer. One could think of several classes, some known to the tool, others not-always-valid only by source file

Basecase assembly optimisation project

2013-09-26 Thread Torbjorn Granlund
For the last few months, I have been working on writing and rewriting basecase code for x86-64 processors. The result is now in the mainline GMP repo. The basecase code I have focused on is: mul_basecase, sqr_basecase, mullo_basecase, and Hensel remainder via redc_1. At the start of this