Re: [PATCH 1/3] Optimize 32-bit sparc T1 multiply routines.

2013-03-05 David Miller
From: Torbjorn Granlund
Date: Tue, 05 Mar 2013 13:42:58 +0100

> Note that ALIGN between ASM_START and PROLOGUE is ineffective on this
> platform. If stricter alignment is needed for function starts (but not
> loop starts?) then we need to override the default PROLOGUE_cpu.

Thanks for pointing t

Re: [PATCH 1/3] Optimize 32-bit sparc T1 multiply routines.

2013-03-05 Torbjorn Granlund
David Miller writes:

  * mpn/sparc32/ultrasparct1/mul_1.asm (mpn_mul_1): Unroll main loop
  one time, align code on 32-byte boundary, add T2/T3/T4 timings.
  * mpn/sparc32/ultrasparct1/addmul_1.asm (mpn_addmul_1): Likewise.
  * mpn/sparc32/ultrasparct1/submul_1.asm (mpn_su

[PATCH 1/3] Optimize 32-bit sparc T1 multiply routines.

2013-03-05 David Miller
	* mpn/sparc32/ultrasparct1/mul_1.asm (mpn_mul_1): Unroll main loop
	one time, align code on 32-byte boundary, add T2/T3/T4 timings.
	* mpn/sparc32/ultrasparct1/addmul_1.asm (mpn_addmul_1): Likewise.
	* mpn/sparc32/ultrasparct1/submul_1.asm (mpn_submul_1): Likewise.
---
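
For readers unfamiliar with what "unroll main loop one time" means for a routine like mpn_mul_1, here is a rough C sketch. It is not the SPARC assembly from the patch, and it says nothing about the 32-byte alignment or the T1 instruction scheduling. The function name mul_1_unrolled is hypothetical, and 32-bit limbs are assumed to match the sparc32 directory. It multiplies an n-limb number by a single limb, handling two limbs per loop iteration with a one-limb tail for odd n.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mpn_mul_1-style routine with 32-bit limbs:
   rp[0..n-1] = low limbs of up[0..n-1] * v, return value = carry-out limb.
   Limbs are stored least significant first. */
uint32_t mul_1_unrolled(uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
    uint32_t carry = 0;
    size_t i = 0;

    /* Main loop, unrolled once: two limbs are processed per iteration,
       halving the loop-control overhead per limb. */
    for (; i + 2 <= n; i += 2) {
        /* 32x32 -> 64-bit product plus a 32-bit carry cannot overflow. */
        uint64_t p0 = (uint64_t)up[i] * v + carry;
        rp[i] = (uint32_t)p0;

        uint64_t p1 = (uint64_t)up[i + 1] * v + (uint32_t)(p0 >> 32);
        rp[i + 1] = (uint32_t)p1;

        carry = (uint32_t)(p1 >> 32);
    }

    /* Tail: at most one leftover limb when n is odd. */
    if (i < n) {
        uint64_t p = (uint64_t)up[i] * v + carry;
        rp[i] = (uint32_t)p;
        carry = (uint32_t)(p >> 32);
    }

    return carry;
}
```

The addmul_1 and submul_1 variants would follow the same loop shape, additionally adding the product to, or subtracting it from, the existing contents of rp.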