On 15 November 2011 09:19, Richard Sandiford wrote:
> Revital Eres writes:
>>> chain, so what makes the SMS version of it worse than the non-SMS version?
>>
>> I attached the SMS dump file. The problematic loop is the one with
>> "SMS succeeded 36 2" (there are three loops in total in this file).
Hi,
> Anyway, I think this explains why the non-SMS loop executes more
> quickly than GCC expects, and why the SMS loop is slower than it
> needs to be. It might be worth comparing the two loops with
> -mtune=cortex-a8.
Thanks for the detailed explanation!
I see this regression on cortex-a8 as
Revital Eres writes:
>> chain, so what makes the SMS version of it worse than the non-SMS version?
>
> I attached the SMS dump file. The problematic loop is the one with
> "SMS succeeded 36 2" (there are three loops in total in this file).
> Due to these accumulators min ii is 36 which seems to ca