On 9/23/20 6:48 AM, LIU Zhiwei wrote:
>> + for (i = 0; i < opr_sz_8; i += 2) {
>> uint64_t d0, d1;
>> - d0 = n[i * 4 + 0] * (uint64_t)m_indexed[i * 4 + 0];
>> + d0 = a[i + 0];
> Add once.
>> + d0 += n[i * 4 + 0] * (uint64_t)m_indexed[i * 4 + 0];
>> d0
On 2020/9/23 22:46, Richard Henderson wrote:
> On 9/23/20 3:01 AM, LIU Zhiwei wrote:
>> On 2020/9/19 2:37, Richard Henderson wrote:
>>> For SVE, we potentially have a 4th argument coming from the
>>> movprfx instruction. Currently we do not optimize movprfx,
>>> so the problem is not visible.
>> Hi Richard,
On 9/23/20 3:01 AM, LIU Zhiwei wrote:
>
>
> On 2020/9/19 2:37, Richard Henderson wrote:
>> For SVE, we potentially have a 4th argument coming from the
>> movprfx instruction. Currently we do not optimize movprfx,
>> so the problem is not visible.
> Hi Richard,
>
> I am a little confused. If it
On 2020/9/19 2:37, Richard Henderson wrote:
> For SVE, we potentially have a 4th argument coming from the
> movprfx instruction. Currently we do not optimize movprfx,
> so the problem is not visible.
> Signed-off-by: Richard Henderson
> ---
> target/arm/helper.h | 20 +++---
> target/arm/
On 2020/9/19 2:37, Richard Henderson wrote:
> For SVE, we potentially have a 4th argument coming from the
> movprfx instruction. Currently we do not optimize movprfx,
> so the problem is not visible.
Hi Richard,
I am a little confused. If it is not immediately preceded by a MOVPRFX
instruction,
For SVE, we potentially have a 4th argument coming from the
movprfx instruction. Currently we do not optimize movprfx,
so the problem is not visible.
Signed-off-by: Richard Henderson
---
target/arm/helper.h | 20 +++---
target/arm/sve.decode | 7 +-
target/arm/translate
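
For readers following the thread, here is a minimal sketch of the idea behind the extra operand. It is not the helper from the patch: the function name udot_h_acc, its signature, and the non-indexed form are assumptions made for illustration. It only mirrors the pattern visible in the quoted diff, where the accumulator is read from a separate array a, added once per 64-bit lane, and the result is stored to d.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not QEMU's): for each 64-bit lane i,
 *     d[i] = a[i] + sum over j in 0..3 of n[4*i + j] * m[4*i + j]
 * The accumulator a is a separate operand, so d need not equal a.
 */
static void udot_h_acc(uint64_t *d, const uint16_t *n, const uint16_t *m,
                       const uint64_t *a, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++) {
        /* Start from the accumulator, so it is added exactly once. */
        uint64_t sum = a[i];

        for (size_t j = 0; j < 4; j++) {
            /* Widen to 64 bits before multiplying. */
            sum += n[i * 4 + j] * (uint64_t)m[i * 4 + j];
        }

        /* a[i] was read before this store, so d may alias a. */
        d[i] = sum;
    }
}
```

Calling udot_h_acc(d, n, m, d, lanes) gives the usual destructive form, while passing a distinct a leaves the accumulator untouched, which is the case a preceding movprfx would otherwise have to set up with a copy.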