Hi,

The code below is vectorized for x86_64.

#define LEN 32000
__attribute__((aligned(32))) float a[LEN], b[LEN], c[LEN];

void test(void)
{
    for (int i = 0; i < LEN; i++) {
        if (b[i] > 0.0f) {
            a[i] = b[i];
        }
    }
}

x86_64 assembly:

.L2:
        vmovaps b(%rax), %ymm0
        vcmpltps        %ymm0, %ymm2, %ymm1 ⇐ set mask
        vmaskmovps      %ymm0, %ymm1, a(%rax) ⇐ store b[i] to a[i] where the mask is true
        addq    $32, %rax
        cmpq    $128000, %rax
        jne     .L2

On AArch64, we have the BIT instruction and vector compares such as FCMLT.
Is it possible to vectorize this loop in a similar way, like this?

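        ldr     q1, [x1]                ⇐ load b[i..i+3]
        ldr     q2, [x0]                ⇐ load a[i..i+3]
        fcmgt   v0.4s, v1.4s, #0.0      ⇐ set mask where b[i] > 0 (FCMLT #0 would test b[i] < 0)
        bit     v2.16b, v1.16b, v0.16b  ⇐ select
        str     q2, [x0]                ⇐ store back to a[i..i+3]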

The BIT instruction accepts only the 8B or 16B arrangements. But my assumption
is that it copies bit by bit, and that a compare such as FCMLT sets all bits of
the corresponding element to 1 if the condition is true and to 0 otherwise.
So we can use BIT for any mode.
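
To make the assumption above concrete, here is a minimal scalar sketch in portable C of what the compare plus bitwise select would compute per 32-bit lane. The names mask_gt_zero, bit_select and cond_copy are hypothetical helpers introduced only for illustration. This is not compiler output, just a model of the idea: because the mask is either all ones or all zeros within an element, a purely bitwise insert selects whole elements whatever the element type is.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Per-lane mask as a vector compare would produce it:
   all ones if x > 0, all zeros otherwise. */
static uint32_t mask_gt_zero(float x)
{
    return x > 0.0f ? UINT32_MAX : 0u;
}

/* Bitwise select in the style of BIT: where a mask bit is 1 take the
   bit from src, otherwise keep the bit from dst. */
static uint32_t bit_select(uint32_t dst, uint32_t src, uint32_t mask)
{
    return (dst & ~mask) | (src & mask);
}

/* Hypothetical helper: a[i] = b[i] where b[i] > 0, written without a branch. */
void cond_copy(float *a, const float *b, int n)
{
    for (int i = 0; i < n; i++) {
        uint32_t ua, ub;
        memcpy(&ua, &a[i], sizeof ua);   /* reinterpret the float bits */
        memcpy(&ub, &b[i], sizeof ub);
        ua = bit_select(ua, ub, mask_gt_zero(b[i]));
        memcpy(&a[i], &ua, sizeof ua);
    }
}
```

Note that, unlike the masked store on x86_64, this form reads a[i] and writes every a[i] back, with the unselected elements keeping their old value. With NEON intrinsics the same select is usually written with vbslq_f32, and the compiler picks BSL, BIT or BIF as appropriate.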


Regards,
Venkat.


