Hi,

For the code below, x86_64 is able to vectorize the loop.
#define LEN 32000
__attribute__((aligned(32))) float a[LEN], b[LEN], c[LEN];

void test()
{
  for (int i = 0; i < LEN; i++) {
    if (b[i] > (float)0.) {
      a[i] = b[i];
    }
  }
}

X86_64 ASM (loop body; %ymm2 holds zeros):

.L2:
  vmovaps     b(%rax), %ymm0
  vcmpltps    %ymm0, %ymm2, %ymm1    # set mask: 0 < b[i]
  vmaskmovps  %ymm0, %ymm1, a(%rax)  # store b[i] to a[i] where the mask is true
  addq        $32, %rax
  cmpq        $128000, %rax
  jne         .L2

In AArch64 we have the FCMGT compare and BIT (bitwise insert if true) instructions. Is it possible to vectorize the loop like this (with x0 and x1 holding the addresses of a[i] and b[i])?

  ldr    q1, [x1]                 // load b[i..i+3]
  ldr    q2, [x0]                 // load a[i..i+3]
  fcmgt  v0.4s, v1.4s, #0         // set mask: b[i] > 0
  bit    v2.16b, v1.16b, v0.16b   // select b[i] where the mask is set, else keep a[i]
  str    q2, [x0]                 // store back to a[i..i+3]

Unlike vmaskmovps, this writes back every element of a, so the old a[i] values have to be loaded first.

BIT only accepts the 8B and 16B arrangements. But my assumption is that it copies bit by bit, and FCMGT sets all bits of each element to 1 if the condition is true and to 0 otherwise, so BIT can be used for any element size.

Regards,
Venkat.