https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97903
Bug ID: 97903
Summary: [ARM NEON] Missed optimization in lowering test operation
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: prathamesh3492 at gcc dot gnu.org
Target Milestone: ---

Hi,
For the following test-case:

    #include <arm_neon.h>

    uint8x8_t f1(int8x8_t a, int8x8_t b) {
      return (uint8x8_t) ((a & b) != 0);
    }

    uint8x8_t f2(int8x8_t a, int8x8_t b) {
      return vtst_s8 (a, b);
    }

Code-gen:

    f2:
            vtst.8  d0, d0, d1
            bx      lr

    f1:
            vmov.i32        d16, #0  @ v8qi
            vand    d1, d0, d1
            vmov.i32        d17, #0xffffffff  @ v8qi
            vceq.i8 d1, d1, d16
            vbsl    d1, d16, d17
            vmov    d0, d1  @ v8qi
            bx      lr

The optimized dump for f1 shows:

    _1 = a_4(D) & b_5(D);
    _3 = .VCOND (_1, { 0, 0, 0, 0, 0, 0, 0, 0 }, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }, 113);
    _6 = VIEW_CONVERT_EXPR<uint8x8_t>(_3);

I think we miss an opportunity to combine the AND followed by VCOND into a
single vector test instruction, as f2 does with vtst. Should we add a .VTEST
internal function that expands to vtst? Or, alternatively, add a peephole
pattern in the backend?

Thanks,
Prathamesh
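For reference, the equivalence that the proposed combine relies on can be checked lane by lane on any host, without arm_neon.h. The following is only a minimal scalar sketch: the function names (lane_and_compare, lane_vtst, count_mismatches) are hypothetical and not part of GCC or the NEON intrinsics, and it assumes the usual semantics that a vector comparison yields all-ones for true and zero for false, and that vtst sets a lane to all-ones when the bitwise AND of the inputs is non-zero.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of f1: the vector expression
   (a & b) != 0 yields all-ones (0xFF) when true and zero when false. */
uint8_t lane_and_compare(int8_t a, int8_t b)
{
    return ((a & b) != 0) ? 0xFF : 0x00;
}

/* Hypothetical scalar model of one lane of vtst.8: AND the raw bit
   patterns and set the lane to all-ones if any bit survives. */
uint8_t lane_vtst(int8_t a, int8_t b)
{
    uint8_t bits = (uint8_t)((uint8_t)a & (uint8_t)b);
    return bits ? 0xFF : 0x00;
}

/* Exhaustively compare both models over every pair of 8-bit inputs
   and return how many pairs disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int a = -128; a <= 127; a++) {
        for (int b = -128; b <= 127; b++) {
            if (lane_and_compare((int8_t)a, (int8_t)b)
                != lane_vtst((int8_t)a, (int8_t)b)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

If count_mismatches() returns 0, the two formulations agree on every lane value, which is the property a .VTEST expansion or a backend pattern would depend on.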