[PATCH][AArch64] Use intrinsics for widening multiplies (PR91598)

2020-03-06 Thread Wilco Dijkstra
Inline assembler instructions don't have latency info and the scheduler does not attempt to schedule them at all - it does not even honor latencies of asm source operands. As a result, SIMD intrinsics which are implemented using inline assembler perform very poorly, particularly on in-order cores.

Re: [PATCH][AArch64] Use intrinsics for widening multiplies (PR91598)

2020-03-06 Thread Richard Sandiford
> +;; vmlal_lane_s16 intrinsics
> +(define_insn "aarch64_vec_mlal_lane"
> + [(set (match_operand: 0 "register_operand" "=w")
> + (plus: (match_operand: 1 "register_operand" "0")
> + (mult:
> + (ANY_EXTEND:
> + (match_operand: 2 "register_operand" "w"))
> + (ANY_
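For readers less familiar with the operation named in the quoted pattern, the following is a minimal scalar sketch in C of a signed widening multiply-accumulate by lane, the arithmetic that the ACLE intrinsic vmlal_lane_s16 performs on four lanes. It is an illustration only, not GCC's implementation and not part of the patch; the function name mlal_lane_s16_model and the array-based signature are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model (not from the patch): for each of the four
   lanes, widen b[i] and the selected lane of v from 16 to 32 bits,
   multiply them, and add the product to the 32-bit accumulator. */
static void mlal_lane_s16_model(int32_t acc[4], const int16_t b[4],
                                const int16_t v[4], int lane)
{
    assert(lane >= 0 && lane < 4);
    for (int i = 0; i < 4; i++) {
        /* The product of two 16-bit values always fits in 32 bits. */
        int32_t prod = (int32_t)b[i] * (int32_t)v[lane];
        /* Accumulate in unsigned arithmetic so the sum wraps modulo 2^32
           instead of triggering signed-overflow undefined behaviour. */
        acc[i] = (int32_t)((uint32_t)acc[i] + (uint32_t)prod);
    }
}
```

The plus/mult/ANY_EXTEND nesting visible in the quoted RTL corresponds to the add, multiply, and widening casts in the loop body.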

Re: [PATCH][AArch64] Use intrinsics for widening multiplies (PR91598)

2020-03-09 Thread Christophe Lyon
On Fri, 6 Mar 2020 at 16:03, Wilco Dijkstra wrote:
>
> Inline assembler instructions don't have latency info and the scheduler does
> not attempt to schedule them at all - it does not even honor latencies of
> asm source operands. As a result, SIMD intrinsics which are implemented using
> inline a

Re: [PATCH][AArch64] Use intrinsics for widening multiplies (PR91598)

2020-03-09 Thread Wilco Dijkstra
Hi Christophe,

> I noticed a regression introduced by Delia's patch "aarch64: ACLE
> intrinsics for BFCVTN, BFCVTN2 and BFCVT":
> (on aarch64-linux-gnu)
> FAIL: g++.dg/cpp0x/variadic-sizeof4.C -std=c++14 (internal compiler error)
>
> I couldn't reproduce it with current ToT, until I realized that

Re: [PATCH][AArch64] Use intrinsics for widening multiplies (PR91598)

2020-03-09 Thread Andrew Pinski
On Mon, Mar 9, 2020 at 10:26 AM Wilco Dijkstra wrote:
>
> Hi Christophe,
>
> > I noticed a regression introduced by Delia's patch "aarch64: ACLE
> > intrinsics for BFCVTN, BFCVTN2 and BFCVT":
> > (on aarch64-linux-gnu)
> > FAIL: g++.dg/cpp0x/variadic-sizeof4.C -std=c++14 (internal compiler error)