Re: [PATCH] RISC-V: Synthesize power-of-two constants.

2023-05-30 Thread Philipp Tomsich
Assuming a fully pipelined vector unit (and from experience on AArch64), a u-arch's scalar-to-vector move cost is likely to play a significant role in whether this will be profitable or not.
--Philipp.
On Wed, 31 May 2023 at 00:10, Jeff Law via Gcc-patches wrote:
> On 5/30/23 16:01, 钟居哲

Re: [PATCH] RISC-V: Synthesize power-of-two constants.

2023-05-30 Thread Jeff Law via Gcc-patches
On 5/30/23 16:13, 钟居哲 wrote:
> Ok. I prefer just keep scalar load + vmv.v.x by default since I believe most machines prefer this way.
Seems quite reasonable to me.
jeff

Re: Re: [PATCH] RISC-V: Synthesize power-of-two constants.

2023-05-30 Thread 钟居哲
Ok. I prefer just keep scalar load + vmv.v.x by default since I believe most machines prefer this way.
juzhe.zh...@rivai.ai
From: Jeff Law
Date: 2023-05-31 06:09
To: 钟居哲; andrew; rdapp.gcc
CC: gcc-patches; kito.cheng; palmer
Subject: Re: [PATCH] RISC-V: Synthesize power-of-two constants

Re: [PATCH] RISC-V: Synthesize power-of-two constants.

2023-05-30 Thread Jeff Law via Gcc-patches
On 5/30/23 16:01, 钟居哲 wrote:
> I agree with Andrew. And I don't think this patch is appropriate for following reasons:
> 1. This patch increases vector workload in machine since it convert scalar load + vmv.v.x into vmv.v.i + vsll.vi.
This is probably uarch dependent. I can probably
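For readers following the thread, the two instruction sequences under discussion can be sketched as below. This is only an illustration under stated assumptions, not code from the patch: the helper name `pow2_shift_amount`, the register names in the comments, and the choice to seed the vector with the immediate 1 are all hypothetical. The immediate ranges are those of the RVV 1.0 encoding (`vmv.v.i` takes a 5-bit signed immediate, `vsll.vi` a 5-bit unsigned shift amount).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Two ways to broadcast the constant 1024 to a vector register
 * (register names are illustrative only):
 *
 *   Scalar path:            li      a5, 1024
 *                           vmv.v.x v1, a5
 *
 *   Vector-only path:       vmv.v.i v1, 1
 *                           vsll.vi v1, v1, 10
 *
 * Hypothetical helper, not taken from the patch: returns true when `value`
 * is a power of two that does not fit vmv.v.i directly but could be built
 * as (1 << shift) with a vsll.vi immediate, and stores that shift amount.
 */
bool pow2_shift_amount(uint64_t value, unsigned *shift)
{
    /* Reject zero and anything with more than one bit set. */
    if (value == 0 || (value & (value - 1)) != 0)
        return false;

    /* Find log2(value) by counting how far the single set bit is shifted. */
    unsigned s = 0;
    while ((value >> s) != 1)
        s++;

    /* Values up to 8 already fit the vmv.v.i immediate (-16..15),
       and vsll.vi can only encode shift amounts 0..31. */
    if (s < 4 || s > 31)
        return false;

    *shift = s;
    return true;
}
```

With these assumptions, `pow2_shift_amount(1024, &s)` succeeds with `s == 10`, while 8 (fits `vmv.v.i` directly) and 1000 (not a power of two) are rejected. Whether the vector-only path is actually cheaper is exactly the u-arch-dependent question raised in the thread; the sketch only identifies when it is encodable.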

Re: Re: [PATCH] RISC-V: Synthesize power-of-two constants.

2023-05-30 Thread 钟居哲
I disagree with this patch. Thanks.
juzhe.zh...@rivai.ai
From: Andrew Waterman
Date: 2023-05-31 04:18
To: Robin Dapp
CC: gcc-patches; Kito Cheng; palmer; juzhe.zh...@rivai.ai; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Synthesize power-of-two constants.
This turns out to be a de-optimization for imple

Re: [PATCH] RISC-V: Synthesize power-of-two constants.

2023-05-30 Thread Andrew Waterman via Gcc-patches
This turns out to be a de-optimization for implementations with any amount of temporal execution (which is most machines with LMUL > 1 and even some machines with LMUL <= 1). Scalar instructions are generally cheaper than multi-cycle-occupancy vector operations, so reducing scalar work by