[RFC PATCH] RISC-V: Add Zawrs ISA extension support

2022-06-01 Thread Christoph Muellner via Gcc-patches
This patch adds support for the Zawrs ISA extension. The patch depends on the corresponding Binutils patch to be usable (see [1]). The specification can be found here: https://github.com/riscv/riscv-zawrs/blob/main/zawrs.adoc Note that the Zawrs extension is not frozen or ratified yet. Therefore

[PATCH v3 9/9] RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]

2022-05-26 Thread Christoph Muellner via Gcc-patches
Atomic instructions require zero-offset memory addresses. If we allow all addresses, the nonzero-offset addresses will be prepared in an extra register in an extra instruction before the actual atomic instruction. This patch introduces the predicate "riscv_sync_memory_operand", which restricts the

[PATCH v3 8/9] RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]

2022-05-26 Thread Christoph Muellner via Gcc-patches
The current model of the LR and SC INSNs requires a sign-extension to use the generated SImode value for conditional branches, which only operate on XLEN registers. However, the sign-extension is actually not required in either case; therefore, this patch introduces additional INSNs that consume the

[PATCH v3 7/9] RISC-V: Model INSNs for LR and SC [PR 100266]

2022-05-26 Thread Christoph Muellner via Gcc-patches
In order to emit LR/SC sequences, let's provide INSNs that take care of memory ordering constraints. gcc/ PR 100266 * config/riscv/sync.md (UNSPEC_LOAD_RESERVED): New. * config/riscv/sync.md (UNSPEC_STORE_CONDITIONAL): New. * config/riscv/sync.md (riscv_load_r

[PATCH v3 6/9] RISC-V: Implement atomic_{load,store} [PR 100265]

2022-05-26 Thread Christoph Muellner via Gcc-patches
A recent commit introduced a mechanism to emit proper fences for RISC-V. Additionally, we already have emit_move_insn (). Let's reuse this code and provide atomic_load and atomic_store for RISC-V (as defined in section "Code Porting and Mapping Guidelines" of the unpriv spec). Note that this works
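To illustrate what an atomic load/store pair looks like from the user's side, here is a minimal C11 sketch on a sub-word (8-bit) object. The names `flag`, `publish` and `observe` are made up for this example and do not come from the patch. The instruction sequences mentioned in the comments are the ones listed in the mapping table of the unprivileged spec; the code actually emitted by a given compiler version may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared sub-word object (not from the patch).  */
static _Atomic uint8_t flag;

/* Release store.  The spec's mapping table lists this as
   "fence rw,w" followed by a plain byte store.  */
static void
publish (uint8_t value)
{
  atomic_store_explicit (&flag, value, memory_order_release);
}

/* Acquire load.  The spec's mapping table lists this as
   a plain byte load followed by "fence r,rw".  */
static uint8_t
observe (void)
{
  return atomic_load_explicit (&flag, memory_order_acquire);
}
```

Because both operations are built from ordinary loads and stores plus fences, they work for byte and half-word objects as well, which is something an AMO-based store cannot offer.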

[PATCH v3 5/9] RISC-V: Emit fences according to chosen memory model [PR 100265]

2022-05-26 Thread Christoph Muellner via Gcc-patches
mem_thread_fence gets the desired memory model as operand. Let's emit fences according to this value (as defined in section "Code Porting and Mapping Guidelines" of the unpriv spec). gcc/ PR 100265 * config/riscv/sync.md (mem_thread_fence): Emit fences according t
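As a rough illustration of "emit fences according to this value", the sketch below maps a C11 memory order to the fence listed in the mapping table of the unprivileged spec. The helper `fence_for_memory_order` is hypothetical and uses the C11 `memory_order` enum rather than GCC's internal memmodel type, so it should be read as a summary of the table and not as the code in sync.md.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): returns the fence
   instruction for a C11 memory order, following the spec's mapping
   table.  An empty string means that no fence is needed.  */
static const char *
fence_for_memory_order (memory_order order)
{
  switch (order)
    {
    case memory_order_relaxed:
      return "";
    case memory_order_consume: /* Assumption: handled like acquire.  */
    case memory_order_acquire:
      return "fence r,rw";
    case memory_order_release:
      return "fence rw,w";
    case memory_order_acq_rel:
      return "fence.tso";
    case memory_order_seq_cst:
      return "fence rw,rw";
    }
  assert (!"unknown memory order");
  return "";
}
```

For example, `fence_for_memory_order (memory_order_release)` yields `"fence rw,w"`.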

[PATCH v3 4/9] RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]

2022-05-26 Thread Christoph Muellner via Gcc-patches
Using AMOSWAP as atomic store does not allow us to do sub-word accesses. Further, it is not consistent with our atomic_load () implementation. The benefit of AMOSWAP is that the resulting code sequence will be smaller (compared to FENCE+STORE); however, this does not outweigh the lack of sub-

[PATCH v3 3/9] RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]

2022-05-26 Thread Christoph Muellner via Gcc-patches
A previous patch ensured that the proper memory ordering suffixes for AMOs are emitted. Therefore, there is no reason to keep the fence generation mechanism for release operations. gcc/ PR 100265 * config/riscv/riscv.c (riscv_memmodel_needs_release_fence): Remove fu

[PATCH v3 2/9] RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]

2022-05-26 Thread Christoph Muellner via Gcc-patches
The ratified A extension supports '.aq', '.rl' and '.aqrl' as memory ordering suffixes. Let's emit them in case we get a '%A' conversion specifier for riscv_print_operand(). As '%A' was already used for a similar, but restricted, purpose (only '.aq' was emitted so far), this does not require any o
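The suffix selection can be pictured with a small C sketch. The helper `amo_suffix_for_memory_order` is hypothetical and takes a C11 `memory_order` instead of GCC's internal memmodel value; the mapping follows the AMO rows of the mapping table in the unprivileged spec and is not copied from riscv_print_operand().

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): returns the ordering
   suffix to append to an AMO mnemonic such as "amoadd.w".  */
static const char *
amo_suffix_for_memory_order (memory_order order)
{
  switch (order)
    {
    case memory_order_relaxed:
      return "";
    case memory_order_consume: /* Assumption: handled like acquire.  */
    case memory_order_acquire:
      return ".aq";
    case memory_order_release:
      return ".rl";
    case memory_order_acq_rel:
    case memory_order_seq_cst:
      return ".aqrl";
    }
  assert (!"unknown memory order");
  return "";
}
```

With this mapping, a release fetch-and-add on a word would be printed as `amoadd.w.rl`.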

[PATCH v3 1/9] RISC-V: Simplify memory model code [PR 100265]

2022-05-26 Thread Christoph Muellner via Gcc-patches
We don't have any special treatment of MEMMODEL_SYNC_* values, so let's hide them behind the memmodel_base() function. gcc/ PR 100265 * config/riscv/riscv.c (riscv_memmodel_needs_amo_acquire): Ignore MEMMODEL_SYNC_* values. * config/riscv/riscv.c (riscv_memmod

[PATCH v3 0/9] [RISC-V] Atomics improvements

2022-05-26 Thread Christoph Muellner via Gcc-patches
This series provides a cleanup of the current atomics implementation of RISC-V (PR100265: Use proper fences for atomic load/store). The first patch could be squashed into the following patches, but I found it easier to understand the changes with it in place. The series has been tested as follows

[PATCH] RISC-V: Allow unaligned accesses in cpymemsi expansion

2021-07-29 Thread Christoph Muellner via Gcc-patches
The RISC-V cpymemsi expansion is called whenever the by-pieces infrastructure does not take care of the builtin expansion. Currently, that's the case for e.g. memcpy() with n <= 24 bytes. The code emitted by the by-pieces infrastructure performs unaligned accesses if the targ

[PATCH v2] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

2021-07-22 Thread Christoph Muellner via Gcc-patches
This patch enables the overlap-by-pieces feature of the by-pieces infrastructure for inlining builtins in case the target has set riscv_slow_unaligned_access_p to false. An example to demonstrate the effect for targets with fast unaligned access (targets that have slow_unaligned_access set to fal

[PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

2021-07-22 Thread Christoph Muellner via Gcc-patches
This patch enables the overlap-by-pieces feature of the by-pieces infrastructure for inlining builtins in case the target has set riscv_slow_unaligned_access_p to false. To demonstrate the effect for targets with fast unaligned access, the following code sequences are generated for a 15-byte memse
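The idea behind overlapping stores can be sketched in plain C for the 15-byte case. This is an illustration of the technique only, not the assembly generated by the patch; the function names are invented for this example, and `memcpy` with a fixed size stands in for a single (possibly unaligned) store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Without overlap: 15 = 8 + 4 + 2 + 1, i.e. four stores.  */
static void
zero15_by_pieces (unsigned char *dst)
{
  const uint64_t zero = 0;
  memcpy (dst, &zero, 8);      /* bytes 0..7 */
  memcpy (dst + 8, &zero, 4);  /* bytes 8..11 */
  memcpy (dst + 12, &zero, 2); /* bytes 12..13 */
  memcpy (dst + 14, &zero, 1); /* byte 14 */
}

/* With overlap: two 8-byte stores, the second one unaligned.
   Byte 7 is written twice, which is harmless for memset.  */
static void
zero15_overlapping (unsigned char *dst)
{
  const uint64_t zero = 0;
  memcpy (dst, &zero, 8);     /* bytes 0..7 */
  memcpy (dst + 7, &zero, 8); /* bytes 7..14 */
}
```

The overlapping variant only pays off when unaligned accesses are fast, which is why the feature is tied to the target's slow_unaligned_access setting.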

[PATCH] RISC-V: Enable overlap-by-pieces via tune param

2021-07-21 Thread Christoph Muellner via Gcc-patches
This patch adds the field overlap_op_by_pieces to the struct riscv_tune_param, which allows enabling the overlap_op_by_pieces feature of the by-pieces infrastructure. gcc/ChangeLog: * config/riscv/riscv.c (struct riscv_tune_param): New field. (riscv_overlap_op_by_pieces): New fun

[PATCH v2] REE: PR rtl-optimization/100264: Handle more PARALLEL SET expressions

2021-05-10 Thread Christoph Muellner via Gcc-patches
Move the check for register targets (i.e. REG_P ()) into the function get_sub_rtx () and change the restriction of REE to "only one child of a PARALLEL expression is a SET register expression" (was "only one child of a PARALLEL expression is a SET expression"). This allows handling more PARALLEL

[PATCH v2 10/10] RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]

2021-05-05 Thread Christoph Muellner via Gcc-patches
Atomic instructions require zero-offset memory addresses. If we allow all addresses, the nonzero-offset addresses will be prepared in an extra register in an extra instruction before the actual atomic instruction. This patch introduces the predicate "riscv_sync_memory_operand", which restricts the

[PATCH v2 09/10] RISC-V: Provide programmatic implementation of CAS [PR 100266]

2021-05-05 Thread Christoph Muellner via Gcc-patches
The existing CAS implementation uses an INSN definition, which provides the core LR/SC sequence. In addition, there is follow-up code that evaluates the results and calculates the return values. This has two drawbacks: a) an extension to sub-word CAS implementations is not possible (eve
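For readers unfamiliar with sub-word CAS: one common way to build it (not necessarily the approach taken in this patch) is to run a word-sized CAS loop and mask out the byte of interest. The sketch below does this with C11 atomics; the helper `cas_byte_in_word` and its lane numbering (lane 0 is the least significant byte) are assumptions made for this example. On RISC-V the word-sized CAS would itself be an LR/SC loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compare-and-swap one byte lane (0..3) inside
   a 32-bit atomic word.  On failure, *expected is updated with the
   byte that was found, mirroring the C11 CAS convention.  */
static bool
cas_byte_in_word (_Atomic uint32_t *word, unsigned lane,
                  uint8_t *expected, uint8_t desired)
{
  assert (lane < 4);
  const unsigned shift = lane * 8;
  const uint32_t mask = (uint32_t) 0xff << shift;
  uint32_t old = atomic_load_explicit (word, memory_order_relaxed);

  for (;;)
    {
      uint8_t current = (uint8_t) ((old & mask) >> shift);
      if (current != *expected)
        {
          *expected = current;
          return false;
        }
      uint32_t replacement = (old & ~mask) | ((uint32_t) desired << shift);
      /* On failure, OLD is refreshed with the current word and the
         byte comparison is repeated (another lane may have changed).  */
      if (atomic_compare_exchange_weak_explicit (word, &old, replacement,
                                                 memory_order_seq_cst,
                                                 memory_order_relaxed))
        return true;
    }
}
```

The retry after a failed word-sized CAS is needed because a concurrent update to a neighbouring byte must not make the sub-word CAS fail.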

[PATCH v2 08/10] RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]

2021-05-05 Thread Christoph Muellner via Gcc-patches
The current model of the LR and SC INSNs requires a sign-extension to use the generated SImode value for conditional branches, which only operate on XLEN registers. However, the sign-extension is actually not required in either case; therefore, this patch introduces additional INSNs that consume the

[PATCH v2 07/10] RISC-V: Model INSNs for LR and SC [PR 100266]

2021-05-05 Thread Christoph Muellner via Gcc-patches
In order to emit LR/SC sequences, let's provide INSNs that take care of memory ordering constraints. gcc/ PR 100266 * config/riscv/sync.md (UNSPEC_LOAD_RESERVED): New. * config/riscv/sync.md (UNSPEC_STORE_CONDITIONAL): New. * config/riscv/sync.md (riscv_load_r

[PATCH v2 06/10] RISC-V: Implement atomic_{load,store} [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
A recent commit introduced a mechanism to emit proper fences for RISC-V. Additionally, we already have emit_move_insn (). Let's reuse this code and provide atomic_load and atomic_store for RISC-V (as defined in section "Code Porting and Mapping Guidelines" of the unpriv spec). Note that this works

[PATCH v2 05/10] RISC-V: Emit fences according to chosen memory model [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
mem_thread_fence gets the desired memory model as operand. Let's emit fences according to this value (as defined in section "Code Porting and Mapping Guidelines" of the unpriv spec). gcc/ PR 100265 * config/riscv/sync.md (mem_thread_fence): Emit fences according t

[PATCH v2 04/10] RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
Using AMOSWAP as atomic store does not allow us to do sub-word accesses. Further, it is not consistent with our atomic_load () implementation. The benefit of AMOSWAP is that the resulting code sequence will be smaller (compared to FENCE+STORE); however, this does not outweigh the lack of sub-

[PATCH v2 03/10] RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
A previous patch ensured that the proper memory ordering suffixes for AMOs are emitted. Therefore, there is no reason to keep the fence generation mechanism for release operations. gcc/ PR 100265 * config/riscv/riscv.c (riscv_memmodel_needs_release_fence): Remove fu

[PATCH v2 02/10] RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
The ratified A extension supports '.aq', '.rl' and '.aqrl' as memory ordering suffixes. Let's emit them in case we get a '%A' conversion specifier for riscv_print_operand(). As '%A' was already used for a similar, but restricted, purpose (only '.aq' was emitted so far), this does not require any o

[PATCH v2 01/10] RISC-V: Simplify memory model code [PR 100265]

2021-05-05 Thread Christoph Muellner via Gcc-patches
We don't have any special treatment of MEMMODEL_SYNC_* values, so let's hide them behind the memmodel_base() function. gcc/ PR 100265 * config/riscv/riscv.c (riscv_memmodel_needs_amo_acquire): Ignore MEMMODEL_SYNC_* values. * config/riscv/riscv.c (riscv_memmod

[PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2021-05-05 Thread Christoph Muellner via Gcc-patches
This series provides a cleanup of the current atomics implementation of RISC-V: * PR100265: Use proper fences for atomic load/store * PR100266: Provide programmatic implementation of CAS As both are closely related, I merged the patches into one series. The first patch could be squashed into the fo

[PATCH] RISC-V: Generate helpers for cbranch4

2021-05-05 Thread Christoph Muellner via Gcc-patches
On RISC-V, conditional branches require Pmode conditions. Currently, we generate them explicitly by checking Pmode and then calling the proper generator (i.e. gen_cbranchdi4 on RV64 and gen_cbranchsi4 on RV32). Let's simplify this code by generating the INSN hel

[PATCH 10/10] RISC-V: Provide programmatic implementation of CAS [PR 100266]

2021-04-26 Thread Christoph Muellner via Gcc-patches
The existing CAS implementation uses an INSN definition, which provides the core LR/SC sequence. In addition, there is follow-up code that evaluates the results and calculates the return values. This has two drawbacks: a) an extension to sub-word CAS implementations is not possible (eve

[PATCH 09/10] RISC-V: Generate helpers for cbranch4 [PR 100266]

2021-04-26 Thread Christoph Muellner via Gcc-patches
On RISC-V, conditional branches require Pmode conditions. Currently, we generate them explicitly by checking Pmode and then calling the proper generator (i.e. gen_cbranchdi4 on RV64 and gen_cbranchsi4 on RV32). Let's simplify this code by using gen_cbranch4

[PATCH 08/10] RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]

2021-04-26 Thread Christoph Muellner via Gcc-patches
The current model of the LR and SC INSNs requires a sign-extension to use the generated SImode value for conditional branches, which only operate on XLEN registers. However, the sign-extension is actually not required in either case; therefore, this patch introduces additional INSNs that consume the

[PATCH 07/10] RISC-V: Model INSNs for LR and SC [PR 100266]

2021-04-26 Thread Christoph Muellner via Gcc-patches
In order to emit LR/SC sequences, let's provide INSNs that take care of memory ordering constraints. gcc/ PR 100266 * config/riscv/sync.md (UNSPEC_LOAD_RESERVED): New. * config/riscv/sync.md (UNSPEC_STORE_CONDITIONAL): New. * config/riscv/sync.md (riscv_load_r

[PATCH 06/10] RISC-V: Implement atomic_{load,store} [PR 100265]

2021-04-26 Thread Christoph Muellner via Gcc-patches
A recent commit introduced a mechanism to emit proper fences for RISC-V. Additionally, we already have emit_move_insn (). Let's reuse this code and provide atomic_load and atomic_store for RISC-V (as defined in section "Code Porting and Mapping Guidelines" of the unpriv spec). Note that this works

[PATCH 05/10] RISC-V: Emit fences according to chosen memory model [PR 100265]

2021-04-26 Thread Christoph Muellner via Gcc-patches
mem_thread_fence gets the desired memory model as operand. Let's emit fences according to this value (as defined in section "Code Porting and Mapping Guidelines" of the unpriv spec). gcc/ PR 100265 * config/riscv/sync.md (mem_thread_fence): Emit fences according t

[PATCH 04/10] RISC-V: Don't use amoswap for atomic stores [PR 100265]

2021-04-26 Thread Christoph Muellner via Gcc-patches
Using amoswap as atomic store is not an expected optimization and most likely causes a performance penalty. Neither SW nor HW benefits from this optimization, so let's simply drop it. gcc/ PR 100265 * config/riscv/sync.md (atomic_store): Remove. --- gcc/config/

[PATCH 03/10] RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]

2021-04-26 Thread Christoph Muellner via Gcc-patches
A previous patch ensured that the proper memory ordering suffixes for AMOs are emitted. Therefore, there is no reason to keep the fence generation mechanism for release operations. gcc/ PR 100265 * config/riscv/riscv.c (riscv_memmodel_needs_release_fence): Remove fu

[PATCH 01/10] RISC-V: Simplify memory model code [PR 100265]

2021-04-26 Thread Christoph Muellner via Gcc-patches
We don't have any special treatment of MEMMODEL_SYNC_* values, so let's hide them behind the memmodel_base() function. gcc/ PR 100265 * config/riscv/riscv.c (riscv_memmodel_needs_amo_acquire): Ignore MEMMODEL_SYNC_* values. * config/riscv/riscv.c (riscv_memmod

[PATCH 02/10] RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]

2021-04-26 Thread Christoph Muellner via Gcc-patches
The ratified A extension supports '.aq', '.rl' and '.aqrl' as memory ordering suffixes. Let's emit them in case we get a '%A' conversion specifier for riscv_print_operand(). As '%A' was already used for a similar, but restricted, purpose (only '.aq' was emitted so far), this does not require any o

[PATCH 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2021-04-26 Thread Christoph Muellner via Gcc-patches
This series provides a cleanup of the current atomics implementation of RISC-V: * PR100265: Use proper fences for atomic load/store * PR100266: Provide programmatic implementation of CAS As both are closely related, I merged the patches into one series (to avoid merge issues if one overtakes the othe

[PATCH 1/2] REE: PR rtl-optimization/100264: Handle more PARALLEL SET expressions

2021-04-26 Thread Christoph Muellner via Gcc-patches
[ree] PR rtl-optimization/100264: Handle more PARALLEL SET expressions PR rtl-optimization/100264 * ree.c (get_sub_rtx): Ignore SET expressions without register destinations. (merge_def_and_ext): Eliminate destination check for register as such SET expressio