[PATCH] middle-end: [PR middle-end/116926] Allow widening optabs for vec-mode -> scalar-mode

2024-10-10 Thread Victor Do Nascimento
The recent refactoring of the dot_prod optab to convert-type exposed a limitation in how `find_widening_optab_handler_and_mode' is currently implemented, owing to the fact that, while the function expects the GET_MODE_CLASS (from_mode) == GET_MODE_CLASS (to_mode) condition to hold, the c6x back

Re: [PATCH] middle-end: reorder masking priority of math functions

2024-10-07 Thread Victor Do Nascimento
On 10/7/24 10:52, Richard Biener wrote: On Wed, Oct 2, 2024 at 6:26 PM Victor Do Nascimento wrote: Given the categorization of math built-in functions as `ECF_CONST', when if-converting their uses, their calls are not masked and are thus called with an all-true predicate. This, howeve

Re: [PATCH] middle-end: reorder masking priority of math functions

2024-10-04 Thread Victor Do Nascimento
On 10/4/24 09:32, Tamar Christina wrote: Hi Victor, -Original Message- From: Victor Do Nascimento Sent: Wednesday, October 2, 2024 5:26 PM To: gcc-patches@gcc.gnu.org Cc: Tamar Christina ; richard.guent...@gmail.com; Victor Do Nascimento Subject: [PATCH] middle-end: reorder masking

[PATCH] middle-end: reorder masking priority of math functions

2024-10-02 Thread Victor Do Nascimento
Given the categorization of math built-in functions as `ECF_CONST', when if-converting their uses, their calls are not masked and are thus called with an all-true predicate. This, however, is not appropriate where built-ins have library equivalents, wherein they may exhibit highly architecture-spe

Re: [PATCH] middle-end: Fix ifcvt predicate generation for masked function calls

2024-10-02 Thread Victor Do Nascimento
On 10/1/24 13:10, Richard Biener wrote: On Mon, Sep 30, 2024 at 8:40 PM Tamar Christina wrote: Hi Victor, Thanks! This looks good to me with one minor comment: -Original Message- From: Victor Do Nascimento Sent: Monday, September 30, 2024 2:34 PM To: gcc-patches@gcc.gnu.org Cc

[PATCH] middle-end: Fix ifcvt predicate generation for masked function calls

2024-09-30 Thread Victor Do Nascimento
Up until now, due to a latent bug in the code for the ifcvt pass, irrespective of the branch taken in a conditional statement, the original condition for the if statement was used in masking the function call. Thus, for code such as: if (a[i] > limit) b[i] = fixed_const; else b[i] = f

[PING][PATCH V4 10/10] autovectorizer: Test autovectorization of different dot-prod modes.

2024-09-26 Thread Victor Do Nascimento
Hello, Gentle reminder for this simple renaming update in response to the feedback from the last iteration. 🙂 Thanks, Victor On 9/5/24 12:05, Victor Do Nascimento wrote: Changes from previous revision: Rename new `check_effective_target' and tests to make their intent clearer.

[PING][PATCH V4 04/10] arm: Fix arm backend-use of (u|s|us)dot_prod patterns

2024-09-26 Thread Victor Do Nascimento
Hello, Gentle reminder for this patch 🙂 Thanks, Victor On 9/5/24 11:59, Victor Do Nascimento wrote: Changes from previous revision: As was done for the equivalent aarch64 patch, we rework this patch to do away with mission creep, keeping changes as simple as possible. We thus remove the

[PATCH V4 10/10] autovectorizer: Test autovectorization of different dot-prod modes.

2024-09-05 Thread Victor Do Nascimento
Changes from previous revision: Rename new `check_effective_target' and tests to make their intent clearer. * lib/target-supports.exp: For new `check_effective_target', s/vect_dotprod_twoway/vect_dotprod_hisi/. * One test is renamed to `vect-dotprod-conv-optab.c' to emphasize aim of c

[PATCH V4 04/10] arm: Fix arm backend-use of (u|s|us)dot_prod patterns

2024-09-05 Thread Victor Do Nascimento
Changes from previous revision: As was done for the equivalent aarch64 patch, we rework this patch to do away with mission creep, keeping changes as simple as possible. We thus remove the `gimple_fold_builtin' changes that would have replaced the dot-product builtin calls with DOT_PROD_EXPRs a

[PING] [PATCH V3 09/10] c6x: Adjust dot-product backend patterns

2024-08-28 Thread Victor Do Nascimento
Hello, Gentle reminder for this simple renaming patch :) Thanks, Victor On 8/15/24 09:44, Victor Do Nascimento wrote: Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names

[PING] [PATCH V3 07/10] mips: Adjust dot-product backend patterns

2024-08-28 Thread Victor Do Nascimento
Hello, Gentle reminder for this simple renaming patch :) Thanks, Victor On 8/15/24 09:44, Victor Do Nascimento wrote: Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names

[PING] [PATCH V3 06/10] arc: Adjust dot-product backend patterns

2024-08-28 Thread Victor Do Nascimento
Hello, Gentle reminder for this simple renaming patch :) Thanks, Victor On 8/15/24 09:44, Victor Do Nascimento wrote: Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names

Re: [PATCH V2 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns

2024-08-15 Thread Victor Do Nascimento
On 8/15/24 09:26, Richard Sandiford wrote: Victor Do Nascimento writes: Given recent changes to the dot_prod standard pattern name, this patch fixes the aarch64 back-end by implementing the following changes: 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files. 2. Rewrite

[PATCH V3 08/10] rs6000: Adjust altivec dot-product backend patterns

2024-08-15 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/rs6000/altivec.md (udot_prod): Renamed to... (udot_prodv4si): ...this. (sdot

[PATCH V3 06/10] arc: Adjust dot-product backend patterns

2024-08-15 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/arc/simdext.md (sdot_prodv2hi): Renamed to... (sdot_prodsiv2hi): ...this. (u

[PATCH V3 07/10] mips: Adjust dot-product backend patterns

2024-08-15 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/mips/loongson-mmi.md (sdot_prodv4hi): Renamed to... (sdot_prodv2siv4hi): ...this. --
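
The renames in these ChangeLog entries share one shape: the output (accumulator) mode is placed in the pattern name ahead of the input mode, as in the MIPS rename of `sdot_prodv4hi' to `sdot_prodv2siv4hi' above. A minimal sketch of that naming scheme follows. The helper `dot_prod_pattern_name' is hypothetical and is not a GCC function; only the mode suffixes are taken from the entries in this listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): build a conversion-style
   dot-product pattern name of the form
     <prefix>dot_prod<output mode><input mode>
   PREFIX is the signedness prefix ("s", "u" or "us").
   Returns the number of characters written, or -1 if BUF is too small.  */
static int
dot_prod_pattern_name (char *buf, size_t len, const char *prefix,
                       const char *out_mode, const char *in_mode)
{
  assert (buf != NULL && prefix != NULL);
  assert (out_mode != NULL && in_mode != NULL);

  int n = snprintf (buf, len, "%sdot_prod%s%s", prefix, out_mode, in_mode);
  return (n < 0 || (size_t) n >= len) ? -1 : n;
}
```

Called with prefix "s", output mode "v2si" and input mode "v4hi", the helper produces the new MIPS name; a scalar output mode such as "si" with input mode "v2hi" gives the `sdot_prodsiv2hi' form used in the arc and c6x entries.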

[PATCH V3 09/10] c6x: Adjust dot-product backend patterns

2024-08-15 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/c6x/c6x.md (sdot_prodv2hi): Renamed to... (sdot_prodsiv2hi): ...this. --- gcc/confi

[PATCH V3 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns

2024-08-15 Thread Victor Do Nascimento
Given recent changes to the dot_prod standard pattern name, this patch fixes the aarch64 back-end by implementing the following changes: 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files. 2. Rewrite initialization and function expansion mechanism for simd builtins. 3. Fix all direct ca

[PATCH V3 02/10] autovectorizer: Add basic support for convert optabs

2024-08-15 Thread Victor Do Nascimento
Given the shift from modeling dot products as direct optabs to treating them as conversion optabs, we make necessary changes to the autovectorizer code to ensure that given the relevant tree code, together with the input and output data modes, we can retrieve the relevant optab and subsequently the

[PATCH V3 10/10] autovectorizer: Test autovectorization of different dot-prod modes.

2024-08-15 Thread Victor Do Nascimento
From: Victor Do Nascimento Given the novel treatment of the dot product optab as a conversion, we are now able to target different relationships between output modes and input modes. This is made clearer by way of example. Previously, on AArch64, the following loop was vectorizable: uint32_t

[PATCH V3 04/10] arm: Fix arm backend-use of (u|s|us)dot_prod patterns

2024-08-15 Thread Victor Do Nascimento
gcc/ChangeLog: * config/arm/arm-builtins.cc (enum arm_builtins): Add new ARM_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI, UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI. (arm_init_dotprod_builtins): New. (arm_init_builtins): Add call to `arm_init_dotprod_builtins

[PATCH V3 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-08-15 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/i386/mmx.md (usdot_prodv8qi): Renamed to... (usdot_prodv2siv8qi): ...this. (

[PATCH V3 01/10] optabs: Make all `*dot_prod_optab's modeled as conversions

2024-08-15 Thread Victor Do Nascimento
Given the specification in the GCC internals manual defines the {u|s}dot_prod standard name as taking "two signed elements of the same mode, adding them to a third operand of wider mode", there is currently ambiguity in the relationship between the mode of the first two arguments and that of the th

[PATCH V3 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions

2024-08-15 Thread Victor Do Nascimento
nd armhf. I'd appreciate help running relevant tests on the remaining architectures, i.e. arc, mips, altivec and c6x to ensure I've not inadvertently broken anything for those back-ends. Victor Do Nascimento (10): optabs: Make all `*dot_prod_optab's modeled as conversions autov

Re: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs

2024-08-14 Thread Victor Do Nascimento
On 8/14/24 13:24, Tamar Christina wrote: It seems to me that this should take a code_helper, create the vector modes and call directly_supported_p, or am I missing something? Ok. Having done some digging around in the git history, I see that `vect_supportable_direct_optab_p', upon which I ba

Re: [PATCH V2 02/10] autovectorizer: Add basic support for convert optabs

2024-08-14 Thread Victor Do Nascimento
On 8/14/24 13:24, Tamar Christina wrote: Hi Victor, -Original Message- From: Victor Do Nascimento Sent: Tuesday, August 13, 2024 1:42 PM To: gcc-patches@gcc.gnu.org Cc: Tamar Christina ; claz...@gmail.com; hongtao@intel.com; s...@gcc.gnu.org; bernds_...@t-online.de; al

[PATCH V2 10/10] autovectorizer: Test autovectorization of different dot-prod modes.

2024-08-13 Thread Victor Do Nascimento
From: Victor Do Nascimento Given the novel treatment of the dot product optab as a conversion, we are now able to target different relationships between output modes and input modes. This is made clearer by way of example. Previously, on AArch64, the following loop was vectorizable: uint32_t

[PATCH V2 07/10] mips: Adjust dot-product backend patterns

2024-08-13 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/mips/loongson-mmi.md (sdot_prodv4hi): Renamed to... (sdot_prodv2siv4hi): ...this. --

[PATCH V2 08/10] rs6000: Adjust altivec dot-product backend patterns

2024-08-13 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/rs6000/altivec.md (udot_prod): Renamed to... (udot_prodv4si): ...this. (sdot

[PATCH V2 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-08-13 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/i386/mmx.md (usdot_prodv8qi): Renamed to... (usdot_prodv2siv8qi): ...this. (

[PATCH V2 00/10] optabs: Make all `*dot_prod_optab's modeled as conversions

2024-08-13 Thread Victor Do Nascimento
the same input mode but resulting in a different output mode. Regression-tested on x86_64, aarch64 and armhf. I'd appreciate help running relevant tests on the remaining architectures, i.e. arc, mips, altivec and c6x to ensure I've not inadvertently broken anything for those back-e

[PATCH V2 04/10] arm: Fix arm backend-use of (u|s|us)dot_prod patterns

2024-08-13 Thread Victor Do Nascimento
gcc/ChangeLog: * config/arm/arm-builtins.cc (enum arm_builtins): Add new ARM_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI, UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI. (arm_init_dotprod_builtins): New. (arm_init_builtins): Add call to `arm_init_dotprod_builtins

[PATCH V2 06/10] arc: Adjust dot-product backend patterns

2024-08-13 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/arc/simdext.md (sdot_prodv2hi): Renamed to... (sdot_prodsiv2hi): ...this. (u

[PATCH V2 09/10] c6x: Adjust dot-product backend patterns

2024-08-13 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/c6x/c6x.md (sdot_prodv2hi): Renamed to... (sdot_prodsiv2hi): ...this. --- gcc/confi

[PATCH V2 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns

2024-08-13 Thread Victor Do Nascimento
Given recent changes to the dot_prod standard pattern name, this patch fixes the aarch64 back-end by implementing the following changes: 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files. 2. Rewrite initialization and function expansion mechanism for simd builtins. 3. Fix all direct ca

[PATCH V2 01/10] optabs: Make all `*dot_prod_optab's modeled as conversions

2024-08-13 Thread Victor Do Nascimento
Given the specification in the GCC internals manual defines the {u|s}dot_prod standard name as taking "two signed elements of the same mode, adding them to a third operand of wider mode", there is currently ambiguity in the relationship between the mode of the first two arguments and that of the th

[PATCH V2 02/10] autovectorizer: Add basic support for convert optabs

2024-08-13 Thread Victor Do Nascimento
Given the shift from modeling dot products as direct optabs to treating them as conversion optabs, we make necessary changes to the autovectorizer code to ensure that given the relevant tree code, together with the input and output data modes, we can retrieve the relevant optab and subsequently the

Re: [PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-07-12 Thread Victor Do Nascimento
On 7/12/24 03:23, Jiang, Haochen wrote: -Original Message- From: Hongtao Liu Sent: Thursday, July 11, 2024 9:45 AM To: Victor Do Nascimento Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com; richard.earns...@arm.com Subject: Re: [PATCH 05/10] i386: Fix dot_prod backend patterns

[PATCH 03/10] aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns.

2024-07-10 Thread Victor Do Nascimento
Given recent changes to the dot_prod standard pattern name, this patch fixes the aarch64 back-end by implementing the following changes: 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files. 2. Rewrite initialization and function expansion mechanism for simd builtins. 3. Fix all direct ca

[PATCH 00/10] Make `dot_prod' a convert-type optab

2024-07-10 Thread Victor Do Nascimento
x to ensure I've not inadvertently broken anything for those backends. Victor Do Nascimento (10): optabs: Make all `*dot_prod_optab's modeled as conversions autovectorizer: Add basic support for convert optabs aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns. arm: Fix arm b

[PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/i386/mmx.md (usdot_prodv8qi): Deleted. (usdot_prodv2siv8qi): New. (sdot_prod

[PATCH 06/10] arc: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/arc/simdext.md (sdot_prodv2hi): Deleted. (sdot_prodsiv2hi): New. (udot_prodv

[PATCH 02/10] autovectorizer: Add basic support for convert optabs

2024-07-10 Thread Victor Do Nascimento
Given the shift from modeling dot products as direct optabs to treating them as conversion optabs, we make necessary changes to the autovectorizer code to ensure that given the relevant tree code, together with the input and output data modes, we can retrieve the relevant optab and subsequently the

[PATCH 08/10] altivec: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/rs6000/altivec.md (udot_prod): Deleted. (udot_prodv4si): New. (sdot_prodv8hi

[PATCH 09/10] c6x: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/c6x/c6x.md (sdot_prodv2hi): Deleted. (sdot_prodsiv2hi): New. --- gcc/config/c6x/c6x

[PATCH 07/10] mips: Adjust dot-product backend patterns

2024-07-10 Thread Victor Do Nascimento
Following the migration of the dot_prod optab from a direct to a conversion-type optab, ensure all back-end patterns incorporate the second machine mode into pattern names. gcc/ChangeLog: * config/mips/loongson-mmi.md (sdot_prodv4hi): Deleted. (sdot_prodv2siv4hi): New. --- gcc/co

[PATCH 10/10] autovectorizer: Test autovectorization of different dot-prod modes.

2024-07-10 Thread Victor Do Nascimento
From: Victor Do Nascimento Given the novel treatment of the dot product optab as a conversion we are now able to target, for a given architecture, different relationships between output modes and input modes. This is made clearer by way of example. Previously, on AArch64, the following loop was

[PATCH 01/10] optabs: Make all `*dot_prod_optab's modeled as conversions

2024-07-10 Thread Victor Do Nascimento
Given the specification in the GCC internals manual defines the {u|s}dot_prod standard name as taking "two signed elements of the same mode, adding them to a third operand of wider mode", there is currently ambiguity in the relationship between the mode of the first two arguments and that of the th

[PATCH 04/10] arm: Fix arm backend-use of (u|s|us)dot_prod patterns.

2024-07-10 Thread Victor Do Nascimento
gcc/ChangeLog: * config/arm/arm-builtins.cc (enum arm_builtins): Add new ARM_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI, UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI. (arm_init_dotprod_builtins): New. (arm_init_builtins): Add call to `arm_init_dotprod_builtins

[PATCH v2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-06-12 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for Armv8.2-A upwards provides additional support for the release consistency model, introducing the Load-Acquire RCpc Pair Ordered, and Store-Release Pair Ordered operations in the form of LDIAPP and STILP. These operations are single

[PATCH v2 2/4] Libatomic: Define per-file identifier macros

2024-06-11 Thread Victor Do Nascimento
In order to facilitate the fine-tuning of how `libatomic_i.h' and `host-config.h' headers are used by different atomic functions, we define distinct identifier macros for each file which, in implementing atomic operations, imports these headers. The idea is that different parts of these headers co

[PATCH v2 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file

2024-06-11 Thread Victor Do Nascimento
At present, `atomic_16.S' groups different implementations of the same functions together in the file. Therefore, as an example, the LSE2 implementation of `load_16' follows on immediately from its core implementation, as does the `store_16' LSE2 implementation. Such architectural extension-depen

[PATCH v2 3/4] Libatomic: Make ifunc selector behavior contingent on importing file

2024-06-11 Thread Victor Do Nascimento
By querying previously-defined file-identifier macros, `host-config.h' is able to get information about its environment and, based on this information, select more appropriate function-specific ifunc selectors. This reduces the number of unnecessary feature tests that need to be carried out in ord

[PATCH v2 1/4] Libatomic: AArch64: Convert all lse128 assembly to .insn directives

2024-06-11 Thread Victor Do Nascimento
Given the lack of support for the LSE128 instructions in all but the most up-to-date version of Binutils (2.42), having the build-time test for assembler support for these instructions often leads to the building of Libatomic without support for LSE128-dependent atomic function implementations.

[PATCH v2 0/4] Libatomic: Cleanup ifunc selector and aliasing

2024-06-11 Thread Victor Do Nascimento
and `--disable-gnu-indirect-function' configurations on armv9.4-a target with LRCPC3 and LSE128 support and without. Victor Do Nascimento (4): Libatomic: AArch64: Convert all lse128 assembly to .insn directives Libatomic: Define per-file identifier macros Libatomic: Make ifunc selector

[PATCH v2] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-06-11 Thread Victor Do Nascimento
At present the autovectorizer fails to vectorize simple loops involving calls to `__builtin_prefetch'. A simple example of such a loop is given below: void foo(double * restrict a, double * restrict b, int n){ int i; for(i=0; i *references) clobbers_memory = true; break;

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-17 Thread Victor Do Nascimento
6 AM Tamar Christina wrote: -Original Message- From: Richard Biener Sent: Friday, May 17, 2024 10:46 AM To: Tamar Christina Cc: Victor Do Nascimento ; gcc- patc...@gcc.gnu.org; Richard Sandiford ; Richard Earnshaw ; Victor Do Nascimento Subject: Re: [PATCH] middle-end: Expand {u|s}do

[PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Victor Do Nascimento
From: Victor Do Nascimento At present, the compiler offers the `{u|s|us}dot_prod_optab' direct optabs for dealing with vectorizable dot product code sequences. The consequence of using a direct optab for this is that backend-pattern selection is only ever able to match against one dat

Re: [PATCH] middle-end: Drop __builtin_pretech calls in autovectorization [PR114061]'

2024-05-16 Thread Victor Do Nascimento
On 5/16/24 15:16, Andrew Pinski wrote: On Thu, May 16, 2024, 3:58 PM Victor Do Nascimento wrote: At present the autovectorizer fails to vectorize simple loops involving calls to `__builtin_prefetch'. A simple example of such lo

[PATCH] middle-end: Drop __builtin_pretech calls in autovectorization [PR114061]'

2024-05-16 Thread Victor Do Nascimento
At present the autovectorizer fails to vectorize simple loops involving calls to `__builtin_prefetch'. A simple example of such a loop is given below: void foo(double * restrict a, double * restrict b, int n){ int i; for(i=0; i *references) clobbers_memory = true; break;

[PATCH] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-05-16 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for Armv8.2-A upwards provides additional support for the release consistency model, introducing the Load-Acquire RCpc Pair Ordered, and Store-Release Pair Ordered operations in the form of LDIAPP and STILP. These operations are single

[PATCH 1/4] Libatomic: Define per-file identifier macros

2024-05-16 Thread Victor Do Nascimento
In order to facilitate the fine-tuning of how `libatomic_i.h' and `host-config.h' headers are used by different atomic functions, we define distinct identifier macros for each file which, in implementing atomic operations, imports these headers. The idea is that different parts of these headers co

[PATCH 4/4] Libatomic: Clean up AArch64 `atomic_16.S' implementation file

2024-05-16 Thread Victor Do Nascimento
At present, `atomic_16.S' groups different implementations of the same functions together in the file. Therefore, as an example, the LSE128 implementation of `exchange_16' follows on immediately from its core implementation, as does the `fetch_or_16' LSE128 implementation. Such architectural exte

[PATCH 2/4] Libatomic: Make ifunc selector behavior contingent on importing file

2024-05-16 Thread Victor Do Nascimento
By querying previously-defined file-identifier macros, `host-config.h' is able to get information about its environment and, based on this information, select more appropriate function-specific ifunc selectors. This reduces the number of unnecessary feature tests that need to be carried out in ord

[PATCH 3/4] Libatomic: Clean up AArch64 ifunc aliasing

2024-05-16 Thread Victor Do Nascimento
Following improvements to the way ifuncs are selected based on detected architectural features, we are able to do away with many of the aliases that were previously needed for subsets of atomic functions that were not implemented in a given extension. This may be clarified by virtue of an example.

[PATCH 0/4] Libatomic: Cleanup ifunc selector and aliasing

2024-05-16 Thread Victor Do Nascimento
able-gnu-indirect-function' configurations on armv9.4-a target with LRCPC3 and LSE128 support and without. Victor Do Nascimento (4): Libatomic: Define per-file identifier macros Libatomic: Make ifunc selector behavior contingent on importing file Libatomic: Clean up AArch64 ifunc a

Re: [PATCH] aarch64: Add +lse128 architectural extension command-line flag

2024-03-27 Thread Victor Do Nascimento
On 3/26/24 12:26, Richard Sandiford wrote: Victor Do Nascimento writes: Given how, at present, the choice of using LSE128 atomic instructions by the toolchain is delegated to run-time selection in the form of Libatomic ifuncs, responsible for querying target support, the `+lse128' t

[PATCH] aarch64: Align lrcpc3 FEAT_STRING with /proc/cpuinfo 'Features' entry

2024-03-25 Thread Victor Do Nascimento
Due to the Linux kernel exposing the lrcpc3 architectural feature as "lrcpc3", this patch corrects the relevant FEATURE_STRING entry in the "rcpc3" AARCH64_OPT_FMV_EXTENSION macro, such that the feature can be correctly detected when doing native compilation on rcpc3-enabled targets. Regtested on

[PATCH] aarch64: Add +lse128 architectural extension command-line flag

2024-03-15 Thread Victor Do Nascimento
Given how, at present, the choice of using LSE128 atomic instructions by the toolchain is delegated to run-time selection in the form of Libatomic ifuncs, responsible for querying target support, the `+lse128' target architecture compile-time flag is absent from GCC. This, however, contrasts with

Re: [libatomic PATCH] PR other/113336: Fix libatomic testsuite regressions on ARM.

2024-02-14 Thread Victor Do Nascimento
arm-linux-gnueabihf with --with-arch=armv6 with make bootstrap and make -k check where it fixes all of the FAILs in libatomic. Ok for mainline? 2024-01-28 Roger Sayle Victor Do Nascimento libatomic/ChangeLog PR other/113336 * Makefile.am: Build tas

[PATCH] AArch64: Update system register database.

2024-02-06 Thread Victor Do Nascimento
With the release of Binutils 2.42, this brings the level of system-register support in GCC in line with the current state-of-the-art in Binutils, ensuring everything available in Binutils is plainly accessible from GCC. Where Binutils uses a more detailed description of which features are responsi

Re: [PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-26 Thread Victor Do Nascimento
On 1/26/24 10:53, Richard Sandiford wrote: > Victor Do Nascimento writes: >> @@ -712,6 +760,27 @@ ENTRY (libat_test_and_set_16) >> END (libat_test_and_set_16) >> >> >> +/* Alias all LSE128_LRCPC3 ifuncs to their specific implementations, >> + that

Re: [libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].

2024-01-25 Thread Victor Do Nascimento
On 1/11/24 15:55, Roger Sayle wrote: Hi Richard, As you've recommended, this issue has now been filed in bugzilla as PR other/113336. As explained in the new PR, libatomic's testsuite used to pass on armv6 (raspberry pi) in previous GCC releases, but the code was incorrect/non-synchronous; t

[PATCH v2 2/2] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2024-01-24 Thread Victor Do Nascimento
The introduction of the optional RCPC3 architectural extension for Armv8.2-A upwards provides additional support for the release consistency model, introducing the Load-Acquire RCpc Pair Ordered, and Store-Release Pair Ordered operations in the form of LDIAPP and STILP. These operations are single

[PATCH v2 1/2] libatomic: Increase max IFUNC_NCOND(N) from 3 to 4.

2024-01-24 Thread Victor Do Nascimento
libatomic/ChangeLog: * libatomic_i.h: Add GEN_SELECTOR implementation for IFUNC_NCOND(N) == 4. --- libatomic/libatomic_i.h | 18 ++ 1 file changed, 18 insertions(+) diff --git a/libatomic/libatomic_i.h b/libatomic/libatomic_i.h index 861a22da152..0a854fd908c 100644
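
As a rough illustration of what a selector with four conditions does, the sketch below picks the first variant whose condition holds and otherwise falls back to the generic implementation. This is a simplified model under stated assumptions, not the `GEN_SELECTOR' macro itself: the function name `select_variant', the in-order evaluation and the numbering of variants are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an ifunc selector with NCOND conditions.
   CONDS[i] says whether the (i + 1)-th feature-specific variant may
   be used.  Conditions are assumed to be tried in order.
   Returns 1..NCOND for the first usable variant, or 0 to mean
   "use the generic implementation".  */
static size_t
select_variant (const bool *conds, size_t ncond)
{
  assert (conds != NULL || ncond == 0);

  for (size_t i = 0; i < ncond; i++)
    if (conds[i])
      return i + 1;

  return 0;
}
```

With four conditions of which only the third and fourth hold, this model selects variant 3; with none holding, it selects the generic fallback.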

[PATCH v2 0/2] libatomic: AArch64 rcpc3 128-bit atomic operation enablement

2024-01-24 Thread Victor Do Nascimento
/gcc-patches/2024-January/643841.html Victor Do Nascimento (2): libatomic: Increase max IFUNC_NCOND(N) from 3 to 4. libatomic: Add rcpc3 128-bit atomic operations for AArch64 libatomic/Makefile.am| 6 +- libatomic/Makefile.in| 22

[PATCH v4 1/4] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-24 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their su

[PATCH v4 4/4] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

2024-01-24 Thread Victor Do Nascimento
At present, evaluation of both `has_lse2(hwcap)' and `has_lse128(hwcap)' may require issuing an `mrs' instruction to query a system register. This instruction, when issued from user-space, results in a trap by the kernel, which then returns the value read from the system register. Given the undesi

[PATCH v4 3/4] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-24 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset) of
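
The data flow of the two operations described above can be modeled in plain C. The sketch below is a non-atomic reference model only: it shows which bits end up in memory and in the register pair, and it does not provide the atomicity that the real instructions do. The type `pair128' and the `model_*' function names are hypothetical and do not come from libatomic.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value held as a pair of 64-bit registers (hypothetical
   type for this example).  */
typedef struct
{
  uint64_t lo;
  uint64_t hi;
} pair128;

/* Non-atomic model of LDCLRP: clear in *MEM every bit that is set in
   *REGS (AND NOT), and return the original memory contents in the
   same register pair.  */
static void
model_ldclrp (pair128 *mem, pair128 *regs)
{
  pair128 old = *mem;
  mem->lo = old.lo & ~regs->lo;
  mem->hi = old.hi & ~regs->hi;
  *regs = old;
}

/* Non-atomic model of LDSETP: set in *MEM every bit that is set in
   *REGS (OR), and return the original memory contents in the same
   register pair.  */
static void
model_ldsetp (pair128 *mem, pair128 *regs)
{
  pair128 old = *mem;
  mem->lo = old.lo | regs->lo;
  mem->hi = old.hi | regs->hi;
  *regs = old;
}
```

In both cases the register pair is overwritten with the value that was in memory before the update, which is what lets a single instruction implement a 128-bit fetch-and-modify.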

[PATCH v4 2/4] libatomic: Add support for __ifunc_arg_t arg in ifunc resolver

2024-01-24 Thread Victor Do Nascimento
With support for new atomic features in Armv9.4-a being indicated by HWCAP2 bits, Libatomic's ifunc resolver must now query its second argument, of type __ifunc_arg_t*. We therefore make this argument known to libatomic, allowing us to query hwcap2 bits in the following manner: bool resolver

[PATCH v4 0/4] Libatomic: Add LSE128 atomics support for AArch64

2024-01-24 Thread Victor Do Nascimento
upport is present. Regression tested on aarch64-linux-gnu target with LSE128-support. [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html Victor Do Nascimento (4): libatomic: atomic_16.S: Improve ENTRY, END an

Re: [PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-08 Thread Victor Do Nascimento
On 1/5/24 11:10, Richard Sandiford wrote: Victor Do Nascimento writes: The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which

Re: [PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-08 Thread Victor Do Nascimento
On 1/5/24 11:47, Richard Sandiford wrote: Victor Do Nascimento writes: The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with

[PATCH v3 3/3] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

2024-01-02 Thread Victor Do Nascimento
At present, evaluation of both `has_lse2(hwcap)' and `has_lse128(hwcap)' may require issuing an `mrs' instruction to query a system register. This instruction, when issued from user-space, results in a trap by the kernel, which then returns the value read from the system register. Given the undesi

[PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-02 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their su

[PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-02 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset) of

[PATCH v3 0/3] Libatomic: Add LSE128 atomics support for AArch64

2024-01-02 Thread Victor Do Nascimento
org/pipermail/gcc-patches/2023-June/620529.html [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html Victor Do Nascimento (3): libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface libatomic: Enable LSE128 128-bit atomics for armv9.4-a aarch64: Add ex

[PATCH] aarch64: arm_neon.h - Fix -Wincompatible-pointer-types errors

2023-12-09 Thread Victor Do Nascimento
In the Linux kernel, u64/s64 are [un]signed long long, not [un]signed long. This means that when the `arm_neon.h' header is used by the kernel, any use of the `uint64_t' / `int64_t' types needs to be cast to the correct `__builtin_aarch64_simd_di' / `__builtin_aarch64_simd_df' types when
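
The underlying language point, that `unsigned long' and `unsigned long long' are distinct types even where both are 64 bits wide, so pointers to them are not interchangeable, can be illustrated with a small C11 sketch. The `SKETCH_PTR_KIND' macro is hypothetical and has nothing to do with `arm_neon.h' itself.

```c
/* Classify a pointer by its pointee type using C11 _Generic.
   1 = pointer to unsigned long, 2 = pointer to unsigned long long,
   0 = anything else.  Hypothetical helper for illustration.  */
#define SKETCH_PTR_KIND(p)            \
  _Generic ((p),                      \
            unsigned long *: 1,       \
            unsigned long long *: 2,  \
            default: 0)
```

Because the two pointer types select different branches, passing one where the other is expected needs an explicit cast to avoid a -Wincompatible-pointer-types diagnostic.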

[PATCH v3] aarch64: Implement the ACLE instruction/data prefetch functions.

2023-12-05 Thread Victor Do Nascimento
Key changes in v3: * Implement the `require_const_argument' function to ensure the nth argument in EXP represents a const-type argument in the valid range given by [minval, maxval), forgoing expansion altogether when an invalid argument is detected early on. * Whereas in the previous iter
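
The half-open range [minval, maxval) mentioned above includes the lower bound and excludes the upper one. A minimal sketch of that test, using a hypothetical helper rather than GCC's `require_const_argument' (which operates on the compiler's own expression representation):

```c
#include <stdbool.h>

/* True when VALUE lies in the half-open range [MINVAL, MAXVAL).
   Hypothetical helper for illustration only.  */
static bool
sketch_in_half_open_range (long long value, long long minval,
                           long long maxval)
{
  return value >= minval && value < maxval;
}
```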

[PATCH v2 3/5] aarch64: Sync `aarch64-sys-regs.def' with Binutils.

2023-11-28 Thread Victor Do Nascimento
This patch updates `aarch64-sys-regs.def', bringing it into sync with the Binutils source. gcc/ChangeLog: * config/aarch64/aarch64-sys-regs.def (par_el1): New. (rcwmask_el1): Likewise. (rcwsmask_el1): Likewise. (ttbr0_el1): Likewise. (ttbr0_el12): Likewise.

[PATCH v2 1/5] aarch64: Add march flags for +the and +d128 arch extensions

2023-11-28 Thread Victor Do Nascimento
Given the introduction of optional 128-bit page table descriptor and translation hardening extension support with the Armv9.4-a architecture, this introduces the relevant flags to enable the reading and writing of 128-bit system registers. The `+d128' -march modifier enables the use of the followin

[PATCH v2 4/5] aarch64: Implement 128-bit extension to ACLE sysreg r/w builtins

2023-11-28 Thread Victor Do Nascimento
Implement the ACLE builtins for 128-bit system register manipulation: * __uint128_t __arm_rsr128(const char *special_register); * void __arm_wsr128(const char *special_register, __uint128_t value); gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (AARCH64_RSR128): New `enu

[PATCH v2 5/5] aarch64: Add rsr128 and wsr128 ACLE tests

2023-11-28 Thread Victor Do Nascimento
Extend existing unit tests for the ACLE system register manipulation functions to include 128-bit tests. gcc/testsuite/ChangeLog: * gcc/testsuite/gcc.target/aarch64/acle/rwsr.c (get_rsr128): New. (set_wsr128): Likewise. --- gcc/testsuite/gcc.target/aarch64/acle/rwsr.c | 32 ++

[PATCH v2 0/5] aarch64: Add Armv9.4-a 128-bit system-register read/write support

2023-11-28 Thread Victor Do Nascimento
ces the Guarded Control Stack (GCS) `+gcs' architecture modifier flag, allowing the inclusion of the novel GCS system registers which are now supported and also present in the `aarch64-sys-regs.def' system register database. Victor Do Nascimento (5): aarch64: Add march flags for +the

[PATCH v2 2/5] aarch64: Add support for GCS system registers with the +gcs modifier

2023-11-28 Thread Victor Do Nascimento
Given the introduction of system registers associated with the Guarded Control Stack extension to Armv9.4-a in Binutils and their reliance on the `+gcs' modifier, we implement the necessary changes in GCC to allow for them to be recognized by the compiler. gcc/ChangeLog: * config/aarch64/

[PATCH] libatomic: Add rcpc3 128-bit atomic operations for AArch64

2023-11-13 Thread Victor Do Nascimento
Continuing on from previously-proposed Libatomic enablement work [1], the introduction of the optional RCPC3 architectural extension for Armv8.2-A upwards provides additional support for the release consistency model, introducing both the Load-Acquire RCpc Pair Ordered, and Store-Release Pair Order

[PATCH v2 2/2] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2023-11-13 Thread Victor Do Nascimento
The armv9.4-a architectural revision adds three new atomic operations associated with the LSE128 feature: * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit value held in a pair of registers, with original data loaded into the same 2 registers. * LDSETP - Atomic OR (bitset) of

[PATCH v2 1/2] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2023-11-13 Thread Victor Do Nascimento
The introduction of further architectural-feature dependent ifuncs for AArch64 makes hard-coding ifunc `_i' suffixes to functions cumbersome to work with. It is awkward to remember which ifunc maps onto which arch feature and makes the code harder to maintain when new ifuncs are added and their su

[PATCH v2 0/2] Libatomic: Add LSE128 atomics support for AArch64

2023-11-13 Thread Victor Do Nascimento
-patches/2023-August/626358.html Victor Do Nascimento (2): libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface libatomic: Enable LSE128 128-bit atomics for armv9.4-a libatomic/Makefile.am| 3 + libatomic/Makefile.in| 1 + li
