from:"Matt Arsenault via cfe-commits"

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits

@@ -3172,8 +3172,8 @@ def int_amdgcn_loop : Intrinsic<[llvm_i1_ty], [llvm_anyint_ty], [IntrWillReturn, IntrNoCallback, IntrNoFree] >; -def int_amdgcn_end_cf : Intrinsic<[], [llvm_anyint_ty], - [IntrWillReturn, IntrNoCallback, IntrNoFree]>; +def int_amdgcn_wave_reconverge :

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits

@@ -15740,6 +15740,32 @@ void SITargetLowering::finalizeLowering(MachineFunction &MF) const { } } + // ISel inserts copy to regs for the successor PHIs + // at the BB end. We need to move the SI_WAVE_RECONVERGE right before the + // branch. + for (auto &MBB : MF) {

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/92809 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm requested changes to this pull request. There are quite a few code quality regressions, and XFAILed tests. The description needs more elaboration on what the strategy is here https://github.com/llvm/llvm-project/pull/92809

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-17 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/89217

[clang] [clang][CodeGen] Add query for a target's flat address space (PR #95728)

2024-06-17 Thread Matt Arsenault via cfe-commits

@@ -1764,6 +1764,13 @@ class TargetInfo : public TransferrableTargetInfo, return 0; } + /// \returns Target specific flat ptr address space; a flat ptr is a ptr that + /// can be casted to / from all other target address spaces. If the target + /// exposes no such add

[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,86 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -O0 -triple amdg

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,84 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + // RUN

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,84 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + // RUN

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,9 @@ + arsenm wrote: Extra blank line https://github.com/llvm/llvm-project/pull/94830

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,17 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fsyntax-only -verify -std=gnu++11 -triple amdgcn -Wno-unused-value %s + arsenm wrote: We probably want another similar sema test for OpenCL/HIP/OpenMP https://github.com/llvm/llvm-pro

[clang] [clang][CodeGen] Add query for a target's flat address space (PR #95728)

2024-06-17 Thread Matt Arsenault via cfe-commits

@@ -1764,6 +1764,13 @@ class TargetInfo : public TransferrableTargetInfo, return 0; } + /// \returns Target specific flat ptr address space; a flat ptr is a ptr that + /// can be casted to / from all other target address spaces. If the target + /// exposes no such add

[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,86 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -O0 -triple amdg

[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,86 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -O0 -triple amdg

[clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-15 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/95395

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,21 @@ +//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apa

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,84 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + // RUN

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,21 @@ +//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apa

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits

@@ -6129,13 +6150,55 @@ static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N, if (ValSize % 32 != 0) return SDValue(); + auto unrollLaneOp = [&DAG, &SL](SDNode *N) -> SDValue { +EVT VT = N->getValueType(0); +unsigned NE = VT.getVectorNumElements();

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-13 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,69 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + // RUN: %clang_cc1 -triple amdgcn-unkn

[clang] [clang-tools-extra] [compiler-rt] [flang] [libc] [lld] [lldb] [llvm] [mlir] [openmp] [llvm-project] Fix typo "seperate" (PR #95373)

2024-06-13 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/95373

[clang] [llvm] [Offload] Introduce the concept of "default streams" (PR #95371)

2024-06-13 Thread Matt Arsenault via cfe-commits

@@ -1125,6 +1125,22 @@ void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA, CmdArgs.push_back("__clang_openmp_device_functions.h"); } + if (Args.hasArg(options::OPT_foffload_via_llvm)) { +// Add llvm_wrappers/* to our system include path. This

[clang] [llvm] [Offload] Introduce the concept of "default streams" (PR #95371)

2024-06-13 Thread Matt Arsenault via cfe-commits

@@ -1125,6 +1125,22 @@ void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA, CmdArgs.push_back("__clang_openmp_device_functions.h"); } + if (Args.hasArg(options::OPT_foffload_via_llvm)) { +// Add llvm_wrappers/* to our system include path. This

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-13 Thread Matt Arsenault via cfe-commits

arsenm wrote:
> Just a note - and maybe this was already discussed above - is there good
> reason not to explicitly make this type a 128-bit scalar? The LLVM data
> layout already does this
I thought this was the 160 bit version? Can we have an opaque-but-sized type? The concern is exposing

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-13 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,95 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu verde -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-13 Thread Matt Arsenault via cfe-commits

arsenm wrote:
> I understand the chance of conflict is low. It may be like the chance of
> hitting by a meteor. However, if we prefix with `__amdgcn_`, there is no such
> risk. And we have the benefit to clearly indicate it is a amdgcn
> target-specific type.
Should use amdgpu https://githu

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,95 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu verde -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,14 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple amdgcn-unknown

[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-12 Thread Matt Arsenault via cfe-commits

@@ -128,12 +128,13 @@ enum class CudaArch { GFX12_GENERIC, GFX1200, GFX1201, + AMDGCNSPIRV, Generic, // A processor model named 'generic' if the target backend defines a // public one. LAST, CudaDefault = CudaArch::SM_52, - HIPDefault = CudaArch::

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits

arsenm wrote:
> Or drop the new nodes altogether and legalize to intrinsics directly?
That's another option. The only real plus to the intermediate is it's slightly less annoying to write combines for. But there are limited combining opportunities for these https://github.com/llvm/llvm-p

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,46 @@ +# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s arsenm wrote: I'd still test all 3, but yes an IR test https://github.com/llvm/llvm-project/pull/89217

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,46 @@ +# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s arsenm wrote: You should not need to introduce any new machine verifier tests, they are not useful. The useful test would be the IR

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits

@@ -2201,6 +2207,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const { Align = 8; \ break; #include "clang/Basic/WebAssemblyReferenceTypes.def" +case BuiltinType::AMDGPUBufferRsrc: + W

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,11 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wno-unused-value %s + +void foo() { + int n = 100; + __buffer_rsrc_t v = 0; // expected-error {{cannot initialize a variable of type '__buffer_rsrc_t' with an rvalu

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,11 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wno-unused-value %s + +void foo() { + int n = 100; + __buffer_rsrc_t v = 0; // expected-error {{cannot initialize a variable of type '__buffer_rsrc_t' with an rvalu

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits

@@ -2200,6 +2206,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const { Align = 8; \ break; #include "clang/Basic/WebAssemblyReferenceTypes.def" +case BuiltinType::AMDGPUBufferRsrc: + W

[clang] [llvm] [Offload][CUDA] Add initial cuda_runtime.h overlay (PR #94821)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,30 @@ +// RUN: %clang++ -foffload-via-llvm --offload-arch=native %s -o %t +// RUN: %t | %fcheck-generic + +// UNSUPPORTED: aarch64-unknown-linux-gnu +// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// UNSUPPORTED: x86_64-pc-linux-gnu +// UNSUPPORTED: x86_64-pc-linux-gnu-

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,9 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fclang-abi-compat=latest -triple amdgcn %s -emit-llvm -o - | FileCheck %s arsenm wrote: Why do you need -fclang-abi-compat=latest https://github.com/llvm/llvm-project/pull/94830

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,21 @@ +//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apa

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/94830

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -2200,6 +2206,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const { Align = 8; \ break; #include "clang/Basic/WebAssemblyReferenceTypes.def" +case BuiltinType::AMDGPUBufferRsrc: + W

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented: Need stacked PR that adds the make_buffer_rsrc builtin that shows its use https://github.com/llvm/llvm-project/pull/94830

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -1091,6 +1091,9 @@ enum PredefinedTypeIDs { // \brief WebAssembly reference types with auto numeration #define WASM_TYPE(Name, Id, SingletonId) PREDEF_TYPE_##Id##_ID, #include "clang/Basic/WebAssemblyReferenceTypes.def" +// \breif AMDGPU types with auto numeration --

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -16055,6 +16145,90 @@ of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2019. +.. _i_minimumnum: + +'``llvm.minimumnum.*``' Intrinsic +^ + +

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -16055,6 +16145,90 @@ of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2019. +.. _i_minimumnum: + +'``llvm.minimumnum.*``' Intrinsic +^ + +

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -15874,6 +15874,96 @@ The returned value is completely identical to the input except for the sign bit; in particular, if the input is a NaN, then the quiet/signaling bit and payload are perfectly preserved. +.. _i_fminmax_family: + +'``llvm.min.*``' Intrinsics Comparation

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -15874,6 +15874,96 @@ The returned value is completely identical to the input except for the sign bit; in particular, if the input is a NaN, then the quiet/signaling bit and payload are perfectly preserved. +.. _i_fminmax_family: + +'``llvm.min.*``' Intrinsics Comparation

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -16055,6 +16145,90 @@ of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2019. +.. _i_minimumnum: + +'``llvm.minimumnum.*``' Intrinsic +^ + +

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits

arsenm wrote:
> "aggregates" here might even be unusual cases like `<4 x i8>`
Vectors aren't aggregates and are more reasonable https://github.com/llvm/llvm-project/pull/94576

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits

arsenm wrote:
> `voffset` and `soffset` are "offset that goes in VGPRs" and "offset that goes
> in SGPRs", with the latter having some different bounds-checking semantics on
> ... at least some of the gfx9's, IIRC.
>
Right, that's the problem. We need to know the parameters of the SRD in orde

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits

arsenm wrote: > 2. What I mean is that "types that work" isn't the right framing: any type > can be legalized to one or more types that work. That is, down in the isel > legalizer, if I call for, for example >```llvm >%0 = call {i64, i64, i8} @llvm.amdgcn.raw.buffer.ptr.load(ptr addrspa

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits

arsenm wrote:
> 1. For the swizzled case, that's `struct.ptr.buffer.*`, and yeah, those will
> always need builtins because LLVM can't deal in 2D addressing schemes
But the raw buffer intrinsics have both the soffset and voffset parameters though? Not just the struct https://github.com/llv

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-07 Thread Matt Arsenault via cfe-commits

arsenm wrote:
> Actually, even ignoring address space 7, it feels like these builtins if you
> could `raw.ptr.buffer.store` any type you liked, and then they could be
> type-varying in Clang?
We could either have a builtin for all the types that would work, or if we want to treat them more li

[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -68,6 +68,10 @@ enum class fltNonfiniteBehavior { // `fltNanEncoding` enum. We treat all NaNs as quiet, as the available // encodings do not distinguish between signalling and quiet NaN. NanOnly, + + // This behavior is present in Float6E3M2FN and Float6E2M3FN types.

[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -878,6 +896,10 @@ void IEEEFloat::copySignificand(const IEEEFloat &rhs) { for the significand. If double or longer, this is a signalling NaN, which may not be ideal. If float, this is QNaN(0). */ void IEEEFloat::makeNaN(bool SNaN, bool Negative, const APInt *fill) {

[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

2024-06-07 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/94751

[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -205,7 +205,7 @@ class ToolChain { /// Executes the given \p Executable and returns the stdout. llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>> - executeToolChainProgram(StringRef Executable) const; + executeToolChainProgram(StringRef Executable, unsigned Timeout = 0) const; arsenm

[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -1881,6 +1890,20 @@ TEST(APFloatTest, getSmallest) { EXPECT_TRUE(test.isFiniteNonZero()); EXPECT_TRUE(test.isDenormal()); EXPECT_TRUE(test.bitwiseIsEqual(expected)); + + test = APFloat::getSmallest(APFloat::Float6E3M2FN(), false); + expected = APFloat(APFloat::Float6

[clang] [llvm] [APFloat] Add APFloat support for FP6 data types (PR #94735)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -47,6 +47,10 @@ static std::string convertToString(double d, unsigned Prec, unsigned Pad, return std::string(Buffer.data(), Buffer.size()); } +static bool hasNanOrInf(APFloat::Semantics S) { + return (S != APFloat::S_Float6E3M2FN) && (S != APFloat::S_Float6E2M3FN); +} -

[clang] [llvm] [clang][CodeGen] `used` globals are fake (PR #93601)

2024-06-07 Thread Matt Arsenault via cfe-commits

@@ -8642,8 +8642,11 @@ The '``llvm.used``' Global Variable The ``@llvm.used`` global is an array which has :ref:`appending linkage <linkage_appending>`. This array contains a list of pointers to named global variables, functions and aliases which may optionally -have a pointer cast formed of bitc

[clang] [llvm] [clang][CodeGen] Global constructors/destructors are globals (PR #93914)

2024-06-06 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented: Is this redundant with #93601? https://github.com/llvm/llvm-project/pull/93914

[clang] [llvm] [clang][CodeGen] `used` globals && the payloads for global ctors & dtors are globals (PR #93601)

2024-06-06 Thread Matt Arsenault via cfe-commits

arsenm wrote: Commit message also needs to be updated https://github.com/llvm/llvm-project/pull/93601

[clang] [llvm] [clang][CodeGen] `used` globals && the payloads for global ctors & dtors are globals (PR #93601)

2024-06-06 Thread Matt Arsenault via cfe-commits

@@ -2922,18 +2922,19 @@ static void emitUsed(CodeGenModule &CGM, StringRef Name, if (List.empty()) return; + llvm::Type *UsedPtrTy = llvm::PointerType::getUnqual(CGM.getLLVMContext()); arsenm wrote: Best to just use get(Ctx, 0) https://github.com/llv

[clang] [llvm] [clang][CodeGen] `used` globals && the payloads for global ctors & dtors are globals (PR #93601)

2024-06-06 Thread Matt Arsenault via cfe-commits

@@ -2922,18 +2922,19 @@ static void emitUsed(CodeGenModule &CGM, StringRef Name, if (List.empty()) return; + llvm::Type *UsedPtrTy = llvm::PointerType::getUnqual(CGM.getLLVMContext()); + // Convert List to what ConstantArray needs. SmallVector UsedArray; UsedAr

[clang] [llvm] [clang][CodeGen] `used` globals && the payloads for global ctors & dtors are globals (PR #93601)

2024-06-06 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request. lgtm with nit https://github.com/llvm/llvm-project/pull/93601

[clang] [llvm] [clang][CodeGen] `used` globals && the payloads for global ctors & dtors are globals (PR #93601)

2024-06-06 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/93601

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-06 Thread Matt Arsenault via cfe-commits

arsenm wrote:
> If we do want addrspace(7), we'll need to expose `make.buffer.rsrc` and give
> it a `p7` variant probably.
Yes. We probably should expose some kind of custom type instead of directly using a C address_space(7) attribute https://github.com/llvm/llvm-project/pull/94576

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,19 @@ +; RUN: not --crash llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx900 -verify-machineinstrs -o - %s 2>&1 | FileCheck %s arsenm wrote: This should also be repeated for all 3 intrinsics https://github.com/llvm/llvm-project/pull/89217

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm requested changes to this pull request. @jayfoad's testcase fails and the same test should be repeated for all 3 intrinsics https://github.com/llvm/llvm-project/pull/89217

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-06 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,19 @@ +; RUN: not --crash llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx900 -verify-machineinstrs -o - %s 2>&1 | FileCheck %s arsenm wrote: This is not an IR verifier test, it is a codegen test that fails the machine verifier. A machine veri

[clang] [Clang][AMDGPU] Use `I` to decorate imm argument for `__builtin_amdgcn_global_load_lds` (PR #94376)

2024-06-06 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/94376

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-06 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,264 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s --check-prefixes=VERDE +// RUN: %clang_cc

[clang] [amdgpu] Pass variadic arguments without splitting (PR #94083)

2024-06-06 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,293 @@ +// REQUIRES: amdgpu-registered-target +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature +// RUN: %clang_cc1 -cc1 -std=c23 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s -o - | FileCheck %s + +void sink_0

[clang] [amdgpu] Pass variadic arguments without splitting (PR #94083)

2024-06-06 Thread Matt Arsenault via cfe-commits

arsenm wrote: > @arsenm You're right about passing larger things indirectly. I'm intending to > land this as-is, with the types inlined, as that unblocks #93362. I'm nervous > that the extra pointer indirection will hit the same memory error that > tweaking codegen in that patch hits (it's a s

[clang] [libc] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-06-06 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,1037 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apach

[clang] [libc] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-06-06 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,1037 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apach

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-06 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,264 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s --check-prefixes=VERDE +// RUN: %clang_cc

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-06 Thread Matt Arsenault via cfe-commits

arsenm wrote: > Is there really a good use case for this? Can you use regular stores to > addrspace(7) instead? @krzysz00 I see these regularly used via inline asm in various ML code. We need to expose these in some way to stop people from doing that > > Also, do you really need a separate

[clang] [Clang][AMDGPU] Use `I` to decorate imm argument for `__builtin_amdgcn_global_load_lds` (PR #94376)

2024-06-04 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented: Missing non-constant tests for each parameter? https://github.com/llvm/llvm-project/pull/94376

[clang] [amdgpu] Pass variadic arguments without splitting (PR #94083)

2024-05-31 Thread Matt Arsenault via cfe-commits

@@ -197,12 +202,20 @@ ABIArgInfo AMDGPUABIInfo::classifyKernelArgumentType(QualType Ty) const { return ABIArgInfo::getDirect(LTy, 0, nullptr, false); } -ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType Ty, +ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType Ty,

[clang] [amdgpu] Pass variadic arguments without splitting (PR #94083)

2024-05-31 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,293 @@ +// REQUIRES: amdgpu-registered-target +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature +// RUN: %clang_cc1 -cc1 -std=c23 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s -o - | FileCheck %s + +void sink_0

[clang] [amdgpu] Pass variadic arguments without splitting (PR #94083)

2024-05-31 Thread Matt Arsenault via cfe-commits

@@ -197,12 +202,20 @@ ABIArgInfo AMDGPUABIInfo::classifyKernelArgumentType(QualType Ty) const { return ABIArgInfo::getDirect(LTy, 0, nullptr, false); } -ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType Ty, +ABIArgInfo AMDGPUABIInfo::classifyArgumentType(QualType Ty,

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-05-31 Thread Matt Arsenault via cfe-commits

@@ -32,27 +32,29 @@ class StoreInst; /// These are the kinds of recurrences that we support. enum class RecurKind { - None, ///< Not a recurrence. - Add, ///< Sum of integers. - Mul, ///< Product of integers. - Or, ///< Bitwise or logical OR of integers

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-05-31 Thread Matt Arsenault via cfe-commits

arsenm wrote: You should add the mentioned convergence-tokens.ll test function https://github.com/llvm/llvm-project/pull/89217

[clang] [llvm] [clang][CodeGen] Global constructors/destructors are globals (PR #93914)

2024-05-31 Thread Matt Arsenault via cfe-commits

arsenm wrote: > Perhaps an alternative is to tweak LangRef wording to say that that these are > always emitted as unqualified ptrs, and that their ephemeral nature implies > that their AS is meaningless? I think this is the correct way to handle it. Also we'll need a few stripPointerCasts add

[clang] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-05-31 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,1023 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apach

[clang] [llvm] [AMDGPU] Implement variadic functions by IR lowering (PR #93362)

2024-05-31 Thread Matt Arsenault via cfe-commits

@@ -0,0 +1,1023 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apach

[clang] [llvm] [clang][CodeGen] Global constructors/destructors are globals (PR #93914)

2024-05-31 Thread Matt Arsenault via cfe-commits

arsenm wrote: > The third argument here is like for llvm.used, it's a way to associate the > entry with a global or function. If the corresponding global or function is > omitted from the output then the entry will be removed. It isn't used for > anything at run time. So I think there should b

[clang] [clang][CodeGen] `used` globals && the payloads for global ctors & dtors are globals (PR #93601)

2024-05-31 Thread Matt Arsenault via cfe-commits

@@ -2928,12 +2928,13 @@ static void emitUsed(CodeGenModule &CGM, StringRef Name, for (unsigned i = 0, e = List.size(); i != e; ++i) { UsedArray[i] = llvm::ConstantExpr::getPointerBitCastOrAddrSpaceCast( -cast(&*List[i]), CGM.Int8PtrTy); ---

[clang] [llvm] [WIP] Expand variadic functions in IR (PR #89007)

2024-05-31 Thread Matt Arsenault via cfe-commits

arsenm wrote: > I think the comments here are fed into #93362 successfully, will go through > the list again to check. So #93362 is the replacement, and not the sequential next piece? Can we close this one then? https://github.com/llvm/llvm-project/pull/89007

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-05-31 Thread Matt Arsenault via cfe-commits

@@ -5005,8 +5007,11 @@ void computeKnownFPClass(const Value *V, const APInt &DemandedElts, // If either operand is not NaN, the result is not NaN. if (NeverNaN && (IID == Intrinsic::minnum || IID == Intrinsic::maxnum)) Known.knownNot(fcNan); + if (Neve

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-05-31 Thread Matt Arsenault via cfe-commits

@@ -16049,6 +16094,84 @@ of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2019. +.. _i_minimumnum: + +'``llvm.minimumnum.*``' Intrinsic +^ + +

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-05-31 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/93841

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-05-31 Thread Matt Arsenault via cfe-commits

@@ -3636,6 +3648,22 @@ def Fmin : FPMathTemplate, LibBuiltin<"math.h"> { let OnlyBuiltinPrefixedAliasIsConstexpr = 1; } +def FmaximumNum : FPMathTemplate, LibBuiltin<"math.h"> { arsenm wrote: I'd prefer to split the clang changes into a separate change ht

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-05-31 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm commented: > 3. PowerPC: has some interaction with the behavior of `minnum/maxnum`: need > define `fcanonicalize`. AMDGPU has the same handling. This is to break the signaling nan handling from IEEE to the broken old glibc libm behavior. If we fix the definition to ma
