[clang] clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins (PR #96738)

2024-06-27 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96738 >From 5f614809ac4ffa5e29a01c7e9410d91eadcbe6f2 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Tue, 11 Jun 2024 10:40:27 +0200 Subject: [PATCH 1/2] clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins --- c

[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-26 Thread Matt Arsenault via cfe-commits
@@ -178,6 +178,20 @@ bool AMDGPUAtomicOptimizerImpl::run(Function &F) { return Changed; } +static bool shouldOptimize(Type *Ty) { arsenm wrote: Better name that expresses why this type is handleable. Also in a follow up, really should cover the i16/half/b

[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-26 Thread Matt Arsenault via cfe-commits
@@ -178,6 +178,20 @@ bool AMDGPUAtomicOptimizerImpl::run(Function &F) { return Changed; } +static bool shouldOptimize(Type *Ty) { + switch (Ty->getTypeID()) { + case Type::FloatTyID: + case Type::DoubleTyID: +return true; + case Type::IntegerTyID: { +if (Ty->getI

[clang] clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins (PR #96738)

2024-06-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/96738 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins (PR #96738)

2024-06-26 Thread Matt Arsenault via cfe-commits
arsenm wrote: * **#96739** https://app.graphite.dev/github/pr/llvm/llvm-project/96739?utm_source=stack-comment-icon"; target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" width="10px" height="10px"/> * **#96738** https://app.graphite.dev/github/pr/llvm/llvm-proj

[clang] clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins (PR #96738)

2024-06-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/96738 None >From 0d9ab2bcbaa2b4b11832a8ac1848505cf73f4880 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Tue, 11 Jun 2024 10:40:27 +0200 Subject: [PATCH] clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins ---

[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-25 Thread Matt Arsenault via cfe-commits
arsenm wrote: > > > > Incrementing by align is just a bug, of course the size is the real > > > > value. Whether we want to continue wasting space is another > > > > not-correctness discussion > > > > > > > > > Struct padding is pretty universal, AMDGPU seems the odd one out here. I > > > wo

[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-25 Thread Matt Arsenault via cfe-commits
arsenm wrote: > > Incrementing by align is just a bug, of course the size is the real value. > > Whether we want to continue wasting space is another not-correctness > > discussion > > Struct padding is pretty universal, AMDGPU seems the odd one out here. I > wouldn't mind it so much if it di

[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-25 Thread Matt Arsenault via cfe-commits
@@ -228,10 +228,11 @@ void AMDGPUAtomicOptimizerImpl::visitAtomicRMWInst(AtomicRMWInst &I) { // If the value operand is divergent, each lane is contributing a different // value to the atomic calculation. We can only optimize divergent values if - // we have DPP availabl

[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-25 Thread Matt Arsenault via cfe-commits
@@ -311,10 +312,11 @@ void AMDGPUAtomicOptimizerImpl::visitIntrinsicInst(IntrinsicInst &I) { // If the value operand is divergent, each lane is contributing a different // value to the atomic calculation. We can only optimize divergent values if - // we have DPP availabl

[clang] [llvm] [AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (PR #92725)

2024-06-25 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/92725 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.ptr.buffer.store` (PR #94576)

2024-06-25 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/94576 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-24 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang] Improve diagnostics for constraints of inline asm (NFC) (PR #96363)

2024-06-24 Thread Matt Arsenault via cfe-commits
@@ -2626,14 +2629,20 @@ void CodeGenFunction::EmitAsmStmt(const AsmStmt &S) { SmallVector OutputConstraintInfos; SmallVector InputConstraintInfos; + const FunctionDecl *FD = dyn_cast_or_null(CurCodeDecl); arsenm wrote: I think we should just get rid of d

[clang] [clang] Improve diagnostics for constraints of inline asm (NFC) (PR #96363)

2024-06-24 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: It's really unfortunate to have to add all this asm handling to clang. Can't it rely on backend diagnostic remarks for this? https://github.com/llvm/llvm-project/pull/96363 ___ cfe-commits mailing list cfe-commits

[clang] [clang] Improve diagnostics for constraints of inline asm (NFC) (PR #96363)

2024-06-24 Thread Matt Arsenault via cfe-commits
@@ -2626,14 +2629,20 @@ void CodeGenFunction::EmitAsmStmt(const AsmStmt &S) { SmallVector OutputConstraintInfos; SmallVector InputConstraintInfos; + const FunctionDecl *FD = dyn_cast_or_null(CurCodeDecl); arsenm wrote: Where do you get dyn_cast_or_null i

[clang] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96473)

2024-06-24 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Kindly review only the top commit here If you're going to repost with a pre-commit, it would be better to have all the pieces squashed into one. Also you could look into using graphite or SPR for managing dependent pull requests https://github.com/llvm/llvm-project/pull/96473

[clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/95396 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] AMDGPU: Start selecting buffer fat pointer atomicrmw fmin/fmax (PR #95593)

2024-06-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/95593 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/95592 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] AMDGPU: Start selecting flat/global atomicrmw fmin/fmax. (PR #95592)

2024-06-23 Thread Matt Arsenault via cfe-commits
arsenm wrote: ### Merge activity * **Jun 23, 4:06 AM EDT**: @arsenm started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/95592). https://github.com/llvm/llvm-project/pull/95592 __

[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Matt Arsenault via cfe-commits
arsenm wrote: Incrementing by align is just a bug, of course the size is the real value. Whether we want to continue wasting space is another not-correctness discussion https://github.com/llvm/llvm-project/pull/96370 ___ cfe-commits mailing list cfe-

[clang] [llvm] [LLVM] Fix incorrect alignment on AMDGPU variadics (PR #96370)

2024-06-22 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Here, because the minimum alignment is 4, we will only increment the buffer by 4, It should be incrementing by the size? 4 byte aligned access of 8 byte type should work fine https://github.com/llvm/llvm-project/pull/96370 ___ cfe-co

[clang] [libc] [llvm] [libc] Implement (v|f)printf on the GPU (PR #96369)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -1671,6 +1671,7 @@ int main(int Argc, char **Argv) { NewArgv.push_back(Arg->getValue()); for (const opt::Arg *Arg : Args.filtered(OPT_offload_opt_eq_minus)) NewArgv.push_back(Args.MakeArgString(StringRef("-") + Arg->getValue())); + llvm::errs() << "asdfasdf\n"; --

[clang] [Clang] Replace `emitXXXBuiltin` with a unified interface (PR #96313)

2024-06-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/96313 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -149,6 +149,12 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", "nc") BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc") +BUILTIN(__builtin_amdgcn_raw_buffer_store_b8, "vcQbiiIi", "n") +BUILT

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -149,6 +149,12 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", "nc") BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc") +BUILTIN(__builtin_amdgcn_raw_buffer_store_b8, "vcQbiiIi", "n") +BUILT

[clang] [Clang] Replace `emitXXXBuiltin` with a unified interface (PR #96313)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -581,49 +581,19 @@ static Value *emitCallMaybeConstrainedFPBuiltin(CodeGenFunction &CGF, return CGF.Builder.CreateCall(F, Args); } -// Emit a simple mangled intrinsic that has 1 argument and a return type -// matching the argument type. -static Value *emitUnaryBuiltin(

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -149,6 +149,19 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", "nc") BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc") +BUILTIN(__builtin_amdgcn_raw_ptr_buffer_store_i8, "vcQbiiIi", "n") --

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -149,6 +149,19 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", "nc") BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc") +BUILTIN(__builtin_amdgcn_raw_ptr_buffer_store_i8, "vcQbiiIi", "n") --

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -626,6 +626,18 @@ static Value *emitQuaternaryBuiltin(CodeGenFunction &CGF, const CallExpr *E, return CGF.Builder.CreateCall(F, {Src0, Src1, Src2, Src3}); } +static Value *emitQuinaryBuiltin(CodeGenFunction &CGF, const CallExpr *E, arsenm wrote: The nam

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/94576 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -149,6 +149,19 @@ BUILTIN(__builtin_amdgcn_mqsad_pk_u16_u8, "WUiWUiUiWUi", "nc") BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") BUILTIN(__builtin_amdgcn_make_buffer_rsrc, "Qbv*sii", "nc") +BUILTIN(__builtin_amdgcn_raw_ptr_buffer_store_i8, "vcQbiiIi", "n") --

[clang] [Clang][AMDGPU] Add builtins for instrinsic `llvm.amdgcn.raw.buffer.store` (PR #94576)

2024-06-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: I'm wondering if we should really have all the different typed variants, and if this should be the name. I guess https://github.com/llvm/llvm-project/pull/94576 ___ cfe-commits mailing list cfe-commits@lists.llvm.o

[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-21 Thread Matt Arsenault via cfe-commits
@@ -169,6 +180,11 @@ // COMMON-UNSAFE-MATH-SAME: "-mlink-builtin-bitcode" "{{.*}}/amdgcn/bitcode/oclc_finite_only_off.bc" // COMMON-UNSAFE-MATH-SAME: "-mlink-builtin-bitcode" "{{.*}}/amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc" +// ASAN-SAME: "-fsanitize=address" + +//

[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/96262 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] Enable ASAN in amdgpu toolchain for OpenCL (PR #96262)

2024-06-21 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/96262 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-18 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/95396 >From f0f8e09caff2df5632d4252ca354b24c0c6f0e87 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 10 Jun 2024 19:48:13 +0200 Subject: [PATCH] AMDGPU: Remove ds atomic fadd intrinsics These have been replace

[clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-18 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/95396 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-18 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/95395 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-18 Thread Matt Arsenault via cfe-commits
@@ -33,6 +33,7 @@ // q -> Scalable vector, followed by the number of elements and the base type. // Q -> target builtin type, followed by a character to distinguish the builtin type //Qa -> AArch64 svcount_t builtin type. +//Qb -> AMDGPU __amdgpu_buffer_rsrc_t builti

[clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-18 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/95395 >From 35c741fe2563094bc20c179ee9f244620025405c Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 10 Jun 2024 19:40:59 +0200 Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins We should hav

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -33,6 +33,7 @@ // q -> Scalable vector, followed by the number of elements and the base type. // Q -> target builtin type, followed by a character to distinguish the builtin type //Qa -> AArch64 svcount_t builtin type. +//Qb -> AMDGPU __amdgpu_buffer_rsrc_t builti

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -19082,6 +19082,15 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, CGM.getIntrinsic(Intrinsic::amdgcn_s_sendmsg_rtn, {ResultType}); return Builder.CreateCall(F, {Arg}); } + case AMDGPU::BI__builtin_amdgcn_make_buffer_rsrc: { +llvm::Va

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. LGTM but I'm not a frontend expert https://github.com/llvm/llvm-project/pull/94830 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commit

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,11 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -verify -triple amdgcn-amd-amdhsa -Wno-unused-value %s arsenm wrote: set explicit -cl-std, and check 1.2 and 2.0? https://github.com/llvm/llvm-project/pull/94830 ___

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -305,43 +304,43 @@ bool SIAnnotateControlFlow::handleLoop(BranchInst *Term) { } /// Close the last opened control flow -bool SIAnnotateControlFlow::closeControlFlow(BasicBlock *BB) { - llvm::Loop *L = LI->getLoopFor(BB); +bool SIAnnotateControlFlow::tryWaveReconverge(Basic

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1 @@ +remark: :0:0: removing function 'needs_extimg': +extended-image-insts is not supported on the current target arsenm wrote: accidentally added file? https://github.com/llvm/llvm-project/pull/92809 ___ c

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -15740,6 +15740,32 @@ void SITargetLowering::finalizeLowering(MachineFunction &MF) const { } } + // ISel inserts copy to regs for the successor PHIs + // at the BB end. We need to move the SI_WAVE_RECONVERGE right before the arsenm wrote: Can you

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -2103,12 +2103,36 @@ bool SIInstrInfo::expandPostRAPseudo(MachineInstr &MI) const { MI.setDesc(get(AMDGPU::S_MOV_B64)); break; + case AMDGPU::S_CMOV_B64_term: +// This is only a terminator to get the correct spill code placement during +// register allocat

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -1,3 +1,4 @@ +; XFAIL: * arsenm wrote: can't just xfail tests https://github.com/llvm/llvm-project/pull/92809 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -3172,8 +3172,8 @@ def int_amdgcn_loop : Intrinsic<[llvm_i1_ty], [llvm_anyint_ty], [IntrWillReturn, IntrNoCallback, IntrNoFree] >; -def int_amdgcn_end_cf : Intrinsic<[], [llvm_anyint_ty], - [IntrWillReturn, IntrNoCallback, IntrNoFree]>; +def int_amdgcn_wave_reconverge :

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -15740,6 +15740,32 @@ void SITargetLowering::finalizeLowering(MachineFunction &MF) const { } } + // ISel inserts copy to regs for the successor PHIs + // at the BB end. We need to move the SI_WAVE_RECONVERGE right before the + // branch. + for (auto &MBB : MF) {

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/92809 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Change CF intrinsics lowering to reconverge on predecessors. (PR #92809)

2024-06-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm requested changes to this pull request. There are quite a few code quality regressions, and XFAILed tests. The description needs more elaboration on what the strategy is here https://github.com/llvm/llvm-project/pull/92809 _

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-17 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [clang][CodeGen] Add query for a target's flat address space (PR #95728)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -1764,6 +1764,13 @@ class TargetInfo : public TransferrableTargetInfo, return 0; } + /// \returns Target specific flat ptr address space; a flat ptr is a ptr that + /// can be casted to / from all other target address spaces. If the target + /// exposes no such add

[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,86 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -O0 -triple amdg

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,84 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + // RUN

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,84 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + // RUN

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,9 @@ + arsenm wrote: Extra blank line https://github.com/llvm/llvm-project/pull/94830 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,17 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fsyntax-only -verify -std=gnu++11 -triple amdgcn -Wno-unused-value %s + arsenm wrote: We probably want another similar sema test for OpenCL/HIP/OpenMP https://github.com/llvm/llvm-pro

[clang] [clang][CodeGen] Add query for a target's flat address space (PR #95728)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -1764,6 +1764,13 @@ class TargetInfo : public TransferrableTargetInfo, return 0; } + /// \returns Target specific flat ptr address space; a flat ptr is a ptr that + /// can be casted to / from all other target address spaces. If the target + /// exposes no such add

[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,86 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -O0 -triple amdg

[clang] [Clang] [WIP] Added builtin_alloca support for OpenCL1.2 and below (PR #95750)

2024-06-17 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,86 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5 +// RUN: %clang_cc1 %s -O0 -triple amdgcn-amd-amdhsa -cl-std=CL1.2 -emit-llvm -o - | FileCheck --check-prefix=OPENCL12 %s +// RUN: %clang_cc1 %s -O0 -triple amdg

[clang] [llvm] clang/AMDGPU: Emit atomicrmw from ds_fadd builtins (PR #95395)

2024-06-15 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/95395 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/89217 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,21 @@ +//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apa

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,84 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --function-signature + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + // RUN

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,21 @@ +//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apa

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -6129,13 +6150,55 @@ static SDValue lowerLaneOp(const SITargetLowering &TLI, SDNode *N, if (ValSize % 32 != 0) return SDValue(); + auto unrollLaneOp = [&DAG, &SL](SDNode *N) -> SDValue { +EVT VT = N->getValueType(0); +unsigned NE = VT.getVectorNumElements();

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-14 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,65 @@ +; RUN: llc -stop-after=amdgpu-isel -mtriple=amdgcn-- -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck --check-prefixes=CHECK,ISEL %s + +; CHECK-LABEL: name:basic_readfirstlane_i64 +; CHECK:[[TOKEN:%[0-9]+]]{{[^ ]*}} = CONVERGENCECTRL

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-13 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,69 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py + // REQUIRES: amdgpu-registered-target + // RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s + // RUN: %clang_cc1 -triple amdgcn-unkn

[clang] [clang-tools-extra] [compiler-rt] [flang] [libc] [lld] [lldb] [llvm] [mlir] [openmp] [llvm-project] Fix typo "seperate" (PR #95373)

2024-06-13 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/95373 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [Offload] Introduce the concept of "default streams" (PR #95371)

2024-06-13 Thread Matt Arsenault via cfe-commits
@@ -1125,6 +1125,22 @@ void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA, CmdArgs.push_back("__clang_openmp_device_functions.h"); } + if (Args.hasArg(options::OPT_foffload_via_llvm)) { +// Add llvm_wrappers/* to our system include path. This

[clang] [llvm] [Offload] Introduce the concept of "default streams" (PR #95371)

2024-06-13 Thread Matt Arsenault via cfe-commits
@@ -1125,6 +1125,22 @@ void Clang::AddPreprocessingOptions(Compilation &C, const JobAction &JA, CmdArgs.push_back("__clang_openmp_device_functions.h"); } + if (Args.hasArg(options::OPT_foffload_via_llvm)) { +// Add llvm_wrappers/* to our system include path. This

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-13 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Just a note - and maybe this was already discussed above - is there good > reason not to explicitly make this type a 128-bit scalar? The LLVM data > layout already does this I thought this was the 160 bit version? Can we have an opaque-but-sized type? The concern is exposing

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-13 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,95 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu verde -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-13 Thread Matt Arsenault via cfe-commits
arsenm wrote: > I understand the chance of conflict is low. It may be like the chance of > hitting by a meteor. However, if we prefix with `__amdgcn_`, there is no such > risk. And we have the benefit to clearly indicate it is a amdgcn > target-specific type. Should use amdgpu https://githu

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,95 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -cl-std=CL2.0 -target-cpu verde -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple

[clang] [Clang][AMDGPU] Add a builtin for llvm.amdgcn.make.buffer.rsrc intrinsic (PR #95276)

2024-06-12 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,14 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu verde -emit-llvm -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple amdgcn-unknown

[clang] [llvm] [clang][Driver] Add HIPAMD Driver support for AMDGCN flavoured SPIR-V (PR #95061)

2024-06-12 Thread Matt Arsenault via cfe-commits
@@ -128,12 +128,13 @@ enum class CudaArch { GFX12_GENERIC, GFX1200, GFX1201, + AMDGCNSPIRV, Generic, // A processor model named 'generic' if the target backend defines a // public one. LAST, CudaDefault = CudaArch::SM_52, - HIPDefault = CudaArch::

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Or drop the new nodes altogether and legelaize to intrinsics directly ? That's another option. The only real plus to the intermediate is it's slightly less annoying to write combines for. But there are limited combining opportunities for these https://github.com/llvm/llvm-p

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,46 @@ +# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s arsenm wrote: I'd still test all 3, but yes an IR test https://github.com/llvm/llvm-project/pull/89217 ___

[clang] [llvm] [AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types (PR #89217)

2024-06-12 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,46 @@ +# RUN: not --crash llc -mtriple=amdgcn -run-pass=none -verify-machineinstrs -o /dev/null %s 2>&1 | FileCheck %s arsenm wrote: You should not need to introduce any new machine verifier tests, they are not useful. The useful test would be the IR

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits
@@ -2201,6 +2207,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const { Align = 8; \ break; #include "clang/Basic/WebAssemblyReferenceTypes.def" +case BuiltinType::AMDGPUBufferRsrc: + W

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,11 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wno-unused-value %s + +void foo() { + int n = 100; + __buffer_rsrc_t v = 0; // expected-error {{cannot initialize a variable of type '__buffer_rsrc_t' with an rvalu

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,11 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wno-unused-value %s + +void foo() { + int n = 100; + __buffer_rsrc_t v = 0; // expected-error {{cannot initialize a variable of type '__buffer_rsrc_t' with an rvalu

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-10 Thread Matt Arsenault via cfe-commits
@@ -2200,6 +2206,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const { Align = 8; \ break; #include "clang/Basic/WebAssemblyReferenceTypes.def" +case BuiltinType::AMDGPUBufferRsrc: + W

[clang] [llvm] [Offload][CUDA] Add initial cuda_runtime.h overlay (PR #94821)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,30 @@ +// RUN: %clang++ -foffload-via-llvm --offload-arch=native %s -o %t +// RUN: %t | %fcheck-generic + +// UNSUPPORTED: aarch64-unknown-linux-gnu +// UNSUPPORTED: aarch64-unknown-linux-gnu-LTO +// UNSUPPORTED: x86_64-pc-linux-gnu +// UNSUPPORTED: x86_64-pc-linux-gnu-

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,9 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -fclang-abi-compat=latest -triple amdgcn %s -emit-llvm -o - | FileCheck %s arsenm wrote: Why do you need -fclang-abi-compat=latest https://github.com/llvm/llvm-project/pull/94830 ___

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,21 @@ +//===-- AMDGPUTypes.def - Metadata about AMDGPU types ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apa

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/94830 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -2200,6 +2206,9 @@ TypeInfo ASTContext::getTypeInfoImpl(const Type *T) const { Align = 8; \ break; #include "clang/Basic/WebAssemblyReferenceTypes.def" +case BuiltinType::AMDGPUBufferRsrc: + W

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: Need stacked PR that adds the make_buffer_rsrc builtin that shows its use https://github.com/llvm/llvm-project/pull/94830 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailm

[clang] [Clang][AMDGPU] Add a new builtin type for buffer rsrc (PR #94830)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -1091,6 +1091,9 @@ enum PredefinedTypeIDs { // \brief WebAssembly reference types with auto numeration #define WASM_TYPE(Name, Id, SingletonId) PREDEF_TYPE_##Id##_ID, #include "clang/Basic/WebAssemblyReferenceTypes.def" +// \breif AMDGPU types with auto numeration --

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -16055,6 +16145,90 @@ of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2019. +.. _i_minimumnum: + +'``llvm.minimumnum.*``' Intrinsic +^ + +

[clang] [llvm] Intrinsic: introduce minimumnum and maximumnum (PR #93841)

2024-06-07 Thread Matt Arsenault via cfe-commits
@@ -16055,6 +16145,90 @@ of the two arguments. -0.0 is considered to be less than +0.0 for this intrinsic. Note that these are the semantics specified in the draft of IEEE 754-2019. +.. _i_minimumnum: + +'``llvm.minimumnum.*``' Intrinsic +^ + +

<    1   2   3   4   5   6   7   8   9   10   >