[clang] [llvm] [NVPTX] Support i256 load/store with 256-bit vector load (PR #155198)

2025-08-28 Thread Alex MacLean via cfe-commits
AlexMaclean wrote: Yea! working on a fix for the test now! https://github.com/llvm/llvm-project/pull/155198 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Support i256 load/store with 256-bit vector load (PR #155198)

2025-08-28 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/155198 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Support i256 load/store with 256-bit vector load (PR #155198)

2025-08-28 Thread Alex MacLean via cfe-commits
@@ -198,6 +198,12 @@ static bool IsPTXVectorType(MVT VT) { static std::optional> getVectorLoweringShape(EVT VectorEVT, const NVPTXSubtarget &STI, unsigned AddressSpace) { + const bool CanLowerTo256Bit = STI.has256BitVectorLoadStore(AddressSpace); + + if

[clang] [llvm] [NVPTX] Support i256 load/store with 256-bit vector load (PR #155198)

2025-08-28 Thread Alex MacLean via cfe-commits
@@ -1506,3 +1506,98 @@ define void @local_volatile_4xdouble(ptr addrspace(5) %a, ptr addrspace(5) %b) { store volatile <4 x double> %a.load, ptr addrspace(5) %b ret void } + +define void @test_i256_global(ptr addrspace(1) %a, ptr addrspace(1) %b) { AlexMac

[clang] [llvm] [NVPTX] Support i256 load/store with 256-bit vector load (PR #155198)

2025-08-27 Thread Alex MacLean via cfe-commits
@@ -3085,9 +3089,114 @@ SDValue NVPTXTargetLowering::LowerVASTART(SDValue Op, SelectionDAG &DAG) const { MachinePointerInfo(SV)); } +/// ReplaceVectorLoad - Convert vector loads into multi-output scalar loads. AlexMaclean wrote: Fixed

[clang] [llvm] [NVPTX] Support i256 load/store with 256-bit vector load (PR #155198)

2025-08-26 Thread Alex MacLean via cfe-commits
@@ -3085,9 +3089,114 @@ SDValue NVPTXTargetLowering::LowerVASTART(SDValue Op, SelectionDAG &DAG) const { MachinePointerInfo(SV)); } +/// ReplaceVectorLoad - Convert vector loads into multi-output scalar loads. +static std::optional> +replaceLoadVector(SD

[clang] [llvm] [mlir] [NVPTX] Auto-upgrade nvvm.grid_constant to param attribute (PR #155489)

2025-08-26 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean created https://github.com/llvm/llvm-project/pull/155489 Upgrade the !"grid_constant" !nvvm.annotation to a "nvvm.grid_constant" attribute. This attribute is much simpler for front-ends to apply and faster and simpler to query. >From 178af6d5a6ae46a1db9969fab050

[clang] [llvm] [NVPTX] Support i256 load/store with 256-bit vector load (PR #155198)

2025-08-25 Thread Alex MacLean via cfe-commits
@@ -1506,3 +1506,69 @@ define void @local_volatile_4xdouble(ptr addrspace(5) %a, ptr addrspace(5) %b) { store volatile <4 x double> %a.load, ptr addrspace(5) %b ret void } + +define void @test_i256_global(ptr addrspace(1) %a, ptr addrspace(1) %b) { +; SM90-LABEL: test_i256

[clang] [llvm] [NVPTX] Support i256 load/store with 256-bit vector load (PR #155198)

2025-08-25 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/155198 >From f7e19af684716853b9b0ea6f0db101db35c9ec77 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Mon, 25 Aug 2025 02:45:10 + Subject: [PATCH 1/2] [NVPTX] Support i256 load/store with 256-bit vector load

[clang] [llvm] [NVPTX] Support i256 load/store with 256-bit vector load (PR #155198)

2025-08-24 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/155198 >From 5a4d50f683166738c5ee03ac032ef4a3c6c0fec9 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Mon, 25 Aug 2025 02:45:10 + Subject: [PATCH] [NVPTX] Support i256 load/store with 256-bit vector load ---

[clang] [llvm] [NVPTX] Support i256 load/store with 256-bit vector load (PR #155198)

2025-08-24 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/155198 >From e8e640ba75a072ebc9dfe0d004c48a42dbbb82ce Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Mon, 25 Aug 2025 02:45:10 + Subject: [PATCH] [NVPTX] Support i256 load/store with 256-bit vector load ---

[clang] [llvm] [NVPTX] Change the alloca address space in NVPTXLowerAlloca (PR #154814)

2025-08-22 Thread Alex MacLean via cfe-commits
@@ -51,86 +59,66 @@ char NVPTXLowerAlloca::ID = 1; INITIALIZE_PASS(NVPTXLowerAlloca, "nvptx-lower-alloca", "Lower Alloca", false, false) -// = -// Main function for this pass. -// ===

[clang] [llvm] [NVPTX] Change the alloca address space in NVPTXLowerAlloca (PR #154814)

2025-08-22 Thread Alex MacLean via cfe-commits
@@ -139,21 +134,20 @@ define void @foo4() { ; PTX32-EMPTY: ; PTX32-NEXT: // %bb.0: ; PTX32-NEXT:mov.b32 %SPL, __local_depot3; -; PTX32-NEXT:cvta.local.u32 %SP, %SPL; -; PTX32-NEXT:add.u32 %r1, %SP, 0; -; PTX32-NEXT:add.u32 %r2, %SPL, 0; -; PTX32-NEXT:add.u3

[clang] [llvm] [NVPTX] Change the alloca address space in NVPTXLowerAlloca (PR #154814)

2025-08-22 Thread Alex MacLean via cfe-commits
@@ -216,24 +214,23 @@ define dso_local ptx_kernel void @escape_ptr_store(ptr nocapture noundef writeon ; ; PTX-LABEL: escape_ptr_store( ; PTX: { -; PTX-NEXT:.local .align 4 .b8 __local_depot4[8]; +; PTX-NEXT:.local .align 8 .b8 __local_depot4[8];

[clang] [llvm] [NVPTX] Change the alloca address space in NVPTXLowerAlloca (PR #154814)

2025-08-22 Thread Alex MacLean via cfe-commits
@@ -371,6 +371,8 @@ void NVPTXPassConfig::addIRPasses() { if (getOptLevel() != CodeGenOptLevel::None) { addAddressSpaceInferencePasses(); addStraightLineScalarOptimizationPasses(); + } else { +addPass(createNVPTXLowerAllocaPass()); AlexMaclean wr

[clang] [llvm] [NVPTX] Change the alloca address space in NVPTXLowerAlloca (PR #154814)

2025-08-22 Thread Alex MacLean via cfe-commits
@@ -51,86 +59,66 @@ char NVPTXLowerAlloca::ID = 1; INITIALIZE_PASS(NVPTXLowerAlloca, "nvptx-lower-alloca", "Lower Alloca", false, false) -// = -// Main function for this pass. -// ===

[clang] [llvm] [NVPTX] Change the alloca address space in NVPTXLowerAlloca (PR #154814)

2025-08-22 Thread Alex MacLean via cfe-commits
@@ -51,86 +59,66 @@ char NVPTXLowerAlloca::ID = 1; INITIALIZE_PASS(NVPTXLowerAlloca, "nvptx-lower-alloca", "Lower Alloca", false, false) -// = -// Main function for this pass. -// ===

[clang] [llvm] [NVPTX] Change the alloca address space in NVPTXLowerAlloca (PR #154814)

2025-08-22 Thread Alex MacLean via cfe-commits
@@ -1484,13 +1485,11 @@ void NVPTXAsmPrinter::setAndEmitFunctionVirtualRegisters( if (NumBytes) { O << "\t.local .align " << MFI.getMaxAlign().value() << " .b8 \t" << DEPOTNAME << getFunctionNumber() << "[" << NumBytes << "];\n"; -if (static_cast(MF.getTarget()

[clang] [llvm] [NVPTX] Change the alloca address space in NVPTXLowerAlloca (PR #154814)

2025-08-22 Thread Alex MacLean via cfe-commits
@@ -46,21 +46,11 @@ void NVPTXFrameLowering::emitPrologue(MachineFunction &MF, // Emits // mov %SPL, %depot; -// cvta.local %SP, %SPL; // for local address accesses in MF. -bool Is64Bit = -static_cast(MF.getTarget()).is64Bit(); -unsigned Cv

[clang] [llvm] [NVPTX] Consolidate and cleanup various NVPTXISD nodes (NFC) (PR #145581)

2025-08-14 Thread Alex MacLean via cfe-commits
AlexMaclean wrote: @ThomasRaoux thanks for the investigation. Please confirm if https://github.com/llvm/llvm-project/pull/153730 fixes the issue! https://github.com/llvm/llvm-project/pull/145581 ___ cfe-commits mailing list cfe-commits@lists.llvm.org

[clang] [llvm] [NVPTX] Fix v2i8 call lowering, use generic ld/st nodes for call params (PR #146930)

2025-07-08 Thread Alex MacLean via cfe-commits
@@ -1487,14 +1380,39 @@ SDValue NVPTXTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI, // After all vararg is processed, 'VAOffset' holds the size of the // vararg byte array. - SDValue VADeclareParam; // vararg byte array + SDValue VADecl

[clang] [llvm] [NVPTX] Fix v2i8 call lowering, use generic ld/st nodes for call params (PR #146930)

2025-07-08 Thread Alex MacLean via cfe-commits
@@ -5754,47 +5540,106 @@ static SDValue combineADDRSPACECAST(SDNode *N, return SDValue(); } +static SDValue sinkProxyReg(SDValue R, SDValue Chain, +TargetLowering::DAGCombinerInfo &DCI) { + switch (R.getOpcode()) { + case ISD::TRUNCATE: + case

[clang] [llvm] [NVPTX] Fix v2i8 call lowering, use generic ld/st nodes for call params (PR #146930)

2025-07-08 Thread Alex MacLean via cfe-commits
@@ -5754,47 +5540,106 @@ static SDValue combineADDRSPACECAST(SDNode *N, return SDValue(); } +static SDValue sinkProxyReg(SDValue R, SDValue Chain, +TargetLowering::DAGCombinerInfo &DCI) { AlexMaclean wrote: Added! https://github

[clang] [llvm] [NVPTX] Consolidate and cleanup various NVPTXISD nodes (NFC) (PR #145581)

2025-06-25 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/145581 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Consolidate and cleanup various NVPTXISD nodes (NFC) (PR #145581)

2025-06-24 Thread Alex MacLean via cfe-commits
@@ -2174,23 +2129,40 @@ let mayStore = true in { []>; } -let isCall=1 in { - multiclass CALL { - def PrintCallNoRetInst : NVPTXInst<(outs), (ins), - OpcStr # " ", [(OpNode 0)]>; - def PrintCallRetInst1 : NVPTXInst<(outs), (ins), - OpcStr

[clang] [llvm] [NVPTX] Consolidate and cleanup various NVPTXISD nodes (NFC) (PR #145581)

2025-06-24 Thread Alex MacLean via cfe-commits
@@ -1750,19 +1739,31 @@ def BFMOV16i : MOVi; def FMOV32i : MOVi; def FMOV64i : MOVi; -def : Pat<(i32 (Wrapper texternalsym:$dst)), (IMOV32i texternalsym:$dst)>; -def : Pat<(i64 (Wrapper texternalsym:$dst)), (IMOV64i texternalsym:$dst)>; + +def to_tglobaladdr : SDNodeXFormgetTa

[clang] [llvm] [NVPTX] Consolidate and cleanup various NVPTXISD nodes (NFC) (PR #145581)

2025-06-24 Thread Alex MacLean via cfe-commits
@@ -2174,23 +2129,40 @@ let mayStore = true in { []>; } -let isCall=1 in { - multiclass CALL { - def PrintCallNoRetInst : NVPTXInst<(outs), (ins), - OpcStr # " ", [(OpNode 0)]>; - def PrintCallRetInst1 : NVPTXInst<(outs), (ins), - OpcStr

[clang] [llvm] [NVPTX] Consolidate and cleanup various NVPTXISD nodes (NFC) (PR #145581)

2025-06-24 Thread Alex MacLean via cfe-commits
@@ -909,20 +907,9 @@ bool NVPTXDAGToDAGISel::tryIntrinsicNoChain(SDNode *N) { switch (IID) { AlexMaclean wrote: Oops, removed https://github.com/llvm/llvm-project/pull/145581 ___ cfe-commits mailing list cfe-commits

[clang] [llvm] [NVPTX] Add pm_event intrinsics (PR #141278)

2025-05-26 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/141278 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [mlir] Reland "[NVPTX] Unify and extend barrier{.cta} intrinsic support" (PR #141143)

2025-05-22 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/141143 >From a46075f9aa3970735104cbcf2503ebef89db Mon Sep 17 00:00:00 2001 From: Alex MacLean Date: Wed, 21 May 2025 08:14:15 -0700 Subject: [PATCH 1/2] [NVPTX] Unify and extend barrier{.cta} intrinsic support

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-22 Thread Alex MacLean via cfe-commits
AlexMaclean wrote: > This merge broke our builds on Halide. > > ``` > Unhandled exception: Error: Could not find PTX barrier intrinsic > (llvm.nvvm.barrier0) > ``` > > We have [an `.ll` > file](https://github.com/halide/Halide/blob/main/src/runtime/ptx_dev.ll) > declaring these intrinsics: >

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-21 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-20 Thread Alex MacLean via cfe-commits
@@ -71,14 +71,6 @@ define float @nvvm_rcp(float %0) { ret float %2 } -; CHECK-LABEL: @llvm_nvvm_barrier0() -define void @llvm_nvvm_barrier0() { - ; CHECK: nvvm.barrier0 - call void @llvm.nvvm.barrier0() - ret void -} - AlexMaclean wrote: I've added this

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-20 Thread Alex MacLean via cfe-commits
@@ -462,24 +462,28 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">, // NVVM synchronization op definitions //===--===// -def NVVM_Barrier0Op : NVVM_IntrOp<"barrier0"> { +def NVVM_

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-20 Thread Alex MacLean via cfe-commits
@@ -462,24 +462,28 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">, // NVVM synchronization op definitions //===--===// -def NVVM_Barrier0Op : NVVM_IntrOp<"barrier0"> { +def NVVM_

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-20 Thread Alex MacLean via cfe-commits
@@ -462,24 +462,28 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">, // NVVM synchronization op definitions //===--===// -def NVVM_Barrier0Op : NVVM_IntrOp<"barrier0"> { +def NVVM_

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-20 Thread Alex MacLean via cfe-commits
@@ -199,21 +199,58 @@ map in the following way to CUDA builtins: Barriers -'``llvm.nvvm.barrier0``' -^^^ +'``llvm.nvvm.barrier.cta.*``' +^ Syntax: """ .. code-block:: llvm - declare void @llvm.nvvm.barr

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-20 Thread Alex MacLean via cfe-commits
@@ -240,6 +240,47 @@ def BF16RT : RegTyInfo; def F16X2RT : RegTyInfo; def BF16X2RT : RegTyInfo; +// This class provides a basic wrapper around an NVPTXInst that abstracts the +// specific syntax of most PTX instructions. It automatically handles the +// construction of the

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-20 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/140615 >From babb28ef1c935f0d0cfb3b40f62be860be027010 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Thu, 15 May 2025 18:12:11 + Subject: [PATCH 1/5] [NVPTX] Unify and extend barrier{.cta} intrinsic support

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-19 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/140615 >From babb28ef1c935f0d0cfb3b40f62be860be027010 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Thu, 15 May 2025 18:12:11 + Subject: [PATCH 1/4] [NVPTX] Unify and extend barrier{.cta} intrinsic support

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-19 Thread Alex MacLean via cfe-commits
@@ -6,13 +7,15 @@ ; Use bar.sync to arrive at a pre-computed barrier number and ; wait for all threads in CTA to also arrive: define ptx_device void @test_barrier_named_cta() { -; CHECK: mov.b32 %r[[REG0:[0-9]+]], 0; -; CHECK: bar.sync %r[[REG0]]; -; CHECK: mov.b32 %r[[REG1:[

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-19 Thread Alex MacLean via cfe-commits
@@ -240,6 +240,34 @@ def BF16RT : RegTyInfo; def F16X2RT : RegTyInfo; def BF16X2RT : RegTyInfo; +// This class provides a basic wrapper around an NVPTXInst that abstracts the +// specific syntax of most PTX instructions. It automatically handles the +// construction of the

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-19 Thread Alex MacLean via cfe-commits
@@ -1349,6 +1349,10 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn, else if (Name == "clz.ll" || Name == "popc.ll" || Name == "h2f" || Name == "swap.lo.hi.b64") Expand = true; + else if (Name == "barrier0" || Name == "b

[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)

2025-05-19 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/140615 >From babb28ef1c935f0d0cfb3b40f62be860be027010 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Thu, 15 May 2025 18:12:11 + Subject: [PATCH 1/3] [NVPTX] Unify and extend barrier{.cta} intrinsic support

[clang] [llvm] [NVPTX] use untyped loads and stores where ever possible (PR #137698)

2025-05-10 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/137698 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-22 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean approved this pull request. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-22 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean commented: llvm changes LGTM, though I'm not too familiar with the MLIR portion of this change. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-21 Thread Alex MacLean via cfe-commits
@@ -2381,25 +2387,38 @@ def INT_PTX_LDG_G_v4i32_ELE : VLDG_G_ELE_V4<"u32", Int32Regs>; def INT_PTX_LDG_G_v4f32_ELE : VLDG_G_ELE_V4<"f32", Float32Regs>; -multiclass NG_TO_G { - def "" : NVPTXInst<(outs Int32Regs:$result), (ins Int32Regs:$src), - "cvta." # Str # ".u

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-19 Thread Alex MacLean via cfe-commits
@@ -3019,8 +3019,26 @@ SDValue NVPTXTargetLowering::LowerADDRSPACECAST(SDValue Op, unsigned SrcAS = N->getSrcAddressSpace(); unsigned DestAS = N->getDestAddressSpace(); if (SrcAS != llvm::ADDRESS_SPACE_GENERIC && - DestAS != llvm::ADDRESS_SPACE_GENERIC) + DestA

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-19 Thread Alex MacLean via cfe-commits
@@ -3019,8 +3019,26 @@ SDValue NVPTXTargetLowering::LowerADDRSPACECAST(SDValue Op, unsigned SrcAS = N->getSrcAddressSpace(); unsigned DestAS = N->getDestAddressSpace(); if (SrcAS != llvm::ADDRESS_SPACE_GENERIC && - DestAS != llvm::ADDRESS_SPACE_GENERIC) + DestA

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-19 Thread Alex MacLean via cfe-commits
@@ -25,6 +25,7 @@ enum AddressSpace : unsigned { ADDRESS_SPACE_CONST = 4, ADDRESS_SPACE_LOCAL = 5, ADDRESS_SPACE_TENSOR = 6, + ADDRESS_SPACE_SHARED_CLUSTER = 7, AlexMaclean wrote: I think it would be good to rename `ADDRESS_SPACE_SHARED` to `ADDRESS_SP

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-19 Thread Alex MacLean via cfe-commits
@@ -426,10 +426,7 @@ static std::optional evaluateIsSpace(Intrinsic::ID IID, unsigned AS) { case Intrinsic::nvvm_isspacep_shared: return AS == NVPTXAS::ADDRESS_SPACE_SHARED; AlexMaclean wrote: If the address space is `ADDRESS_SPACE_SHARED_CLUSTER` this i

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-19 Thread Alex MacLean via cfe-commits
@@ -176,6 +176,7 @@ enum AddressSpace : AddressSpaceUnderlyingType { Shared = 3, AlexMaclean wrote: Lets rename this to `SharedCTA` as well. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing l

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-19 Thread Alex MacLean via cfe-commits
@@ -426,10 +426,7 @@ static std::optional evaluateIsSpace(Intrinsic::ID IID, unsigned AS) { case Intrinsic::nvvm_isspacep_shared: return AS == NVPTXAS::ADDRESS_SPACE_SHARED; case Intrinsic::nvvm_isspacep_shared_cluster: -// We can't tell shared from shared_cluster

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits
@@ -2381,29 +2387,41 @@ def INT_PTX_LDG_G_v4i32_ELE : VLDG_G_ELE_V4<"u32", Int32Regs>; def INT_PTX_LDG_G_v4f32_ELE : VLDG_G_ELE_V4<"f32", Float32Regs>; -multiclass NG_TO_G { +multiclass NG_TO_G Preds = []> { def "" : NVPTXInst<(outs Int32Regs:$result), (ins Int32Regs:$sr

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits
@@ -0,0 +1,329 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mcpu=sm_90 -mattr=+ptx78| %ptxas-verify -arch=sm_90 %} + +tar

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits
@@ -3019,8 +3019,42 @@ SDValue NVPTXTargetLowering::LowerADDRSPACECAST(SDValue Op, unsigned SrcAS = N->getSrcAddressSpace(); unsigned DestAS = N->getDestAddressSpace(); if (SrcAS != llvm::ADDRESS_SPACE_GENERIC && - DestAS != llvm::ADDRESS_SPACE_GENERIC) + DestA

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean commented: Getting close to ready, a couple more places to update: - NVPTXTargetTransformInfo.cpp: evaluateIsSpace - NVPTXUsage.rst: Address Space section, add intrinsics you're modifying, such as `mapa`, to the spec https://github.com/llvm/llvm-project/pull/13544

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits
@@ -0,0 +1,48 @@ +; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_80 | FileCheck %s -check-prefixes=ALL,NOPTRCONV,CLS64 AlexMaclean wrote: Use update_llc_test_checks for this test. https://github.com/llvm/llvm-project/pull/135444 _

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-18 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-17 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/135644 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Alex MacLean via cfe-commits
@@ -1034,6 +1034,10 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_fmin_xorsign_abs_f16x2: return MakeHalfType(Intrinsic::nvvm_fmin_xorsign_abs_f16x2, BuiltinID, E, *this); + case NVPTX::BI__nvvm_abs_bf16

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/135644 >From fd11c2b4c964a3fe336e3fcb106fca5bf9c7d2b2 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Fri, 11 Apr 2025 17:59:50 + Subject: [PATCH 1/6] [NVPTX] Cleaup and document nvvm.fabs intrinsics, adding

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-15 Thread Alex MacLean via cfe-commits
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 :

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Alex MacLean via cfe-commits
@@ -1034,6 +1034,10 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_fmin_xorsign_abs_f16x2: return MakeHalfType(Intrinsic::nvvm_fmin_xorsign_abs_f16x2, BuiltinID, E, *this); + case NVPTX::BI__nvvm_abs_bf16

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Alex MacLean via cfe-commits
@@ -411,6 +412,13 @@ static Instruction *convertNvvmIntrinsicToLlvm(InstCombiner &IC, } return nullptr; } + case SPC_Fabs: { +if (!II->getType()->isDoubleTy()) + return nullptr; +auto *Fabs = Intrinsic::getOrInsertDeclaration( +II->getModule(),

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Alex MacLean via cfe-commits
@@ -309,6 +309,60 @@ space casted to this space), 1 is returned, otherwise 0 is returned. Arithmetic Intrinsics - +'``llvm.nvvm.fabs.*``' Intrinsic + + +Syntax: +""" + +.. code-block:: llvm + +declare float @llvm.nvv

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/135644 >From fd11c2b4c964a3fe336e3fcb106fca5bf9c7d2b2 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Fri, 11 Apr 2025 17:59:50 + Subject: [PATCH 1/5] [NVPTX] Cleaup and document nvvm.fabs intrinsics, adding

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/135644 >From fd11c2b4c964a3fe336e3fcb106fca5bf9c7d2b2 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Fri, 11 Apr 2025 17:59:50 + Subject: [PATCH 1/4] [NVPTX] Cleaup and document nvvm.fabs intrinsics, adding

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Alex MacLean via cfe-commits
@@ -1034,6 +1034,10 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_fmin_xorsign_abs_f16x2: return MakeHalfType(Intrinsic::nvvm_fmin_xorsign_abs_f16x2, BuiltinID, E, *this); + case NVPTX::BI__nvvm_abs_bf16

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-12 Thread Alex MacLean via cfe-commits
@@ -137,6 +137,7 @@ def hasAtomBitwise64 : Predicate<"Subtarget->hasAtomBitwise64()">; def hasAtomMinMax64 : Predicate<"Subtarget->hasAtomMinMax64()">; def hasVote : Predicate<"Subtarget->hasVote()">; def hasDouble : Predicate<"Subtarget->hasDouble()">; +def hasClusters : Pred

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-12 Thread Alex MacLean via cfe-commits
@@ -2038,15 +2038,15 @@ multiclass F_ATOMIC_2_AS, preds>; defm _S : F_ATOMIC_2, preds>; - defm _DS : F_ATOMIC_2, !listconcat([hasSM<80>], preds)>; + defm _S_C : F_ATOMIC_2, !listconcat([hasSM<80>], preds)>; AlexMaclean wrote: The PTX doc seems to say this

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-12 Thread Alex MacLean via cfe-commits
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 :

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-12 Thread Alex MacLean via cfe-commits
@@ -0,0 +1,258 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -march=nvptx64 -mattr=+ptx80 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80| %pt

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-12 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean commented: Backend changes look reasonable so far. One concern I have with this change is that until now we've assumed specific address-spaces are non-overlapping. You've addressed some of the places where this assumption is encoded but I think there are others y

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-12 Thread Alex MacLean via cfe-commits
@@ -0,0 +1,258 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -march=nvptx64 -mattr=+ptx80 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80| %pt

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-12 Thread Alex MacLean via cfe-commits
@@ -0,0 +1,258 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -march=nvptx64 -mattr=+ptx80 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80| %pt

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-12 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-10 Thread Alex MacLean via cfe-commits
@@ -703,6 +703,46 @@ let hasSideEffects = false in { defm CVT_to_tf32_rz_satf : CVT_TO_TF32<"rz.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rn_relu_satf : CVT_TO_TF32<"rn.relu.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rz_relu_satf : CVT_TO_TF

[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-10 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/134345 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/134416 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Alex MacLean via cfe-commits
AlexMaclean wrote: Merging on behalf of @YonahGoldberg at his request offline. https://github.com/llvm/llvm-project/pull/134416 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-10 Thread Alex MacLean via cfe-commits
@@ -1548,6 +1548,45 @@ let TargetPrefix = "nvvm" in { Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; def int_nvvm_e5m2x2_to_f16x2_rn_relu : ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">, Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoM

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean approved this pull request. LGTM, please wait for @Artem-B's approval before landing. https://github.com/llvm/llvm-project/pull/134416 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin

[clang] [llvm] [Clang][NVVM] Support `-f[no-]cuda-prec-sqrt` and propagate precision flag to `NVVMReflect` (PR #134244)

2025-04-08 Thread Alex MacLean via cfe-commits
AlexMaclean wrote: It seems like we already have perhaps too many mechanisms to control how sqrt gets lowered. There is the `__nv_sqrtf` libdevice function which chooses between specific (1:1 to PTX) intrinsics based on NVVMReflect and then there is also `llvm.sqrt` and `nvvm.sqrt.f` which are

[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)

2025-04-08 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/134111 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)

2025-04-08 Thread Alex MacLean via cfe-commits
@@ -2314,6 +2317,12 @@ static Value *upgradeNVVMIntrinsicCall(StringRef Name, CallBase *CI, Value *Val = CI->getArgOperand(1); Rep = Builder.CreateAtomicRMW(AtomicRMWInst::FAdd, Ptr, Val, MaybeAlign(), AtomicOrdering::SequentiallyConsi

[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)

2025-04-08 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/134111 >From 46de785e801bf8ca87e01aee9ad0a13ac07a47d6 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Tue, 1 Apr 2025 20:22:24 + Subject: [PATCH] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 ---

[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-08 Thread Alex MacLean via cfe-commits
@@ -703,6 +703,53 @@ let hasSideEffects = false in { defm CVT_to_tf32_rz_satf : CVT_TO_TF32<"rz.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rn_relu_satf : CVT_TO_TF32<"rn.relu.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rz_relu_satf : CVT_TO_TF

[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-08 Thread Alex MacLean via cfe-commits
@@ -1944,6 +1944,62 @@ def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn Int16Regs:$a), def : Pat<(int_nvvm_e5m2x2_to_f16x2_rn_relu Int16Regs:$a), (CVT_f16x2_e5m2x2 $a, CvtRN_RELU)>; +def : Pat<(int_nvvm_ff_to_e2m3x2_rn f32:$a, f32:$b), + (CVT_e2m3x2_f32 $a, $b, CvtRN)

[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-08 Thread Alex MacLean via cfe-commits
@@ -1548,6 +1548,45 @@ let TargetPrefix = "nvvm" in { Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; def int_nvvm_e5m2x2_to_f16x2_rn_relu : ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">, Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoM

[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-08 Thread Alex MacLean via cfe-commits
@@ -1548,6 +1548,45 @@ let TargetPrefix = "nvvm" in { Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoMem, IntrNoCallback]>; def int_nvvm_e5m2x2_to_f16x2_rn_relu : ClangBuiltin<"__nvvm_e5m2x2_to_f16x2_rn_relu">, Intrinsic<[llvm_v2f16_ty], [llvm_i16_ty], [IntrNoM

[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)

2025-04-03 Thread Alex MacLean via cfe-commits
@@ -2314,6 +2317,12 @@ static Value *upgradeNVVMIntrinsicCall(StringRef Name, CallBase *CI, Value *Val = CI->getArgOperand(1); Rep = Builder.CreateAtomicRMW(AtomicRMWInst::FAdd, Ptr, Val, MaybeAlign(), AtomicOrdering::SequentiallyConsi

[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)

2025-04-03 Thread Alex MacLean via cfe-commits
@@ -2070,8 +2070,8 @@ defm INT_PTX_ATOMIC_UMIN_32 : F_ATOMIC_2_AS]>; // atom_inc atom_dec AlexMaclean wrote: I think it makes sense to test the auto-upgrade rules and test the lowering of the current syntax but not to maintain lowering tests using out-of-dat

[clang] [llvm] [mlir] [NVPTX] Convert vector function nvvm.annotations to attributes (PR #127736)

2025-02-25 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/127736 >From b637f2a9142aa9493e78f8d6e05b692b7175c123 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Wed, 19 Feb 2025 02:26:23 + Subject: [PATCH 1/3] [NVPTX] Convert vector function nvvm.annotations to attr

[clang] [llvm] [mlir] [NVPTX] Convert vector function nvvm.annotations to attributes (PR #127736)

2025-02-25 Thread Alex MacLean via cfe-commits
@@ -5021,6 +5024,36 @@ bool llvm::UpgradeDebugInfo(Module &M) { return Modified; } +static void upgradeNVVMFnVectorAttr(const StringRef Attr, const char DimC, +GlobalValue *GV, const Metadata *V) { + Function *F = cast(GV); + + constexpr

[clang] [llvm] [mlir] [NVPTX] Convert vector function nvvm.annotations to attributes (PR #127736)

2025-02-25 Thread Alex MacLean via cfe-commits
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/127736 >From 5ca8b82e146439453b51f990e4ed43f8bd2838eb Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Wed, 19 Feb 2025 02:26:23 + Subject: [PATCH 1/3] [NVPTX] Convert vector function nvvm.annotations to attr

[clang] [llvm] [mlir] [NVPTX] Convert vector function nvvm.annotations to attributes (PR #127736)

2025-02-25 Thread Alex MacLean via cfe-commits
AlexMaclean wrote: > I think they will become something like: > > ```c++ > llvmFunc->addFnAttr("nvvm.maxntid", llvm::utostr(workgroupSize[0])); > llvmFunc->addFnAttr("nvvm.maxntid", llvm::utostr(workgroupSize[1])); > llvmFunc->addFnAttr("nvvm.maxntid", llvm::utostr(workgroupSize[2])); > ``` Not

[clang] [llvm] [mlir] [NVPTX] Convert vector function nvvm.annotations to attributes (PR #127736)

2025-02-25 Thread Alex MacLean via cfe-commits
AlexMaclean wrote: @hanhanW, @akuegel Heads up, if you're using any of these annotations, I expect you'll need to update your respective out-of-tree frontends once this change lands (similar to https://github.com/llvm/llvm-project/pull/119261). Here's an example of what that might look like:

[clang] [llvm] [mlir] [NVPTX] Convert vector function nvvm.annotations to attributes (PR #127736)

2025-02-19 Thread Alex MacLean via cfe-commits
@@ -196,6 +198,36 @@ static std::optional getFnAttrParsedInt(const Function &F, : std::nullopt; } +static SmallVector getFnAttrParsedVector(const Function &F, + StringRef Attr) { + SmallVector V; + auto &Ctx

  1   2   >