[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
AlexMaclean wrote:

> This merge broke our builds on Halide.
>
> ```
> Unhandled exception: Error: Could not find PTX barrier intrinsic (llvm.nvvm.barrier0)
> ```
>
> We have [an `.ll` file](https://github.com/halide/Halide/blob/main/src/runtime/ptx_dev.ll) declaring these intrinsics:
>
> ```llvm
> declare void @llvm.nvvm.barrier0()
> ```
>
> So, it seems the auto-upgrade mechanism doesn't work. Any ideas?

@mcourteaux it looks to me like the error you're seeing is coming from here: https://github.com/halide/Halide/blob/85a3b07fab4ce07a9747d645dd1274c3f1c29b44/src/CodeGen_PTX_Dev.cpp#L265-L267

I reverted this change, but I plan to reland soon. Once I do you'll need to update this code to reference the new intrinsic.

https://github.com/llvm/llvm-project/pull/140615
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
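For downstream code that looks a barrier intrinsic up by name, the renaming listed in the auto-upgrade comment that this PR adds to `IntrinsicsNVVM.td` can be captured in a small lookup table. What follows is only a minimal C++ sketch: `BarrierUpgrade`, `kBarrierUpgrades` and `upgradedBarrierName` are hypothetical names that exist neither in LLVM nor in Halide, and the helper returns only the new name. Note that `llvm.nvvm.barrier0` takes no operands while its replacement takes an `i32` barrier id (0), so call sites would need the extra argument as well.

```cpp
#include <cassert>
#include <string>

// One row of the legacy -> replacement table, copied from the auto-upgrade
// comment this PR adds to IntrinsicsNVVM.td. The struct is hypothetical.
struct BarrierUpgrade {
  const char *Legacy;
  const char *Replacement;
};

static const BarrierUpgrade kBarrierUpgrades[] = {
    {"llvm.nvvm.barrier0", "llvm.nvvm.barrier.cta.sync.aligned.all"},
    {"llvm.nvvm.barrier.n", "llvm.nvvm.barrier.cta.sync.aligned.all"},
    {"llvm.nvvm.bar.sync", "llvm.nvvm.barrier.cta.sync.aligned.all"},
    {"llvm.nvvm.barrier", "llvm.nvvm.barrier.cta.sync.aligned"},
    {"llvm.nvvm.barrier.sync", "llvm.nvvm.barrier.cta.sync.all"},
    {"llvm.nvvm.barrier.sync.cnt", "llvm.nvvm.barrier.cta.sync"},
};

// Hypothetical helper: returns the replacement intrinsic name, or an empty
// string when the name is not one of the legacy barrier intrinsics.
inline std::string upgradedBarrierName(const std::string &Legacy) {
  for (const BarrierUpgrade &U : kBarrierUpgrades)
    if (Legacy == U.Legacy)
      return U.Replacement;
  return std::string();
}
```

With this table, `upgradedBarrierName("llvm.nvvm.barrier0")` yields `llvm.nvvm.barrier.cta.sync.aligned.all`, and an unrelated name such as `llvm.nvvm.bar.warp.sync` yields an empty string.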
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
mcourteaux wrote:

This merge broke our builds on Halide.

```
Unhandled exception: Error: Could not find PTX barrier intrinsic (llvm.nvvm.barrier0)
```

We have [an `.ll` file](https://github.com/halide/Halide/blob/main/src/runtime/ptx_dev.ll) declaring these intrinsics:

```ll
declare void @llvm.nvvm.barrier0()
```

So, it seems the auto-upgrade mechanism doesn't work. Any ideas?

https://github.com/llvm/llvm-project/pull/140615
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
https://github.com/AlexMaclean closed https://github.com/llvm/llvm-project/pull/140615
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
https://github.com/grypp approved this pull request.

https://github.com/llvm/llvm-project/pull/140615
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -462,24 +462,28 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">, // NVVM synchronization op definitions //===--===// -def NVVM_Barrier0Op : NVVM_IntrOp<"barrier0"> { +def NVVM_Barrier0Op : NVVM_Op<"barrier0"> { let assemblyFormat = "attr-dict"; + string llvmBuilder = [{ + createIntrinsicCall( + builder, llvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all, + {builder.getInt32(0)}); + }]; durga4github wrote: Sure, Alex. I will take care of this in a separate change. https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -462,24 +462,28 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">, // NVVM synchronization op definitions //===--===// -def NVVM_Barrier0Op : NVVM_IntrOp<"barrier0"> { +def NVVM_Barrier0Op : NVVM_Op<"barrier0"> { let assemblyFormat = "attr-dict"; + string llvmBuilder = [{ + createIntrinsicCall( + builder, llvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all, + {builder.getInt32(0)}); + }]; durga4github wrote: +1. https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -71,14 +71,6 @@ define float @nvvm_rcp(float %0) {
   ret float %2
 }

-; CHECK-LABEL: @llvm_nvvm_barrier0()
-define void @llvm_nvvm_barrier0() {
-  ; CHECK: nvvm.barrier0
-  call void @llvm.nvvm.barrier0()
-  ret void
-}
-

AlexMaclean wrote:

I've added this test back.

https://github.com/llvm/llvm-project/pull/140615
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -462,24 +462,28 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">, // NVVM synchronization op definitions //===--===// -def NVVM_Barrier0Op : NVVM_IntrOp<"barrier0"> { +def NVVM_Barrier0Op : NVVM_Op<"barrier0"> { let assemblyFormat = "attr-dict"; + string llvmBuilder = [{ + createIntrinsicCall( + builder, llvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all, + {builder.getInt32(0)}); + }]; } def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> { let arguments = (ins Optional:$barrierId, Optional:$numberOfThreads); string llvmBuilder = [{ -if ($numberOfThreads && $barrierId) { - createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier, -{$barrierId, $numberOfThreads}); -} else if($barrierId) { - createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier_n, -{$barrierId}); -} else { - createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier0); -} +auto id = $barrierId ? $barrierId : builder.getInt32(0); AlexMaclean wrote: Fixed https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -462,24 +462,28 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">,
 // NVVM synchronization op definitions
 //===--===//

-def NVVM_Barrier0Op : NVVM_IntrOp<"barrier0"> {
+def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {

AlexMaclean wrote:

Yep. Otherwise, this record generates references to the no-longer-present Intrinsic::barrier0, for example in `NVVMConvertibleLLVMIRIntrinsics.inc`.

https://github.com/llvm/llvm-project/pull/140615
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -462,24 +462,28 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">, // NVVM synchronization op definitions //===--===// -def NVVM_Barrier0Op : NVVM_IntrOp<"barrier0"> { +def NVVM_Barrier0Op : NVVM_Op<"barrier0"> { let assemblyFormat = "attr-dict"; + string llvmBuilder = [{ + createIntrinsicCall( + builder, llvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all, + {builder.getInt32(0)}); + }]; AlexMaclean wrote: I agree this op can be completely removed but doing so is not trivial. I'd prefer to leave this for a subsequent change. CC @durga4github https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -199,21 +199,58 @@ map in the following way to CUDA builtins: Barriers -'``llvm.nvvm.barrier0``' -^^^ +'``llvm.nvvm.barrier.cta.*``' +^ Syntax: """ .. code-block:: llvm - declare void @llvm.nvvm.barrier0() + declare void @llvm.nvvm.barrier.cta.sync(i32 %id, i32 %n) + declare void @llvm.nvvm.barrier.cta.sync.all(i32 %id) + declare void @llvm.nvvm.barrier.cta.arrive(i32 %id, i32 %n) + + declare void @llvm.nvvm.barrier.cta.sync.aligned(i32 %id, i32 %n) + declare void @llvm.nvvm.barrier.cta.sync.aligned.all(i32 %id) + declare void @llvm.nvvm.barrier.cta.arrive.aligned(i32 %id, i32 %n) Overview: " -The '``@llvm.nvvm.barrier0()``' intrinsic emits a PTX ``bar.sync 0`` -instruction, equivalent to the ``__syncthreads()`` call in CUDA. +The '``@llvm.nvvm.barrier.cta.*``' family of intrinsics perform barrier +synchronization and communication within a CTA. They can be used by the threads +within the CTA for synchronization and communication. + +Semantics: +"" + +Operand %id specifies a logical barrier resource and must fall within the range +0 through 15. When present, operand %n specifies the number of threads +participating in the barrier. When specifying a thread count, the value must be +a multiple of the warp size. With the '``@llvm.nvvm.barrier.cta.sync.*``' +variants, the '``.all``' suffix indicates that all threads in the CTA should +participate in the barrier and the %n operand is not present. + +All forms of the '``@llvm.nvvm.barrier.cta.*``' intrinsic cause the executing +thread to wait for all non-exited threads from its warp and then marks the +warp's arrival at the barrier. In addition to signaling its arrival at the +barrier, the '``@llvm.nvvm.barrier.cta.sync.*``' intrinsics cause the executing +thread to wait for non-exited threads of all other warps participating in the +barrier to arrive. On the other hand, the '``@llvm.nvvm.barrier.cta.arrive.*``' +intrinsic does not cause the executing thread to wait for threads of other +participating warps. 
+ +When a barrier completes, the waiting threads are restarted without delay, +and the barrier is reinitialized so that it can be immediately reused. + +The '``@llvm.nvvm.barrier.cta.*``' intrinsic has an optional '``.aligned``' +modifier to indicate textual alignment of the barrier. When specified, it +indicates that all threads in the CTA will execute the same +'``@llvm.nvvm.barrier.cta.*``' instruction. In conditionally executed code, an +aligned '``@llvm.nvvm.barrier.cta.*``' instruction should only be used if it is +known that all threads in the CTA evaluate the condition identically, otherwise +behavior is undefined. AlexMaclean wrote: I think the PTX instruction used to lower the intrinsic is an implementation detail. It should not be necessary to link to it if the semantics of the intrinsic are fully defined. https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
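The operand constraints in the Semantics text above (barrier id in the range 0 through 15, thread count a multiple of the warp size) can be written down as two small predicates. This is a hedged C++ sketch rather than anything LLVM provides: `isValidBarrierId`, `isValidThreadCount` and `kWarpSize` are hypothetical names, and the warp size of 32 is an assumption that holds on current NVIDIA GPUs but is not stated in the documentation being reviewed.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: a warp has 32 threads, as on current NVIDIA
// GPUs. The reviewed documentation only says "the warp size".
static const uint32_t kWarpSize = 32;

// Hypothetical check: %id names a logical barrier resource in 0..15.
inline bool isValidBarrierId(uint32_t Id) { return Id <= 15; }

// Hypothetical check: %n must be a multiple of the warp size. The text does
// not say anything about zero, so this follows the wording literally.
inline bool isValidThreadCount(uint32_t N) { return N % kWarpSize == 0; }
```

For example, a barrier over 64 threads on id 15 passes both checks, while id 16 or a thread count of 48 does not.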
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -240,6 +240,47 @@ def BF16RT : RegTyInfo; def F16X2RT : RegTyInfo; def BF16X2RT : RegTyInfo; +// This class provides a basic wrapper around an NVPTXInst that abstracts the +// specific syntax of most PTX instructions. It automatically handles the +// construction of the asm string based on the provided dag arguments. +// For example, the following asm-strings would be computed: +// +// * BasicFlagsNVPTXInst<(outs Int32Regs:$dst), +// (ins Int32Regs:$a, Int32Regs:$b), (ins), +// "add.s32">; +// ---> "add.s32 \t$dst, $a, $b;" +// +// * BasicFlagsNVPTXInst<(outs Int32Regs:$d), +// (ins Int32Regs:$a, Int32Regs:$b, Hexu32imm:$c), +// (ins PrmtMode:$mode), +// "prmt.b32${mode}">; +// ---> "prmt.b32${mode} \t$dst, $a, $b, $c;" AlexMaclean wrote: Fixed! https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/140615 >From babb28ef1c935f0d0cfb3b40f62be860be027010 Mon Sep 17 00:00:00 2001 From: Alex Maclean Date: Thu, 15 May 2025 18:12:11 + Subject: [PATCH 1/5] [NVPTX] Unify and extend barrier{.cta} intrinsic support --- llvm/include/llvm/IR/IntrinsicsNVVM.td| 37 +++-- llvm/lib/IR/AutoUpgrade.cpp | 18 +++ llvm/lib/Target/NVPTX/NVPTXInstrInfo.td | 28 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td | 71 .../Transforms/IPO/AttributorAttributes.cpp | 3 +- .../Assembler/auto_upgrade_nvvm_intrinsics.ll | 22 +++ llvm/test/CodeGen/NVPTX/barrier.ll| 153 +++--- llvm/test/CodeGen/NVPTX/named-barriers.ll | 36 +++-- .../CodeGen/NVPTX/noduplicate-syncthreads.ll | 6 +- 9 files changed, 275 insertions(+), 99 deletions(-) diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td index a95c739f1331d..f648815b06ab8 100644 --- a/llvm/include/llvm/IR/IntrinsicsNVVM.td +++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td @@ -128,6 +128,12 @@ // * llvm.nvvm.swap.lo.hi.b64 --> llvm.fshl(x, x, 32) // * llvm.nvvm.atomic.load.inc.32 --> atomicrmw uinc_wrap // * llvm.nvvm.atomic.load.dec.32 --> atomicrmw udec_wrap +// * llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0) +// * llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x) +// * llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x) +// * llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y) +// * llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x) +// * llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y) def llvm_global_ptr_ty : LLVMQualPointerType<1>; // (global)ptr def llvm_shared_ptr_ty : LLVMQualPointerType<3>; // (shared)ptr @@ -1263,18 +1269,6 @@ let TargetPrefix = "nvvm" in { defm int_nvvm_atomic_cas_gen_i : PTXAtomicWithScope3; // Bar.Sync - - // The builtin for "bar.sync 0" is called __syncthreads. 
Unlike most of the - // intrinsics in this file, this one is a user-facing API. - def int_nvvm_barrier0 : ClangBuiltin<"__syncthreads">, - Intrinsic<[], [], [IntrConvergent, IntrNoCallback]>; - // Synchronize all threads in the CTA at barrier 'n'. - def int_nvvm_barrier_n : ClangBuiltin<"__nvvm_bar_n">, - Intrinsic<[], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>; - // Synchronize 'm', a multiple of warp size, (arg 2) threads in - // the CTA at barrier 'n' (arg 1). - def int_nvvm_barrier : ClangBuiltin<"__nvvm_bar">, - Intrinsic<[], [llvm_i32_ty, llvm_i32_ty], [IntrConvergent, IntrNoCallback]>; def int_nvvm_barrier0_popc : ClangBuiltin<"__nvvm_bar0_popc">, Intrinsic<[llvm_i32_ty], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>; def int_nvvm_barrier0_and : ClangBuiltin<"__nvvm_bar0_and">, @@ -1282,16 +1276,21 @@ let TargetPrefix = "nvvm" in { def int_nvvm_barrier0_or : ClangBuiltin<"__nvvm_bar0_or">, Intrinsic<[llvm_i32_ty], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>; - def int_nvvm_bar_sync : NVVMBuiltin, - Intrinsic<[], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>; def int_nvvm_bar_warp_sync : NVVMBuiltin, Intrinsic<[], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>; - // barrier.sync id[, cnt] - def int_nvvm_barrier_sync : NVVMBuiltin, - Intrinsic<[], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>; - def int_nvvm_barrier_sync_cnt : NVVMBuiltin, - Intrinsic<[], [llvm_i32_ty, llvm_i32_ty], [IntrConvergent, IntrNoCallback]>; + // barrier{.cta}.sync{.aligned} a{, b}; + // barrier{.cta}.arrive{.aligned}a, b; + let IntrProperties = [IntrConvergent, IntrNoCallback] in { +foreach align = ["", "_aligned"] in { + def int_nvvm_barrier_cta_sync # align # _all : + Intrinsic<[], [llvm_i32_ty]>; + def int_nvvm_barrier_cta_sync # align : + Intrinsic<[], [llvm_i32_ty, llvm_i32_ty]>; + def int_nvvm_barrier_cta_arrive # align : + Intrinsic<[], [llvm_i32_ty, llvm_i32_ty]>; +} + } // barrier.cluster.[wait, arrive, arrive.relaxed] def 
int_nvvm_barrier_cluster_arrive : diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp index 9091e7585f9d9..18f6f2bf9ed11 100644 --- a/llvm/lib/IR/AutoUpgrade.cpp +++ b/llvm/lib/IR/AutoUpgrade.cpp @@ -1349,6 +1349,10 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn, else if (Name == "clz.ll" || Name == "popc.ll" || Name == "h2f" || Name == "swap.lo.hi.b64") Expand = true; + else if (Name == "barrier0" || Name == "barrier.n" || + Name == "bar.sync" || Name == "barrier" || + Name == "barrier.sync" || Name == "barrier.sync.cnt") +Expand = true; else if (Name.consume_front("max.") || Name.consume
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
https://github.com/grypp edited https://github.com/llvm/llvm-project/pull/140615
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -71,14 +71,6 @@ define float @nvvm_rcp(float %0) {
   ret float %2
 }

-; CHECK-LABEL: @llvm_nvvm_barrier0()
-define void @llvm_nvvm_barrier0() {
-  ; CHECK: nvvm.barrier0
-  call void @llvm.nvvm.barrier0()
-  ret void
-}
-

grypp wrote:

This test is removed here. Is that accidental?

https://github.com/llvm/llvm-project/pull/140615
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -462,24 +462,28 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">, // NVVM synchronization op definitions //===--===// -def NVVM_Barrier0Op : NVVM_IntrOp<"barrier0"> { +def NVVM_Barrier0Op : NVVM_Op<"barrier0"> { let assemblyFormat = "attr-dict"; + string llvmBuilder = [{ + createIntrinsicCall( + builder, llvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all, + {builder.getInt32(0)}); + }]; } def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> { let arguments = (ins Optional:$barrierId, Optional:$numberOfThreads); string llvmBuilder = [{ -if ($numberOfThreads && $barrierId) { - createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier, -{$barrierId, $numberOfThreads}); -} else if($barrierId) { - createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier_n, -{$barrierId}); -} else { - createIntrinsicCall(builder, llvm::Intrinsic::nvvm_barrier0); -} +auto id = $barrierId ? $barrierId : builder.getInt32(0); grypp wrote: We don't use auto when the type isn't obvious. https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
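To make the intent of the new `NVVM_BarrierOp` builder easier to follow, here is a small standalone C++ model of the selection logic. The quoted hunk stops right after the line that defaults the barrier id to 0, so everything after that line is an assumption inferred from the auto-upgrade table in this PR (`barrier` maps to `barrier.cta.sync.aligned`, `barrier.n` to `barrier.cta.sync.aligned.all`). `OptOperand`, `Lowering` and `lowerBarrier` are hypothetical names, and plain integers stand in for `llvm::Value *`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of an optional i32 operand; Value is ignored when
// Present is false.
struct OptOperand {
  bool Present;
  int32_t Value;
};

// Hypothetical result: the intrinsic name and the arguments passed to it.
struct Lowering {
  std::string Intrinsic;
  std::vector<int32_t> Args;
};

inline Lowering lowerBarrier(OptOperand BarrierId, OptOperand NumThreads) {
  // Explicit type rather than auto; a missing id defaults to barrier 0,
  // mirroring the one builder line visible in the hunk.
  const int32_t Id = BarrierId.Present ? BarrierId.Value : 0;
  // Assumed continuation: a thread count selects the two-operand form,
  // otherwise all threads in the CTA participate.
  if (NumThreads.Present)
    return Lowering{"llvm.nvvm.barrier.cta.sync.aligned",
                    {Id, NumThreads.Value}};
  return Lowering{"llvm.nvvm.barrier.cta.sync.aligned.all", {Id}};
}
```

Under these assumptions an op with neither operand lowers to `llvm.nvvm.barrier.cta.sync.aligned.all(0)`, which is also what the upgrade table gives for the legacy `llvm.nvvm.barrier0`.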
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -462,24 +462,28 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">,
 // NVVM synchronization op definitions
 //===--===//

-def NVVM_Barrier0Op : NVVM_IntrOp<"barrier0"> {
+def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {

grypp wrote:

Do you need to change `NVVM_IntrOp` to `NVVM_Op`?

https://github.com/llvm/llvm-project/pull/140615
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -462,24 +462,28 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">, // NVVM synchronization op definitions //===--===// -def NVVM_Barrier0Op : NVVM_IntrOp<"barrier0"> { +def NVVM_Barrier0Op : NVVM_Op<"barrier0"> { let assemblyFormat = "attr-dict"; + string llvmBuilder = [{ + createIntrinsicCall( + builder, llvm::Intrinsic::nvvm_barrier_cta_sync_aligned_all, + {builder.getInt32(0)}); + }]; grypp wrote: you can remove this op completely actually. https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -199,21 +199,58 @@ map in the following way to CUDA builtins: Barriers -'``llvm.nvvm.barrier0``' -^^^ +'``llvm.nvvm.barrier.cta.*``' +^ Syntax: """ .. code-block:: llvm - declare void @llvm.nvvm.barrier0() + declare void @llvm.nvvm.barrier.cta.sync(i32 %id, i32 %n) + declare void @llvm.nvvm.barrier.cta.sync.all(i32 %id) + declare void @llvm.nvvm.barrier.cta.arrive(i32 %id, i32 %n) + + declare void @llvm.nvvm.barrier.cta.sync.aligned(i32 %id, i32 %n) + declare void @llvm.nvvm.barrier.cta.sync.aligned.all(i32 %id) + declare void @llvm.nvvm.barrier.cta.arrive.aligned(i32 %id, i32 %n) Overview: " -The '``@llvm.nvvm.barrier0()``' intrinsic emits a PTX ``bar.sync 0`` -instruction, equivalent to the ``__syncthreads()`` call in CUDA. +The '``@llvm.nvvm.barrier.cta.*``' family of intrinsics perform barrier +synchronization and communication within a CTA. They can be used by the threads +within the CTA for synchronization and communication. + +Semantics: +"" + +Operand %id specifies a logical barrier resource and must fall within the range +0 through 15. When present, operand %n specifies the number of threads +participating in the barrier. When specifying a thread count, the value must be +a multiple of the warp size. With the '``@llvm.nvvm.barrier.cta.sync.*``' +variants, the '``.all``' suffix indicates that all threads in the CTA should +participate in the barrier and the %n operand is not present. + +All forms of the '``@llvm.nvvm.barrier.cta.*``' intrinsic cause the executing +thread to wait for all non-exited threads from its warp and then marks the +warp's arrival at the barrier. In addition to signaling its arrival at the +barrier, the '``@llvm.nvvm.barrier.cta.sync.*``' intrinsics cause the executing +thread to wait for non-exited threads of all other warps participating in the +barrier to arrive. On the other hand, the '``@llvm.nvvm.barrier.cta.arrive.*``' +intrinsic does not cause the executing thread to wait for threads of other +participating warps. 
+ +When a barrier completes, the waiting threads are restarted without delay, +and the barrier is reinitialized so that it can be immediately reused. + +The '``@llvm.nvvm.barrier.cta.*``' intrinsic has an optional '``.aligned``' +modifier to indicate textual alignment of the barrier. When specified, it +indicates that all threads in the CTA will execute the same +'``@llvm.nvvm.barrier.cta.*``' instruction. In conditionally executed code, an +aligned '``@llvm.nvvm.barrier.cta.*``' instruction should only be used if it is +known that all threads in the CTA evaluate the condition identically, otherwise +behavior is undefined. durga4github wrote: Shall we add a link to the PTX ISA here? https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -102,39 +93,51 @@ def INT_BARRIER0_OR : NVPTXInst<(outs Int32Regs:$dst), (ins Int32Regs:$pred), "}}"), [(set i32:$dst, (int_nvvm_barrier0_or i32:$pred))]>; -def INT_BAR_SYNC : NVPTXInst<(outs), (ins i32imm:$i), "bar.sync \t$i;", - [(int_nvvm_bar_sync imm:$i)]>; - def INT_BAR_WARP_SYNC_I : NVPTXInst<(outs), (ins i32imm:$i), "bar.warp.sync \t$i;", [(int_nvvm_bar_warp_sync imm:$i)]>, Requires<[hasPTX<60>, hasSM<30>]>; def INT_BAR_WARP_SYNC_R : NVPTXInst<(outs), (ins Int32Regs:$i), "bar.warp.sync \t$i;", [(int_nvvm_bar_warp_sync i32:$i)]>, Requires<[hasPTX<60>, hasSM<30>]>; -def INT_BARRIER_SYNC_I : NVPTXInst<(outs), (ins i32imm:$i), "barrier.sync \t$i;", - [(int_nvvm_barrier_sync imm:$i)]>, -Requires<[hasPTX<60>, hasSM<30>]>; -def INT_BARRIER_SYNC_R : NVPTXInst<(outs), (ins Int32Regs:$i), "barrier.sync \t$i;", - [(int_nvvm_barrier_sync i32:$i)]>, -Requires<[hasPTX<60>, hasSM<30>]>; +multiclass BARRIER1 requires = []> { + def _i : BasicNVPTXInst<(outs), (ins i32imm:$i), asmstr, + [(intrinsic imm:$i)]>, + Requires; -def INT_BARRIER_SYNC_CNT_RR : NVPTXInst<(outs), (ins Int32Regs:$id, Int32Regs:$cnt), - "barrier.sync \t$id, $cnt;", - [(int_nvvm_barrier_sync_cnt i32:$id, i32:$cnt)]>, -Requires<[hasPTX<60>, hasSM<30>]>; -def INT_BARRIER_SYNC_CNT_RI : NVPTXInst<(outs), (ins Int32Regs:$id, i32imm:$cnt), - "barrier.sync \t$id, $cnt;", - [(int_nvvm_barrier_sync_cnt i32:$id, imm:$cnt)]>, -Requires<[hasPTX<60>, hasSM<30>]>; -def INT_BARRIER_SYNC_CNT_IR : NVPTXInst<(outs), (ins i32imm:$id, Int32Regs:$cnt), - "barrier.sync \t$id, $cnt;", - [(int_nvvm_barrier_sync_cnt imm:$id, i32:$cnt)]>, -Requires<[hasPTX<60>, hasSM<30>]>; -def INT_BARRIER_SYNC_CNT_II : NVPTXInst<(outs), (ins i32imm:$id, i32imm:$cnt), - "barrier.sync \t$id, $cnt;", - [(int_nvvm_barrier_sync_cnt imm:$id, imm:$cnt)]>, -Requires<[hasPTX<60>, hasSM<30>]>; + def _r : BasicNVPTXInst<(outs), (ins Int32Regs:$i), asmstr, + [(intrinsic i32:$i)]>, + Requires; +} + +multiclass BARRIER2 requires = []> { + def _rr : 
BasicNVPTXInst<(outs), (ins Int32Regs:$i, Int32Regs:$j), asmstr, + [(intrinsic i32:$i, i32:$j)]>, +Requires; + + def _ri : BasicNVPTXInst<(outs), (ins Int32Regs:$i, i32imm:$j), asmstr, + [(intrinsic i32:$i, imm:$j)]>, +Requires; + + def _ir : BasicNVPTXInst<(outs), (ins i32imm:$i, Int32Regs:$j), asmstr, + [(intrinsic imm:$i, i32:$j)]>, +Requires; + + def _ii : BasicNVPTXInst<(outs), (ins i32imm:$i, i32imm:$j), asmstr, + [(intrinsic imm:$i, imm:$j)]>, +Requires; +} + +// Note the "bar.sync" variants could be renamed to the equivalent corresponding +// "barrier.*.aligned" variants. We use the older syntax for compatibility with +// older versions of the PTX ISA. durga4github wrote: Yes, and thanks for this note! https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
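The `BARRIER1` and `BARRIER2` multiclasses in the hunk above define one instruction per combination of register (`r`) and immediate (`i`) operands. As a rough illustration of that pattern, the following C++ sketch enumerates the suffixes for a given operand count. `operandKindSuffixes` is a hypothetical helper and not part of the NVPTX backend, where the variants are written out by hand in TableGen.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: every operand is either a register ('r') or an
// immediate ('i'), so N operands give 2^N instruction variants.
inline std::vector<std::string> operandKindSuffixes(unsigned NumOperands) {
  std::vector<std::string> Suffixes(1, "_");
  for (unsigned I = 0; I < NumOperands; ++I) {
    std::vector<std::string> Next;
    for (const std::string &S : Suffixes) {
      Next.push_back(S + "r"); // this operand comes from a register
      Next.push_back(S + "i"); // this operand is an immediate
    }
    Suffixes.swap(Next);
  }
  return Suffixes;
}
```

For two operands this produces `_rr`, `_ri`, `_ir`, `_ii`, the same four definitions that `BARRIER2` spells out.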
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
https://github.com/durga4github approved this pull request.

https://github.com/llvm/llvm-project/pull/140615
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -240,6 +240,47 @@ def BF16RT : RegTyInfo; def F16X2RT : RegTyInfo; def BF16X2RT : RegTyInfo; +// This class provides a basic wrapper around an NVPTXInst that abstracts the +// specific syntax of most PTX instructions. It automatically handles the +// construction of the asm string based on the provided dag arguments. +// For example, the following asm-strings would be computed: +// +// * BasicFlagsNVPTXInst<(outs Int32Regs:$dst), +// (ins Int32Regs:$a, Int32Regs:$b), (ins), +// "add.s32">; +// ---> "add.s32 \t$dst, $a, $b;" +// +// * BasicFlagsNVPTXInst<(outs Int32Regs:$d), +// (ins Int32Regs:$a, Int32Regs:$b, Hexu32imm:$c), +// (ins PrmtMode:$mode), +// "prmt.b32${mode}">; +// ---> "prmt.b32${mode} \t$dst, $a, $b, $c;" durga4github wrote: I think you meant `$d` here, for the output value https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
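The asm-string construction described in the comment above can be mimicked outside TableGen to check the expected result by hand. The sketch below is a plain C++ stand-in, not the TableGen implementation: `buildAsmString` is a hypothetical function, and it assumes the rule implied by the two examples, namely the mnemonic, a space and a tab, then the output operands followed by the input operands, each written as `$name`, separated by commas and terminated by a semicolon.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the TableGen logic: mnemonic, " \t", then all
// operands (outputs first) as "$name" joined by ", ", then ';'.
inline std::string buildAsmString(const std::string &Mnemonic,
                                  const std::vector<std::string> &Outs,
                                  const std::vector<std::string> &Ins) {
  std::string Asm = Mnemonic;
  const char *Sep = " \t"; // separator before the first operand only
  auto Append = [&](const std::vector<std::string> &Ops) {
    for (const std::string &Op : Ops) {
      Asm += Sep;
      Asm += "$" + Op;
      Sep = ", "; // every later operand is comma-separated
    }
  };
  Append(Outs);
  Append(Ins);
  Asm += ";";
  return Asm;
}
```

Calling it with the operands of the second example, output `d` and inputs `a`, `b`, `c`, gives a string whose first operand is `$d`, which is the point of the review comment.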
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -1349,6 +1349,10 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
       else if (Name == "clz.ll" || Name == "popc.ll" || Name == "h2f" ||
                Name == "swap.lo.hi.b64")
         Expand = true;
+      else if (Name == "barrier0" || Name == "barrier.n" ||

Artem-B wrote:

Ideally, all we need is a "has_known_prefix" function and a sorted array of known prefixes. This would give us both efficient search and an easily readable/maintainable list of intrinsics we care about.

But given that it's nowhere near the hot path, we don't have to do it. The current version is fine.

https://github.com/llvm/llvm-project/pull/140615
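One way to read the "sorted array of known prefixes" suggestion is a binary search followed by a single prefix comparison. The C++ sketch below is only an interpretation of that idea: `hasKnownPrefix` and `kKnownPrefixes` are hypothetical, the listed prefixes are illustrative rather than the set AutoUpgrade.cpp handles, and the approach assumes the list is sorted and prefix-free (no entry is a prefix of another), because only then is the predecessor of the name the sole possible match.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <iterator>
#include <string>

// Hypothetical prefix list: sorted and prefix-free. The entries are
// illustrative only, not the set of names LLVM actually upgrades.
static const char *const kKnownPrefixes[] = {
    "abs.",
    "bitcast.",
    "max.",
    "min.",
};

inline bool hasKnownPrefix(const std::string &Name) {
  const char *const *Begin = std::begin(kKnownPrefixes);
  const char *const *End = std::end(kKnownPrefixes);
  // Binary search for the first entry that sorts strictly after Name.
  const char *const *It = std::upper_bound(
      Begin, End, Name,
      [](const std::string &N, const char *P) { return N < P; });
  if (It == Begin)
    return false; // Name sorts before every known prefix
  // In a sorted, prefix-free list the only candidate is the predecessor.
  const char *Candidate = *(It - 1);
  return Name.compare(0, std::strlen(Candidate), Candidate) == 0;
}
```

The lookup is logarithmic in the number of prefixes, and the table itself doubles as the readable list of names the comment asks for. If nested prefixes were needed, the predecessor check alone would no longer be sufficient.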
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/140615
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -6,13 +7,15 @@ ; Use bar.sync to arrive at a pre-computed barrier number and ; wait for all threads in CTA to also arrive: define ptx_device void @test_barrier_named_cta() { -; CHECK: mov.b32 %r[[REG0:[0-9]+]], 0; -; CHECK: bar.sync %r[[REG0]]; -; CHECK: mov.b32 %r[[REG1:[0-9]+]], 10; -; CHECK: bar.sync %r[[REG1]]; -; CHECK: mov.b32 %r[[REG2:[0-9]+]], 15; -; CHECK: bar.sync %r[[REG2]]; -; CHECK: ret; +; CHECK-LABEL: test_barrier_named_cta( +; CHECK: { +; CHECK-EMPTY: +; CHECK-EMPTY: +; CHECK-NEXT: // %bb.0: +; CHECK-NEXT:bar.sync 0; +; CHECK-NEXT:bar.sync 10; +; CHECK-NEXT:bar.sync 15; AlexMaclean wrote: I just went ahead and removed this test completely. Both register and immediate cases are well tested in barriers.ll and we also test that these legacy intrinsics are auto-upgraded correctly, so keeping this test around at all seems confusing and pointless. https://github.com/llvm/llvm-project/pull/140615 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -240,6 +240,34 @@ def BF16RT : RegTyInfo;
 def F16X2RT : RegTyInfo;
 def BF16X2RT : RegTyInfo;
+// This class provides a basic wrapper around an NVPTXInst that abstracts the
+// specific syntax of most PTX instructions. It automatically handles the
+// construction of the asm string based on the provided dag arguments.

AlexMaclean wrote:

Added a couple of examples.

https://github.com/llvm/llvm-project/pull/140615
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
@@ -1349,6 +1349,10 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
     else if (Name == "clz.ll" || Name == "popc.ll" || Name == "h2f" ||
              Name == "swap.lo.hi.b64")
       Expand = true;
+    else if (Name == "barrier0" || Name == "barrier.n" ||

AlexMaclean wrote:

I've moved these cases, as well as the others without a consume_front, into a StringSwitch in the else branch. I agree we could go further, but it looks like we used to have what you described and switched to this style in https://github.com/llvm/llvm-project/commit/b045c36ab92f4ff478dab678aaff4680fbccc7ea, so maybe there are arguments for both?

https://github.com/llvm/llvm-project/pull/140615
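For readers following the style discussion above, here is a minimal stand-alone sketch of a table-driven exact-name lookup, which is the idea behind gathering the cases that need no prefix stripping in one place. It is not the code in AutoUpgrade.cpp: to compile without LLVM headers it uses a plain array scan instead of `llvm::StringSwitch`, and `BarrierUpgrade`, `LegacyBarriers`, and `lookupBarrierUpgrade` are hypothetical names chosen for illustration. The name pairs are taken from the upgrade list in this patch's IntrinsicsNVVM.td comment, with the `llvm.nvvm.` prefix assumed to be already stripped.

```cpp
#include <optional>
#include <string_view>

// Hypothetical record: a legacy barrier intrinsic name and the name of the
// intrinsic it is upgraded to (both without the "llvm.nvvm." prefix).
struct BarrierUpgrade {
  std::string_view Old;
  std::string_view New;
};

// Pairs copied from the upgrade list in the IntrinsicsNVVM.td comment.
// Note that barrier0 takes no operand; per that list its upgrade passes 0.
constexpr BarrierUpgrade LegacyBarriers[] = {
    {"barrier0", "barrier.cta.sync.aligned.all"},
    {"barrier.n", "barrier.cta.sync.aligned.all"},
    {"bar.sync", "barrier.cta.sync.aligned.all"},
    {"barrier", "barrier.cta.sync.aligned"},
    {"barrier.sync", "barrier.cta.sync.all"},
    {"barrier.sync.cnt", "barrier.cta.sync"},
};

// Returns the replacement name when Name is exactly one of the legacy
// barrier intrinsics, and std::nullopt otherwise. Matching is exact on
// purpose: a prefix match on "barrier" would also catch "barrier0.popc".
inline std::optional<std::string_view>
lookupBarrierUpgrade(std::string_view Name) {
  for (const BarrierUpgrade &Entry : LegacyBarriers)
    if (Name == Entry.Old)
      return Entry.New;
  return std::nullopt;
}
```

With this shape, `lookupBarrierUpgrade("barrier0")` yields `barrier.cta.sync.aligned.all`, while `barrier0.popc`, which the diff shows only as unchanged context, yields no match. Whether a table, a StringSwitch, or the chained comparisons read best is exactly the judgment call the comment above leaves open.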
[clang] [llvm] [mlir] [NVPTX] Unify and extend barrier{.cta} intrinsic support (PR #140615)
https://github.com/AlexMaclean updated https://github.com/llvm/llvm-project/pull/140615

From babb28ef1c935f0d0cfb3b40f62be860be027010 Mon Sep 17 00:00:00 2001
From: Alex Maclean
Date: Thu, 15 May 2025 18:12:11 +
Subject: [PATCH 1/3] [NVPTX] Unify and extend barrier{.cta} intrinsic support

---
 llvm/include/llvm/IR/IntrinsicsNVVM.td        |  37 +++--
 llvm/lib/IR/AutoUpgrade.cpp                   |  18 +++
 llvm/lib/Target/NVPTX/NVPTXInstrInfo.td       |  28
 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td      |  71
 .../Transforms/IPO/AttributorAttributes.cpp   |   3 +-
 .../Assembler/auto_upgrade_nvvm_intrinsics.ll |  22 +++
 llvm/test/CodeGen/NVPTX/barrier.ll            | 153 +++---
 llvm/test/CodeGen/NVPTX/named-barriers.ll     |  36 +++--
 .../CodeGen/NVPTX/noduplicate-syncthreads.ll  |   6 +-
 9 files changed, 275 insertions(+), 99 deletions(-)

diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index a95c739f1331d..f648815b06ab8 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -128,6 +128,12 @@
 // * llvm.nvvm.swap.lo.hi.b64 --> llvm.fshl(x, x, 32)
 // * llvm.nvvm.atomic.load.inc.32 --> atomicrmw uinc_wrap
 // * llvm.nvvm.atomic.load.dec.32 --> atomicrmw udec_wrap
+// * llvm.nvvm.barrier0 --> llvm.nvvm.barrier.cta.sync.aligned.all(0)
+// * llvm.nvvm.barrier.n --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
+// * llvm.nvvm.bar.sync --> llvm.nvvm.barrier.cta.sync.aligned.all(x)
+// * llvm.nvvm.barrier --> llvm.nvvm.barrier.cta.sync.aligned(x, y)
+// * llvm.nvvm.barrier.sync --> llvm.nvvm.barrier.cta.sync.all(x)
+// * llvm.nvvm.barrier.sync.cnt --> llvm.nvvm.barrier.cta.sync(x, y)
 def llvm_global_ptr_ty : LLVMQualPointerType<1>; // (global)ptr
 def llvm_shared_ptr_ty : LLVMQualPointerType<3>; // (shared)ptr
@@ -1263,18 +1269,6 @@ let TargetPrefix = "nvvm" in {
   defm int_nvvm_atomic_cas_gen_i : PTXAtomicWithScope3;
 // Bar.Sync
-
-  // The builtin for "bar.sync 0" is called __syncthreads. Unlike most of the
-  // intrinsics in this file, this one is a user-facing API.
-  def int_nvvm_barrier0 : ClangBuiltin<"__syncthreads">,
-      Intrinsic<[], [], [IntrConvergent, IntrNoCallback]>;
-  // Synchronize all threads in the CTA at barrier 'n'.
-  def int_nvvm_barrier_n : ClangBuiltin<"__nvvm_bar_n">,
-      Intrinsic<[], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>;
-  // Synchronize 'm', a multiple of warp size, (arg 2) threads in
-  // the CTA at barrier 'n' (arg 1).
-  def int_nvvm_barrier : ClangBuiltin<"__nvvm_bar">,
-      Intrinsic<[], [llvm_i32_ty, llvm_i32_ty], [IntrConvergent, IntrNoCallback]>;
   def int_nvvm_barrier0_popc : ClangBuiltin<"__nvvm_bar0_popc">,
       Intrinsic<[llvm_i32_ty], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>;
   def int_nvvm_barrier0_and : ClangBuiltin<"__nvvm_bar0_and">,
@@ -1282,16 +1276,21 @@ let TargetPrefix = "nvvm" in {
   def int_nvvm_barrier0_or : ClangBuiltin<"__nvvm_bar0_or">,
       Intrinsic<[llvm_i32_ty], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>;
-  def int_nvvm_bar_sync : NVVMBuiltin,
-      Intrinsic<[], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>;
   def int_nvvm_bar_warp_sync : NVVMBuiltin,
       Intrinsic<[], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>;
-  // barrier.sync id[, cnt]
-  def int_nvvm_barrier_sync : NVVMBuiltin,
-      Intrinsic<[], [llvm_i32_ty], [IntrConvergent, IntrNoCallback]>;
-  def int_nvvm_barrier_sync_cnt : NVVMBuiltin,
-      Intrinsic<[], [llvm_i32_ty, llvm_i32_ty], [IntrConvergent, IntrNoCallback]>;
+  // barrier{.cta}.sync{.aligned} a{, b};
+  // barrier{.cta}.arrive{.aligned}a, b;
+  let IntrProperties = [IntrConvergent, IntrNoCallback] in {
+    foreach align = ["", "_aligned"] in {
+      def int_nvvm_barrier_cta_sync # align # _all :
+          Intrinsic<[], [llvm_i32_ty]>;
+      def int_nvvm_barrier_cta_sync # align :
+          Intrinsic<[], [llvm_i32_ty, llvm_i32_ty]>;
+      def int_nvvm_barrier_cta_arrive # align :
+          Intrinsic<[], [llvm_i32_ty, llvm_i32_ty]>;
+    }
+  }
   // barrier.cluster.[wait, arrive, arrive.relaxed]
   def int_nvvm_barrier_cluster_arrive :
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 9091e7585f9d9..18f6f2bf9ed11 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1349,6 +1349,10 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
     else if (Name == "clz.ll" || Name == "popc.ll" || Name == "h2f" ||
              Name == "swap.lo.hi.b64")
       Expand = true;
+    else if (Name == "barrier0" || Name == "barrier.n" ||
+             Name == "bar.sync" || Name == "barrier" ||
+             Name == "barrier.sync" || Name == "barrier.sync.cnt")
+      Expand = true;
     else if (Name.consume_front("max.") || Name.consume