[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/joker-eph approved this pull request. MLIR side LGTM https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/AlexMaclean approved this pull request. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/AlexMaclean commented: llvm changes LGTM, though I'm not too familiar with the MLIR portion of this change. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -2381,25 +2387,38 @@ def INT_PTX_LDG_G_v4i32_ELE : VLDG_G_ELE_V4<"u32", Int32Regs>; def INT_PTX_LDG_G_v4f32_ELE : VLDG_G_ELE_V4<"f32", Float32Regs>; -multiclass NG_TO_G { - def "" : NVPTXInst<(outs Int32Regs:$result), (ins Int32Regs:$src), - "cvta." # Str # ".u32 \t$result, $src;", []>; - def _64 : NVPTXInst<(outs Int64Regs:$result), (ins Int64Regs:$src), - "cvta." # Str # ".u64 \t$result, $src;", []>; +multiclass NG_TO_G Preds = []> { + foreach bitwidth = !if(Supports32, ["32", "64"], ["64"]) in { +if !eq(bitwidth, "32") then { + def "" : NVPTXInst<(outs Int32Regs:$result), (ins Int32Regs:$src), + "cvta." # Str # ".u32 \t$result, $src;", []>, Requires; +} else if !eq(bitwidth, "64") then { + def _64 : NVPTXInst<(outs Int64Regs:$result), (ins Int64Regs:$src), +"cvta." # Str # ".u64 \t$result, $src;", []>, Requires; +} + } } -multiclass G_TO_NG { - def "" : NVPTXInst<(outs Int32Regs:$result), (ins Int32Regs:$src), - "cvta.to." # Str # ".u32 \t$result, $src;", []>; - def _64 : NVPTXInst<(outs Int64Regs:$result), (ins Int64Regs:$src), - "cvta.to." # Str # ".u64 \t$result, $src;", []>; +multiclass G_TO_NG Preds = []> { + foreach bitwidth = !if(Supports32, ["32", "64"], ["64"]) in { AlexMaclean wrote: I don't think this foreach loop is really accomplishing much. Can it just be removed? https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -280,8 +282,10 @@ NVPTXTargetMachine::getPredicatedAddrSpace(const Value *V) const { case Intrinsic::nvvm_isspacep_local: return std::make_pair(II->getArgOperand(0), llvm::ADDRESS_SPACE_LOCAL); case Intrinsic::nvvm_isspacep_shared: -case Intrinsic::nvvm_isspacep_shared_cluster: return std::make_pair(II->getArgOperand(0), llvm::ADDRESS_SPACE_SHARED); +case Intrinsic::nvvm_isspacep_shared_cluster: modiking wrote: Technically this is conservative since `nvvm_isspacep_shared_cluster == true` implies that the pointer is also under `ADDRESS_SPACE_SHARED`. However the current interface only allows returning one address so that'll need to get updated. I'm planning on improving InferAddrSpaces to also detect from mapa/mapa_shared_cluster so I'll update this part as well during that. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/modiking edited https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -426,10 +426,7 @@ static std::optional evaluateIsSpace(Intrinsic::ID IID, unsigned AS) { case Intrinsic::nvvm_isspacep_shared: return AS == NVPTXAS::ADDRESS_SPACE_SHARED; modiking wrote: Right, if it's `ADDRESS_SPACE_SHARED_CLUSTER` we can't perform any folding. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -176,6 +176,7 @@ enum AddressSpace : AddressSpaceUnderlyingType { Shared = 3, modiking wrote: Will keep it `shared` in this change. We can visit renaming/aliasing later if we want. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -25,6 +25,7 @@ enum AddressSpace : unsigned { ADDRESS_SPACE_CONST = 4, ADDRESS_SPACE_LOCAL = 5, ADDRESS_SPACE_TENSOR = 6, + ADDRESS_SPACE_SHARED_CLUSTER = 7, modiking wrote: Will keep it `shared` in this change. We can visit renaming/aliasing later if we want. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -426,10 +426,7 @@ static std::optional evaluateIsSpace(Intrinsic::ID IID, unsigned AS) { case Intrinsic::nvvm_isspacep_shared: return AS == NVPTXAS::ADDRESS_SPACE_SHARED; case Intrinsic::nvvm_isspacep_shared_cluster: -// We can't tell shared from shared_cluster at compile time from AS alone, -// but it can't be either is AS is not shared. -return AS == NVPTXAS::ADDRESS_SPACE_SHARED ? std::nullopt - : std::optional{false}; +return AS == NVPTXAS::ADDRESS_SPACE_SHARED_CLUSTER; modiking wrote: Good call, fixed. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -25,6 +25,7 @@ enum AddressSpace : unsigned { ADDRESS_SPACE_CONST = 4, ADDRESS_SPACE_LOCAL = 5, ADDRESS_SPACE_TENSOR = 6, + ADDRESS_SPACE_SHARED_CLUSTER = 7, Artem-B wrote: PTX docs say: ``` If no sub-qualifier is specified with the .shared state space, then it defaults to ::cta. ``` Ideally, it's the most generic variant that should've had the most generic name of `shared`, and the specific subsets would have an additional suffix, but `shared` already has 10+ years of being in use. For pretty much all existing users SHARED already means a per-CTA variant, and I think we should keep it as is. Changing its name to something else will not really buy us anything other than churn, IMO. We may consider adding SHARED_CTA as an alias to SHARED. I can see how in some new contexts it may be useful to highlight the intended distinction. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -3019,8 +3019,26 @@ SDValue NVPTXTargetLowering::LowerADDRSPACECAST(SDValue Op, unsigned SrcAS = N->getSrcAddressSpace(); unsigned DestAS = N->getDestAddressSpace(); if (SrcAS != llvm::ADDRESS_SPACE_GENERIC && - DestAS != llvm::ADDRESS_SPACE_GENERIC) + DestAS != llvm::ADDRESS_SPACE_GENERIC) { +// Shared and SharedCluster can be converted to each other through generic +// space +if ((SrcAS == llvm::ADDRESS_SPACE_SHARED && + DestAS == llvm::ADDRESS_SPACE_SHARED_CLUSTER) || +(SrcAS == llvm::ADDRESS_SPACE_SHARED_CLUSTER && + DestAS == llvm::ADDRESS_SPACE_SHARED)) { + const MVT GenerictVT = + getPointerTy(DAG.getDataLayout(), ADDRESS_SPACE_GENERIC); + const MVT OutputVT = getPointerTy(DAG.getDataLayout(), DestAS); + SDValue GenericConversion = DAG.getAddrSpaceCast( + SDLoc(), GenerictVT, Op.getOperand(0), SrcAS, ADDRESS_SPACE_GENERIC); AlexMaclean wrote: Use the DL from the original value https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -3019,8 +3019,26 @@ SDValue NVPTXTargetLowering::LowerADDRSPACECAST(SDValue Op, unsigned SrcAS = N->getSrcAddressSpace(); unsigned DestAS = N->getDestAddressSpace(); if (SrcAS != llvm::ADDRESS_SPACE_GENERIC && - DestAS != llvm::ADDRESS_SPACE_GENERIC) + DestAS != llvm::ADDRESS_SPACE_GENERIC) { +// Shared and SharedCluster can be converted to each other through generic +// space +if ((SrcAS == llvm::ADDRESS_SPACE_SHARED && + DestAS == llvm::ADDRESS_SPACE_SHARED_CLUSTER) || +(SrcAS == llvm::ADDRESS_SPACE_SHARED_CLUSTER && + DestAS == llvm::ADDRESS_SPACE_SHARED)) { + const MVT GenerictVT = + getPointerTy(DAG.getDataLayout(), ADDRESS_SPACE_GENERIC); + const MVT OutputVT = getPointerTy(DAG.getDataLayout(), DestAS); AlexMaclean wrote: Just use `Op.getValueType()` https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -25,6 +25,7 @@ enum AddressSpace : unsigned { ADDRESS_SPACE_CONST = 4, ADDRESS_SPACE_LOCAL = 5, ADDRESS_SPACE_TENSOR = 6, + ADDRESS_SPACE_SHARED_CLUSTER = 7, AlexMaclean wrote: I think it would be good to rename `ADDRESS_SPACE_SHARED` to `ADDRESS_SPACE_SHARED_CTA`. This would be a bit more consistent with PTX and I think clearer as well. With the current names it is easy to assume that SHARED_CLUSTER is a subspace of SHARED when in fact the opposite is true. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -426,10 +426,7 @@ static std::optional evaluateIsSpace(Intrinsic::ID IID, unsigned AS) { case Intrinsic::nvvm_isspacep_shared: return AS == NVPTXAS::ADDRESS_SPACE_SHARED; AlexMaclean wrote: If the address space is `ADDRESS_SPACE_SHARED_CLUSTER` this intrinsic may or may not return true. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -176,6 +176,7 @@ enum AddressSpace : AddressSpaceUnderlyingType { Shared = 3, AlexMaclean wrote: Lets rename this to `SharedCTA` as well. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -426,10 +426,7 @@ static std::optional evaluateIsSpace(Intrinsic::ID IID, unsigned AS) { case Intrinsic::nvvm_isspacep_shared: return AS == NVPTXAS::ADDRESS_SPACE_SHARED; case Intrinsic::nvvm_isspacep_shared_cluster: -// We can't tell shared from shared_cluster at compile time from AS alone, -// but it can't be either is AS is not shared. -return AS == NVPTXAS::ADDRESS_SPACE_SHARED ? std::nullopt - : std::optional{false}; +return AS == NVPTXAS::ADDRESS_SPACE_SHARED_CLUSTER; AlexMaclean wrote: If the address space is `NVPTXAS::ADDRESS_SPACE_SHARED` won't this also return true? https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -2381,29 +2387,41 @@ def INT_PTX_LDG_G_v4i32_ELE : VLDG_G_ELE_V4<"u32", Int32Regs>; def INT_PTX_LDG_G_v4f32_ELE : VLDG_G_ELE_V4<"f32", Float32Regs>; -multiclass NG_TO_G { +multiclass NG_TO_G Preds = []> { def "" : NVPTXInst<(outs Int32Regs:$result), (ins Int32Regs:$src), - "cvta." # Str # ".u32 \t$result, $src;", []>; + "cvta." # Str # ".u32 \t$result, $src;", []>, Requires; + def _64 : NVPTXInst<(outs Int64Regs:$result), (ins Int64Regs:$src), + "cvta." # Str # ".u64 \t$result, $src;", []>, Requires; +} + +multiclass NG_TO_G_64 Preds = []> { def _64 : NVPTXInst<(outs Int64Regs:$result), (ins Int64Regs:$src), - "cvta." # Str # ".u64 \t$result, $src;", []>; + "cvta." # Str # ".u64 \t$result, $src;", []>, Requires; } AlexMaclean wrote: I think it would be cleaner to just add a bit to the `NG_TO_G` class for `supports_32` https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -0,0 +1,329 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mcpu=sm_90 -mattr=+ptx78| %ptxas-verify -arch=sm_90 %} + +target triple = "nvptx64-nvidia-cuda" + +@llvm.used = appending global [5 x ptr] [ + ptr @test_distributed_shared_cluster_common, AlexMaclean wrote: our lit tests generally don't use `@llvm.used`, can you remove this? https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -3019,8 +3019,42 @@ SDValue NVPTXTargetLowering::LowerADDRSPACECAST(SDValue Op, unsigned SrcAS = N->getSrcAddressSpace(); unsigned DestAS = N->getDestAddressSpace(); if (SrcAS != llvm::ADDRESS_SPACE_GENERIC && - DestAS != llvm::ADDRESS_SPACE_GENERIC) + DestAS != llvm::ADDRESS_SPACE_GENERIC) { +// Shared and SharedCluster can be converted to each other through generic +// space +if (SrcAS == llvm::ADDRESS_SPACE_SHARED && AlexMaclean wrote: This `if` and the one below look essentially duplicated. Can you fold them together? https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/AlexMaclean commented: Getting close to ready, a couple more places to update: - NVPTXTargetTransformInfo.cpp: evaluateIsSpace - NVPTXUsage.rst: Address Space section, add intrinsics you're modifying, such as `mapa`, to the spec https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -0,0 +1,48 @@ +; RUN: llc -O0 < %s -mtriple=nvptx64 -mcpu=sm_80 | FileCheck %s -check-prefixes=ALL,NOPTRCONV,CLS64 AlexMaclean wrote: Use update_llc_test_checks for this test. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 : NVPTX::cvta_dshared; +case ADDRESS_SPACE_SHARED_CLUSTER: + Opc = TM.is64Bit() ? NVPTX::cvta_shared_cluster_64 + : NVPTX::cvta_shared_cluster; gonzalobg wrote: If a program/user requests 32-bit ABI and sm_90, that's an error. I'd expect this to be reported either by the frontend (e.g. clang) or when parsing the llvm module target data. At that point, NVPTX can just assume that this never happens and we can hard fail here to be helpful to other compiler developers, since they are the only ones that should be seeing this failure. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
modiking wrote: Made the requested changes, @AlexMaclean/@Artem-B PTAL when you get the chance--thanks! https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
github-actions[bot] wrote: :warning: undef deprecator found issues in your code. :warning: You can test this locally with the following command: ``bash git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 'HEAD~1' HEAD llvm/test/CodeGen/NVPTX/addrspacecast-ptx64.ll llvm/test/CodeGen/NVPTX/distributed-shared-cluster.ll clang/lib/Basic/Targets/NVPTX.cpp clang/test/CodeGen/target-data.c llvm/include/llvm/Support/NVPTXAddrSpace.h llvm/lib/IR/AutoUpgrade.cpp llvm/lib/Target/NVPTX/MCTargetDesc/NVPTXInstPrinter.cpp llvm/lib/Target/NVPTX/NVPTX.h llvm/lib/Target/NVPTX/NVPTXAliasAnalysis.cpp llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp llvm/lib/Target/NVPTX/NVPTXUtilities.h llvm/test/Assembler/auto_upgrade_nvvm_intrinsics.ll llvm/test/CodeGen/NVPTX/cp-async-bulk-tensor-g2s.ll llvm/test/CodeGen/NVPTX/cp-async-bulk.ll llvm/test/CodeGen/NVPTX/nvptx-aa.ll mlir/include/mlir/Dialect/LLVMIR/NVVMDialect.h `` The following files introduce new uses of undef: - llvm/test/Assembler/auto_upgrade_nvvm_intrinsics.ll [Undef](https://llvm.org/docs/LangRef.html#undefined-values) is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields `undef`. You should use `poison` values for placeholders instead. In tests, avoid using `undef` and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead. For example, this is considered a bad practice: ```llvm define void @fn() { ... br i1 undef, ... } ``` Please use the following instead: ```llvm define void @fn(i1 %cond) { ... br i1 %cond, ... } ``` Please refer to the [Undefined Behavior Manual](https://llvm.org/docs/UndefinedBehavior.html) for more information. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 : NVPTX::cvta_dshared; +case ADDRESS_SPACE_SHARED_CLUSTER: + Opc = TM.is64Bit() ? NVPTX::cvta_shared_cluster_64 + : NVPTX::cvta_shared_cluster; Artem-B wrote: > I think throwing the error in ISel is a good place to fail. It's a convenient place for LLVM developer. It's problematic for the end user. In general, if we're diagnosing a user error, it must be in a way that's actionable for the *user*, not for the compiler developer. Diagnostic by compiler crash is not the best UI. Granted, producing an invalid instruction and relying on ptxas to diagnose it is only marginally better. I think the missing bit of the puzzle here is that we still support 32-bit compilation on sm_90+. If we make it impossible to do it in principle, then it makes the whole point moot, and we no longer have to bother with 32-bit generic pointers for the instructions that are available on newer GPUs only. This is something to be addressed separately. For this patch, I'm fine with proceeding with the assumption that 32-bit compilation never happens. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 : NVPTX::cvta_dshared; +case ADDRESS_SPACE_SHARED_CLUSTER: + Opc = TM.is64Bit() ? NVPTX::cvta_shared_cluster_64 + : NVPTX::cvta_shared_cluster; modiking wrote: > I think the missing bit of the puzzle here is that we still support 32-bit > compilation on sm_90+ Yeah that's true, having that be enforced early would resolve this problem. I like that as an overall solution to match that in reality there is only 64-bit mode available on sm_90+. Also agreed that can be addressed separately. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
modiking wrote: > A general thought, Can we include the base changes in this PR and create a > separate PR for the intrinsics-migration+MLIR changes? That makes sense, will do once we're happy with all the changes. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 : NVPTX::cvta_dshared; +case ADDRESS_SPACE_SHARED_CLUSTER: + Opc = TM.is64Bit() ? NVPTX::cvta_shared_cluster_64 + : NVPTX::cvta_shared_cluster; modiking wrote: > The check here is for TM.is64Bit() not whether shared memory is 64 or 32 > bits. Ah I see. Agreed then that 32-bit `cvta.*shared::cluster` shouldn't exist as a valid construct during ISel. I think throwing the error in ISel is a good place to fail. ptxas will complain if we give it a 32-bit target machine on sm90+: ``` ptxas fatal : 32-Bit ABI (--machine 32 or 32-Bit addressing) is not supported on sm_90 or higher architectures. ``` But given that a 32-bit version doesn't exist we shouldn't generate one. We should also update the documentation that `cvta.*shared::cluster` is 64 bit only as well: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cvta https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -0,0 +1,258 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -march=nvptx64 -mattr=+ptx80 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80| %ptxas-verify -arch=sm_90 %} + +target triple = "nvptx64-nvidia-cuda" + +@llvm.used = appending global [1 x ptr] [ptr @test_distributed_shared_cluster], section "llvm.metadata" + +declare ptr addrspace(7) @llvm.nvvm.mapa.shared.cluster(ptr addrspace(3), i32) +declare i1 @llvm.nvvm.isspacep.shared.cluster(ptr) +declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() +declare ptr @llvm.nvvm.mapa(ptr, i32) + +define i32 @test_distributed_shared_cluster(ptr %ptr, ptr addrspace(3) %smem_ptr) local_unnamed_addr { +; CHECK-LABEL: test_distributed_shared_cluster( +; CHECK: { +; CHECK-NEXT:.reg .pred %p<13>; +; CHECK-NEXT:.reg .b16 %rs<5>; +; CHECK-NEXT:.reg .b32 %r<69>; +; CHECK-NEXT:.reg .f32 %f<2>; +; CHECK-NEXT:.reg .b64 %rd<24>; +; CHECK-NEXT:.reg .f64 %fd<2>; +; CHECK-EMPTY: +; CHECK-NEXT: // %bb.0: // %entry +; CHECK-NEXT:ld.param.u64 %rd2, [test_distributed_shared_cluster_param_0]; +; CHECK-NEXT:ld.param.u64 %rd3, [test_distributed_shared_cluster_param_1]; +; CHECK-NEXT:mov.u32 %r24, %ctaid.x; +; CHECK-NEXT:xor.b32 %r25, %r24, 1; +; CHECK-NEXT:isspacep.shared::cluster %p1, %rd2; +; CHECK-NEXT:mapa.u64 %rd4, %rd2, %r25; +; CHECK-NEXT:isspacep.shared::cluster %p2, %rd4; +; CHECK-NEXT:mapa.shared::cluster.u64 %rd5, %rd3, %r25; +; CHECK-NEXT:mov.b16 %rs1, 0x3C00; +; CHECK-NEXT:atom.shared::cluster.add.noftz.f16 %rs2, [%rd5], %rs1; +; CHECK-NEXT:mov.b16 %rs3, 0x3F80; +; CHECK-NEXT:atom.shared::cluster.add.noftz.bf16 %rs4, [%rd5], %rs3; +; CHECK-NEXT:atom.shared::cluster.add.f32 %f1, [%rd5], 0f3F80; +; CHECK-NEXT:atom.shared::cluster.add.f64 %fd1, [%rd5], 0d3FF0; +; CHECK-NEXT:atom.shared::cluster.add.u32 %r26, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.add.u64 %rd6, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.exch.b32 %r27, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.exch.b64 %rd7, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.min.s32 %r28, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.min.s64 %rd8, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.min.u32 %r29, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.min.u64 %rd9, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.max.s32 %r30, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.max.s64 %rd10, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.max.u32 %r31, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.max.u64 %rd11, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.inc.u32 %r32, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.dec.u32 %r33, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.and.b32 %r34, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.and.b64 %rd12, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.or.b32 %r35, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.or.b64 %rd13, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.xor.b32 %r36, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.xor.b64 %rd14, [%rd5], 1; +; CHECK-NEXT:atom.relaxed.shared::cluster.cas.b32 %r37, [%rd5], 1, 0; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b32 %r38, [%rd5], 1, 0; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b32 %r39, [%rd5], 1, 0; +; CHECK-NEXT:atom.release.shared::cluster.cas.b32 %r40, [%rd5], 1, 0; +; CHECK-NEXT:atom.acq_rel.shared::cluster.cas.b32 %r41, [%rd5], 1, 0; +; CHECK-NEXT:atom.acq_rel.shared::cluster.cas.b32 %r42, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b32 %r43, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b32 %r44, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b32 %r45, [%rd5], 1, 0; +; CHECK-NEXT:atom.relaxed.shared::cluster.cas.b64 %rd15, [%rd5], 1, 0; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b64 %rd16, [%rd5], 1, 0; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b64 %rd17, [%rd5], 1, 0; +; CHECK-NEXT:atom.release.shared::cluster.cas.b64 %rd18, [%rd5], 1, 0; +; CHECK-NEXT:atom.acq_rel.shared::cluster.cas.b64 %rd19, [%rd5], 1, 0; +; CHECK-NEXT:atom.acq_rel.shared::cluster.cas.b64 %rd20, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b64 %rd21, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b64 %rd22, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b64 %rd23, [%rd5], 1, 0; +; CHECK-NEXT:and.b64 %rd1, %rd5, -4; +; CHECK-NEXT:cvt.u32.u64 %r46, %rd5; +; CHECK-NEXT:and.b32 %r47, %r46, 3; +; CHECK-NEXT:shl.b32 %r1, %r47, 3; +; CHECK-NEXT:mov.b
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -0,0 +1,258 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -march=nvptx64 -mattr=+ptx80 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80| %ptxas-verify -arch=sm_90 %} modiking wrote: Changed https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 : NVPTX::cvta_dshared; +case ADDRESS_SPACE_SHARED_CLUSTER: + Opc = TM.is64Bit() ? NVPTX::cvta_shared_cluster_64 + : NVPTX::cvta_shared_cluster; Artem-B wrote: > What matters for cvta is the size of the generic pointers, which I think is > always 64, right? It will depend on the compilation mode. You are correct that we can not produce valid code for shared_cluster conversions in 32-bit mode, so the question is -- how are we going to fail? If we unconditionally use `NVPTX::cvta_shared_cluster_64` we'll likely break other things because it will hardcode the assumption that generic pointer is 64-bit, while the rest of the DAG and the lowering machinery would be expecting it to be 32 bits. I suspect we'll run into an assertion somewhere. The bottom line is that we may still want to go through the motions, and pretend that we can convert AS to/from cluster_shared in 32-bit mode , so we can finish the compilation, and *then* let ptxas report an error. At least at that point the user would be able to examine the PTX and reason about what went wrong. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -117,13 +117,15 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeNVPTXTarget() { static std::string computeDataLayout(bool is64Bit, bool UseShortPointers) { std::string Ret = "e"; - if (!is64Bit) -Ret += "-p:32:32"; - else if (UseShortPointers) -Ret += "-p3:32:32-p4:32:32-p5:32:32"; - // Tensor Memory (addrspace:6) is always 32-bits. - Ret += "-p6:32:32"; + // Distributed Shared Memory (addrspace:7) follows shared memory + // (addrspace:3). + if (!is64Bit) +Ret += "-p:32:32-p6:32:32-p7:32:32"; + else if (UseShortPointers) { +Ret += "-p3:32:32-p4:32:32-p5:32:32-p6:32:32-p7:32:32"; + } else +Ret += "-p6:32:32"; modiking wrote: To keep the address spaces in order `p6:32:32` is also moved to the conditions above so the comment applies to every case now. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -2034,13 +2038,15 @@ multiclass F_ATOMIC_2_AS, preds>; defm _S : F_ATOMIC_2, preds>; + defm _S_C : F_ATOMIC_2, !listconcat([hasSM<80>], preds)>; defm _GEN : F_ATOMIC_2, preds>; } multiclass F_ATOMIC_3_AS preds = []> { defvar frag_pat = (frag node:$a, node:$b, node:$c); defm _G : F_ATOMIC_3, preds>; defm _S : F_ATOMIC_3, preds>; + defm _S_C : F_ATOMIC_3, !listconcat([hasSM<80>], preds)>; modiking wrote: Yeah it should be hasSM<90> && hasPTX<78> which is captured by `hasClusters`. Changed to `hasClusters` https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -2038,15 +2038,15 @@ multiclass F_ATOMIC_2_AS, preds>; defm _S : F_ATOMIC_2, preds>; - defm _DS : F_ATOMIC_2, !listconcat([hasSM<80>], preds)>; + defm _S_C : F_ATOMIC_2, !listconcat([hasSM<80>], preds)>; modiking wrote: Changed to use `hasClusters` https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -137,6 +137,7 @@ def hasAtomBitwise64 : Predicate<"Subtarget->hasAtomBitwise64()">; def hasAtomMinMax64 : Predicate<"Subtarget->hasAtomMinMax64()">; def hasVote : Predicate<"Subtarget->hasVote()">; def hasDouble : Predicate<"Subtarget->hasDouble()">; +def hasClusters : Predicate<"Subtarget->hasClusters()">; modiking wrote: Good catch, it's meant to unify how we check for support of features: ``` bool hasClusters() const { return SmVersion >= 90 && PTXVersion >= 78; } ``` Updating the predicate checks in the td files to use this instead. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 : NVPTX::cvta_dshared; +case ADDRESS_SPACE_SHARED_CLUSTER: + Opc = TM.is64Bit() ? NVPTX::cvta_shared_cluster_64 + : NVPTX::cvta_shared_cluster; AlexMaclean wrote: The check here is for `TM.is64Bit()` not whether shared memory is 64 or 32 bits. What matters for cvta is the size of the generic pointers, which I think is always 64, right? If that is the case we'll never want the 32-bit variant. Even if there is something I'm missing and there are cases where we would want the 32 bit variant, we should still be checking PTX and SM version here and throwing an error as needed. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -4704,6 +4754,43 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) { CI->eraseFromParent(); return; } + case Intrinsic::nvvm_mapa_shared_cluster: { +// Create a new call with the correct address space. +NewCall = +Builder.CreateCall(NewFn, {CI->getArgOperand(0), CI->getArgOperand(1)}); +Value *Res = NewCall; +Res = Builder.CreateAddrSpaceCast( +Res, Builder.getPtrTy(NVPTXAS::ADDRESS_SPACE_GENERIC)); +Res = Builder.CreateAddrSpaceCast( +Res, Builder.getPtrTy(NVPTXAS::ADDRESS_SPACE_SHARED)); modiking wrote: Not in this case. `nvvm_mapa_shared_cluster` when upgraded returns a `shared_cluster` pointer but the original IR expects a `shared` pointer so the casts are done accordingly https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 : NVPTX::cvta_dshared; +case ADDRESS_SPACE_SHARED_CLUSTER: + Opc = TM.is64Bit() ? NVPTX::cvta_shared_cluster_64 + : NVPTX::cvta_shared_cluster; modiking wrote: Yeah 32-bit shared memory pointers are still supported still as an option even in 64-bit compilation. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -117,13 +117,15 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeNVPTXTarget() { static std::string computeDataLayout(bool is64Bit, bool UseShortPointers) { std::string Ret = "e"; - if (!is64Bit) -Ret += "-p:32:32"; - else if (UseShortPointers) -Ret += "-p3:32:32-p4:32:32-p5:32:32"; - // Tensor Memory (addrspace:6) is always 32-bits. - Ret += "-p6:32:32"; + // Distributed Shared Memory (addrspace:7) follows shared memory + // (addrspace:3). + if (!is64Bit) +Ret += "-p:32:32-p6:32:32-p7:32:32"; + else if (UseShortPointers) { +Ret += "-p3:32:32-p4:32:32-p5:32:32-p6:32:32-p7:32:32"; + } else +Ret += "-p6:32:32"; durga4github wrote: We need the comment from line 120 here. Please retain it as is. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
durga4github wrote: A general thought, Can we include the base changes in this PR and create a separate PR for the intrinsics-migration+MLIR changes? https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -4704,6 +4754,43 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) { CI->eraseFromParent(); return; } + case Intrinsic::nvvm_mapa_shared_cluster: { +// Create a new call with the correct address space. +NewCall = +Builder.CreateCall(NewFn, {CI->getArgOperand(0), CI->getArgOperand(1)}); +Value *Res = NewCall; +Res = Builder.CreateAddrSpaceCast( +Res, Builder.getPtrTy(NVPTXAS::ADDRESS_SPACE_GENERIC)); +Res = Builder.CreateAddrSpaceCast( +Res, Builder.getPtrTy(NVPTXAS::ADDRESS_SPACE_SHARED)); durga4github wrote: Should this be shared_cluster? https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
durga4github wrote: (Sorry I clicked the wrong button `Close` instead of `Comment`) https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/durga4github reopened https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/durga4github closed https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 : NVPTX::cvta_dshared; +case ADDRESS_SPACE_SHARED_CLUSTER: + Opc = TM.is64Bit() ? NVPTX::cvta_shared_cluster_64 + : NVPTX::cvta_shared_cluster; durga4github wrote: okay, I believe we still support 32-bit for shared-memory pointers, even in sm_90+? https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -2034,13 +2038,15 @@ multiclass F_ATOMIC_2_AS, preds>; defm _S : F_ATOMIC_2, preds>; + defm _S_C : F_ATOMIC_2, !listconcat([hasSM<80>], preds)>; defm _GEN : F_ATOMIC_2, preds>; } multiclass F_ATOMIC_3_AS preds = []> { defvar frag_pat = (frag node:$a, node:$b, node:$c); defm _G : F_ATOMIC_3, preds>; defm _S : F_ATOMIC_3, preds>; + defm _S_C : F_ATOMIC_3, !listconcat([hasSM<80>], preds)>; durga4github wrote: Did you mean hasSM<90> here? (May be it is right, I am just trying to get it clarified for myself) https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -0,0 +1,258 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -march=nvptx64 -mattr=+ptx80 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80| %ptxas-verify -arch=sm_90 %} durga4github wrote: and we only need +ptx78 here.. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -43,7 +43,10 @@ enum NVVMMemorySpace { /// Tensor memory space identifier. /// Tensor memory is available only in arch-accelerated /// variants from sm100 onwards. - kTensorMemorySpace = 6 + kTensorMemorySpace = 6, + /// Distributed shared memory space identifier. + /// Distributed shared memory is available only in sm80+. durga4github wrote: We have sm_86, sm_87 etc. that do not support dsmem. So, let us say "sm 90 onwards" here. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-hierarchy https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -137,6 +137,7 @@ def hasAtomBitwise64 : Predicate<"Subtarget->hasAtomBitwise64()">; def hasAtomMinMax64 : Predicate<"Subtarget->hasAtomMinMax64()">; def hasVote : Predicate<"Subtarget->hasVote()">; def hasDouble : Predicate<"Subtarget->hasDouble()">; +def hasClusters : Predicate<"Subtarget->hasClusters()">; AlexMaclean wrote: Is this actually getting used anywhere? If not please remove. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -2038,15 +2038,15 @@ multiclass F_ATOMIC_2_AS, preds>; defm _S : F_ATOMIC_2, preds>; - defm _DS : F_ATOMIC_2, !listconcat([hasSM<80>], preds)>; + defm _S_C : F_ATOMIC_2, !listconcat([hasSM<80>], preds)>; AlexMaclean wrote: The PTX doc seems to say this is supported only for sm_90 and ptx_78 https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 : NVPTX::cvta_dshared; +case ADDRESS_SPACE_SHARED_CLUSTER: + Opc = TM.is64Bit() ? NVPTX::cvta_shared_cluster_64 + : NVPTX::cvta_shared_cluster; AlexMaclean wrote: My understanding is that cluster is not supported until sm_90, and that sm_90+ do not support 32bit compilation. Is there something I'm missing? If not we should never select the 32-bit version here and instead check to ensure we're compiling for sm_90+. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -0,0 +1,258 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -march=nvptx64 -mattr=+ptx80 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80| %ptxas-verify -arch=sm_90 %} + +target triple = "nvptx64-nvidia-cuda" + +@llvm.used = appending global [1 x ptr] [ptr @test_distributed_shared_cluster], section "llvm.metadata" + +declare ptr addrspace(7) @llvm.nvvm.mapa.shared.cluster(ptr addrspace(3), i32) +declare i1 @llvm.nvvm.isspacep.shared.cluster(ptr) +declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() +declare ptr @llvm.nvvm.mapa(ptr, i32) + +define i32 @test_distributed_shared_cluster(ptr %ptr, ptr addrspace(3) %smem_ptr) local_unnamed_addr { +; CHECK-LABEL: test_distributed_shared_cluster( +; CHECK: { +; CHECK-NEXT:.reg .pred %p<13>; +; CHECK-NEXT:.reg .b16 %rs<5>; +; CHECK-NEXT:.reg .b32 %r<69>; +; CHECK-NEXT:.reg .f32 %f<2>; +; CHECK-NEXT:.reg .b64 %rd<24>; +; CHECK-NEXT:.reg .f64 %fd<2>; +; CHECK-EMPTY: +; CHECK-NEXT: // %bb.0: // %entry +; CHECK-NEXT:ld.param.u64 %rd2, [test_distributed_shared_cluster_param_0]; +; CHECK-NEXT:ld.param.u64 %rd3, [test_distributed_shared_cluster_param_1]; +; CHECK-NEXT:mov.u32 %r24, %ctaid.x; +; CHECK-NEXT:xor.b32 %r25, %r24, 1; +; CHECK-NEXT:isspacep.shared::cluster %p1, %rd2; +; CHECK-NEXT:mapa.u64 %rd4, %rd2, %r25; +; CHECK-NEXT:isspacep.shared::cluster %p2, %rd4; +; CHECK-NEXT:mapa.shared::cluster.u64 %rd5, %rd3, %r25; +; CHECK-NEXT:mov.b16 %rs1, 0x3C00; +; CHECK-NEXT:atom.shared::cluster.add.noftz.f16 %rs2, [%rd5], %rs1; +; CHECK-NEXT:mov.b16 %rs3, 0x3F80; +; CHECK-NEXT:atom.shared::cluster.add.noftz.bf16 %rs4, [%rd5], %rs3; +; CHECK-NEXT:atom.shared::cluster.add.f32 %f1, [%rd5], 0f3F80; +; CHECK-NEXT:atom.shared::cluster.add.f64 %fd1, [%rd5], 0d3FF0; +; CHECK-NEXT:atom.shared::cluster.add.u32 %r26, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.add.u64 %rd6, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.exch.b32 %r27, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.exch.b64 %rd7, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.min.s32 %r28, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.min.s64 %rd8, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.min.u32 %r29, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.min.u64 %rd9, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.max.s32 %r30, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.max.s64 %rd10, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.max.u32 %r31, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.max.u64 %rd11, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.inc.u32 %r32, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.dec.u32 %r33, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.and.b32 %r34, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.and.b64 %rd12, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.or.b32 %r35, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.or.b64 %rd13, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.xor.b32 %r36, [%rd5], 1; +; CHECK-NEXT:atom.shared::cluster.xor.b64 %rd14, [%rd5], 1; +; CHECK-NEXT:atom.relaxed.shared::cluster.cas.b32 %r37, [%rd5], 1, 0; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b32 %r38, [%rd5], 1, 0; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b32 %r39, [%rd5], 1, 0; +; CHECK-NEXT:atom.release.shared::cluster.cas.b32 %r40, [%rd5], 1, 0; +; CHECK-NEXT:atom.acq_rel.shared::cluster.cas.b32 %r41, [%rd5], 1, 0; +; CHECK-NEXT:atom.acq_rel.shared::cluster.cas.b32 %r42, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b32 %r43, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b32 %r44, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b32 %r45, [%rd5], 1, 0; +; CHECK-NEXT:atom.relaxed.shared::cluster.cas.b64 %rd15, [%rd5], 1, 0; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b64 %rd16, [%rd5], 1, 0; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b64 %rd17, [%rd5], 1, 0; +; CHECK-NEXT:atom.release.shared::cluster.cas.b64 %rd18, [%rd5], 1, 0; +; CHECK-NEXT:atom.acq_rel.shared::cluster.cas.b64 %rd19, [%rd5], 1, 0; +; CHECK-NEXT:atom.acq_rel.shared::cluster.cas.b64 %rd20, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b64 %rd21, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b64 %rd22, [%rd5], 1, 0; +; CHECK-NEXT:fence.sc.sys; +; CHECK-NEXT:atom.acquire.shared::cluster.cas.b64 %rd23, [%rd5], 1, 0; +; CHECK-NEXT:and.b64 %rd1, %rd5, -4; +; CHECK-NEXT:cvt.u32.u64 %r46, %rd5; +; CHECK-NEXT:and.b32 %r47, %r46, 3; +; CHECK-NEXT:shl.b32 %r1, %r47, 3; +; CHECK-NEXT:mov.b
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/AlexMaclean commented: Backend changes look reasonable so far. One concern I have with this change is that until now we've assumed specific address-spaces are non-overlapping. You've addressed some of the places where this assumption is encoded but I think there are others you have not. For example, addrspacecasts between shared::cta and shared::cluster seem like they should be valid and should be expanded via a pair of cvta instructions through generic space, I think this would produce an error right now. I also wonder if InferAS or other places have made this assumption as well. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -0,0 +1,258 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -march=nvptx64 -mattr=+ptx80 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80| %ptxas-verify -arch=sm_90 %} + +target triple = "nvptx64-nvidia-cuda" + +@llvm.used = appending global [1 x ptr] [ptr @test_distributed_shared_cluster], section "llvm.metadata" + +declare ptr addrspace(7) @llvm.nvvm.mapa.shared.cluster(ptr addrspace(3), i32) +declare i1 @llvm.nvvm.isspacep.shared.cluster(ptr) +declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() +declare ptr @llvm.nvvm.mapa(ptr, i32) + +define i32 @test_distributed_shared_cluster(ptr %ptr, ptr addrspace(3) %smem_ptr) local_unnamed_addr { AlexMaclean wrote: Please break this up into more than one big function. Perhaps group some of the atomic cases together but otherwise each instruction/intrinsic should have it's own function. This will make the test easier to debug if something starts failing at some point. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
@@ -0,0 +1,258 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -o - -mcpu=sm_90 -march=nvptx64 -mattr=+ptx80 | FileCheck %s +; RUN: %if ptxas-12.0 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_90 -mattr=+ptx80| %ptxas-verify -arch=sm_90 %} AlexMaclean wrote: nit: the -march and -mtriple can be removed here. https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/AlexMaclean edited https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/modiking edited https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)
https://github.com/modiking edited https://github.com/llvm/llvm-project/pull/135444 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits