[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-23 Thread Christudasan Devadasan via llvm-branch-commits
cdevadas wrote: ### Merge activity * **Jul 23, 4:02 AM EDT**: @cdevadas started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/96163). https://github.com/llvm/llvm-project/pull/96163

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-23 Thread Christudasan Devadasan via llvm-branch-commits
cdevadas wrote: ### Merge activity * **Jul 23, 4:02 AM EDT**: @cdevadas started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/96162). https://github.com/llvm/llvm-project/pull/96162

[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Christudasan Devadasan via llvm-branch-commits
cdevadas wrote: The latest patch optimizes the PatFrag and the patterns written further by using OtherPredicates. The lit test changes in the latest patch are a missed optimization I incorrectly introduced earlier in this PR for GFX7. It is now fixed and matches the default behavior with the

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-17 Thread Christudasan Devadasan via llvm-branch-commits
cdevadas wrote: > I still think it is terribly surprising all of the test diff shows up in this > commit, and not the selection case Because the selection support is done in the next PR of the review stack, https://github.com/llvm/llvm-project/pull/96162. This patch takes care of choosing

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-10 Thread Christudasan Devadasan via llvm-branch-commits
https://github.com/cdevadas edited https://github.com/llvm/llvm-project/pull/96162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-10 Thread Christudasan Devadasan via llvm-branch-commits
@@ -658,17 +658,17 @@ define amdgpu_kernel void @image_bvh_intersect_ray_nsa_reassign(ptr %p_node_ptr, ; ; GFX1013-LABEL: image_bvh_intersect_ray_nsa_reassign: ; GFX1013: ; %bb.0: -; GFX1013-NEXT:s_load_dwordx8 s[0:7], s[0:1], 0x24 +; GFX1013-NEXT:s_load_dwordx8

[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-10 Thread Christudasan Devadasan via llvm-branch-commits
cdevadas wrote: Ping https://github.com/llvm/llvm-project/pull/96163 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-10 Thread Christudasan Devadasan via llvm-branch-commits
cdevadas wrote: Ping https://github.com/llvm/llvm-project/pull/96162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-04 Thread Christudasan Devadasan via llvm-branch-commits
@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half> define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) { ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret: ; GFX940: ; %bb.0:

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Christudasan Devadasan via llvm-branch-commits
https://github.com/cdevadas edited https://github.com/llvm/llvm-project/pull/96162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Christudasan Devadasan via llvm-branch-commits
https://github.com/cdevadas edited https://github.com/llvm/llvm-project/pull/96162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Christudasan Devadasan via llvm-branch-commits
@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half> define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) { ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret: ; GFX940: ; %bb.0:

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Christudasan Devadasan via llvm-branch-commits
https://github.com/cdevadas edited https://github.com/llvm/llvm-project/pull/96162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-03 Thread Christudasan Devadasan via llvm-branch-commits
@@ -183,10 +183,10 @@ define <2 x half> @local_atomic_fadd_v2f16_rtn(ptr addrspace(3) %ptr, <2 x half> define amdgpu_kernel void @local_atomic_fadd_v2bf16_noret(ptr addrspace(3) %ptr, <2 x i16> %data) { ; GFX940-LABEL: local_atomic_fadd_v2bf16_noret: ; GFX940: ; %bb.0:

[llvm-branch-commits] [llvm] [AMDGPU] Enable atomic optimizer for 64 bit divergent values (PR #96934)

2024-06-27 Thread Christudasan Devadasan via llvm-branch-commits
@@ -178,6 +178,20 @@ bool AMDGPUAtomicOptimizerImpl::run(Function ) { return Changed; } +static bool shouldOptimizeForType(Type *Ty) { + switch (Ty->getTypeID()) { + case Type::FloatTyID: + case Type::DoubleTyID: +return true; + case Type::IntegerTyID: { +if

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-24 Thread Christudasan Devadasan via llvm-branch-commits
@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo , return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM; } case S_LOAD_IMM: -switch (Width) { -default: - return 0; -case 2: - return AMDGPU::S_LOAD_DWORDX2_IMM; -

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-23 Thread Christudasan Devadasan via llvm-branch-commits
@@ -1701,17 +1732,33 @@ unsigned SILoadStoreOptimizer::getNewOpcode(const CombineInfo , return AMDGPU::S_BUFFER_LOAD_DWORDX8_SGPR_IMM; } case S_LOAD_IMM: -switch (Width) { -default: - return 0; -case 2: - return AMDGPU::S_LOAD_DWORDX2_IMM; -

[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-06-22 Thread Christudasan Devadasan via llvm-branch-commits
@@ -867,13 +867,104 @@ def SMRDBufferImm : ComplexPattern; def SMRDBufferImm32 : ComplexPattern; def SMRDBufferSgprImm : ComplexPattern; +class SMRDAlignedLoadPat : PatFrag <(ops node:$ptr), (Op node:$ptr), [{ + // Returns true if it is a naturally aligned multi-dword

[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-06-21 Thread Christudasan Devadasan via llvm-branch-commits
@@ -886,26 +977,17 @@ multiclass SMRD_Pattern { def : GCNPat < (smrd_load (SMRDSgpr i64:$sbase, i32:$soffset)), (vt (!cast(Instr#"_SGPR") $sbase, $soffset, 0))> { -let OtherPredicates = [isNotGFX9Plus]; - } - def : GCNPat < -(smrd_load (SMRDSgpr

[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-06-21 Thread Christudasan Devadasan via llvm-branch-commits
@@ -867,13 +867,104 @@ def SMRDBufferImm : ComplexPattern; def SMRDBufferImm32 : ComplexPattern; def SMRDBufferSgprImm : ComplexPattern; +class SMRDAlignedLoadPat : PatFrag <(ops node:$ptr), (Op node:$ptr), [{ + // Returns true if it is a naturally aligned multi-dword

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Christudasan Devadasan via llvm-branch-commits
cdevadas wrote: > This looks like it is affecting codegen even when xnack is disabled? That > should not happen. It shouldn't. I put the xnack replay subtarget check before using *_ec equivalents. See the code here:

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Christudasan Devadasan via llvm-branch-commits
https://github.com/cdevadas ready_for_review https://github.com/llvm/llvm-project/pull/96162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-06-20 Thread Christudasan Devadasan via llvm-branch-commits
https://github.com/cdevadas ready_for_review https://github.com/llvm/llvm-project/pull/96163 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-06-20 Thread Christudasan Devadasan via llvm-branch-commits
cdevadas wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is > open. Once all requirements are satisfied, merge this PR as a stack href="https://app.graphite.dev/github/pr/llvm/llvm-project/96162?utm_source=stack-comment-downstack-mergeability-warning;

[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-06-20 Thread Christudasan Devadasan via llvm-branch-commits
cdevadas wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is > open. Once all requirements are satisfied, merge this PR as a stack href="https://app.graphite.dev/github/pr/llvm/llvm-project/96163?utm_source=stack-comment-downstack-mergeability-warning;

[llvm-branch-commits] [clang] [llvm] AMDGPU: Remove ds atomic fadd intrinsics (PR #95396)

2024-06-13 Thread Christudasan Devadasan via llvm-branch-commits
@@ -2331,40 +2337,74 @@ static Value *upgradeARMIntrinsicCall(StringRef Name, CallBase *CI, Function *F, llvm_unreachable("Unknown function for ARM CallBase upgrade."); } +// These are expected to have have the arguments: cdevadas wrote: ```suggestion //

[llvm-branch-commits] [llvm] ff8a1ca - [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.

2021-01-22 Thread Christudasan Devadasan via llvm-branch-commits
Author: Christudasan Devadasan Date: 2021-01-22T14:20:59+05:30 New Revision: ff8a1cae181438b97937848060da1efb67117ea4 URL: https://github.com/llvm/llvm-project/commit/ff8a1cae181438b97937848060da1efb67117ea4 DIFF:

[llvm-branch-commits] [llvm] c971bcd - [AMDGPU] Test clean up (NFC)

2021-01-22 Thread Christudasan Devadasan via llvm-branch-commits
Author: Christudasan Devadasan Date: 2021-01-22T13:38:52+05:30 New Revision: c971bcd2102b905e6469463fb8309ab3f7b2b8f2 URL: https://github.com/llvm/llvm-project/commit/c971bcd2102b905e6469463fb8309ab3f7b2b8f2 DIFF:

[llvm-branch-commits] [llvm] ae25a39 - AMDGPU/GlobalISel: Enable sret demotion

2021-01-07 Thread Christudasan Devadasan via llvm-branch-commits
Author: Christudasan Devadasan Date: 2021-01-08T10:56:35+05:30 New Revision: ae25a397e9de833ffbd5d8e3b480086404625cb7 URL: https://github.com/llvm/llvm-project/commit/ae25a397e9de833ffbd5d8e3b480086404625cb7 DIFF:

[llvm-branch-commits] [llvm] d68458b - [GlobalISel] Base implementation for sret demotion.

2021-01-05 Thread Christudasan Devadasan via llvm-branch-commits
Author: Christudasan Devadasan Date: 2021-01-06T10:30:50+05:30 New Revision: d68458bd56d9d55b05fca5447891aa8752d70509 URL: https://github.com/llvm/llvm-project/commit/d68458bd56d9d55b05fca5447891aa8752d70509 DIFF: