[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-07-14 Thread Fabian Ritter via llvm-branch-commits


@@ -908,19 +919,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),

ritter-x2a wrote:

What would be the behavior that we want from tablegen? Should the target be 
able to specify "PTRADD should be considered commutative in tablegen'erated 
ISel patterns"?
In general, PTRADD is not commutable, so treating it as commutable shouldn't be 
the default. We can only treat it as commutable here because we know that we 
are trying to lower it to an addition in this pattern.
We also don't want to treat PTRADD as commutable everywhere in the AMDGPU 
backend since my goal with this effort is to check if to-be-folded immediate 
offset additions are inbounds.

I'd prefer a solution that expresses that ptradds on AMDGPU should be folded 
into the addressing mode, and if that's not possible, they should be replaced 
by an ISD::ADD node and the ADD matching rules should be applied.
However, I haven't found a way to do that in the framework: Replacing 
ISD::PTRADD with ISD::ADD sounds like a legalization or DAGCombine task, but 
this shouldn't happen before the addressing mode is matched, which happens in 
the proper selection phase.

https://github.com/llvm/llvm-project/pull/143881
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-07-14 Thread Matt Arsenault via llvm-branch-commits


@@ -908,19 +919,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),

arsenm wrote:

Should really avoid this, commutable is supposed to be automatic. It may 
require a special case for ptradd in tablegen itself 

https://github.com/llvm/llvm-project/pull/143881
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-07-14 Thread Matt Arsenault via llvm-branch-commits


@@ -544,7 +545,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.

arsenm wrote:

You shouldn't need to explicitly commute the patterns, the pattern generator 
should do this for commutable nodes 

https://github.com/llvm/llvm-project/pull/143881
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-07-14 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh approved this pull request.


https://github.com/llvm/llvm-project/pull/143881
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-07-11 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From 531b230f3a828d5f39cf0d2393d18d961d6be42d Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 279de32a9cee8..4548beadf23ae 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -512,12 +512,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -544,7 +545,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -834,12 +838,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -908,19 +919,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-07-11 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From 531b230f3a828d5f39cf0d2393d18d961d6be42d Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 279de32a9cee8..4548beadf23ae 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -512,12 +512,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -544,7 +545,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -834,12 +838,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -908,19 +919,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-27 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From d2ca0b2c1b0d9bb3da82b3dfbf82b74ae2b3f978 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 279de32a9cee8..4548beadf23ae 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -512,12 +512,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -544,7 +545,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -834,12 +838,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -908,19 +919,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-27 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From d2ca0b2c1b0d9bb3da82b3dfbf82b74ae2b3f978 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 279de32a9cee8..4548beadf23ae 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -512,12 +512,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -544,7 +545,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -834,12 +838,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -908,19 +919,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-26 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From f8fbb5733c03f68f8ff12401e0ff3468bf392027 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 9ed054449c264..df45c37aeec6c 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -512,12 +512,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -544,7 +545,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -834,12 +838,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -908,19 +919,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-23 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From 71346e57396657e86898f1177339c0a7897422ac Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 89a9ecc27c6ed..86889e4fd6faf 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -508,12 +508,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -540,7 +541,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -830,12 +834,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -904,19 +915,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-23 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From 40319e7037d2057afb4d8814f1c897b85968532e Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 89a9ecc27c6ed..86889e4fd6faf 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -508,12 +508,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -540,7 +541,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -830,12 +834,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -904,19 +915,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-23 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From 40319e7037d2057afb4d8814f1c897b85968532e Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 89a9ecc27c6ed..86889e4fd6faf 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -508,12 +508,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -540,7 +541,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -830,12 +834,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -904,19 +915,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-13 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From 46090a8031fde937a76268ce7adbbdc6f42911ad Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index a005e0245b8ff..8054e75782539 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -806,12 +810,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -880,19 +891,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_u64(ptr %base, 

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-13 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From 860123fb50e6b9d4a772c350873507e7faaa1f71 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index a005e0245b8ff..8054e75782539 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -806,12 +810,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -880,19 +891,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_u64(ptr %base, 

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-13 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From 860123fb50e6b9d4a772c350873507e7faaa1f71 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index a005e0245b8ff..8054e75782539 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -806,12 +810,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -880,19 +891,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_u64(ptr %base, 

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-13 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From b919098ad59718c1b5e642b458705e765c525d48 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index a005e0245b8ff..8054e75782539 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -806,12 +810,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -880,19 +891,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_u64(ptr %base, 

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-13 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From b919098ad59718c1b5e642b458705e765c525d48 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index a005e0245b8ff..8054e75782539 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -806,12 +810,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -880,19 +891,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_u64(ptr %base, 

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-13 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From f93590bac710750f993c86005c217b843cc5a863 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index a005e0245b8ff..8054e75782539 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -806,12 +810,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -880,19 +891,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_u64(ptr %base, 

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-13 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a updated 
https://github.com/llvm/llvm-project/pull/143881

>From f93590bac710750f993c86005c217b843cc5a863 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index a005e0245b8ff..8054e75782539 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -806,12 +810,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -880,19 +891,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_u64(ptr %base, 

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-12 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a created 
https://github.com/llvm/llvm-project/pull/143881

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.

>From 3d96a5833b46f0aad869a616feee2b5cfe947e61 Mon Sep 17 00:00:00 2001
From: Fabian Ritter 
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll   | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll   | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 594b37bb6e21a..c3088d6bb1dca 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -748,12 +752,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -822,19 +833,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+ 

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-12 Thread Fabian Ritter via llvm-branch-commits

https://github.com/ritter-x2a ready_for_review 
https://github.com/llvm/llvm-project/pull/143881
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-12 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Fabian Ritter (ritter-x2a)


Changes

This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.

For SWDEV-516125.

---
Full diff: https://github.com/llvm/llvm-project/pull/143881.diff


3 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+26-10) 
- (modified) llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll (+12-29) 
- (modified) llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll (+14-28) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 594b37bb6e21a..c3088d6bb1dca 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], 
True16Predicate = NotHasTrue
   defm: Ternary_i16_Pats_gfx9;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = 
NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG : PatFrag<
+class ThreeOpFragSDAG : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+  (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
   [{
 // Only use VALU ops when the result is divergent.
 if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag : 
ThreeOpFragSDAG {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1, x, y)) if op1IsRight = 1.
+class ThreeOpFrag : ThreeOpFragSDAG {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -748,12 +752,19 @@ def : GCNPat<
  (DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
  (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
 
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag i64:$src0, i32:$src1, 
i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat;
 def : VOPBinOpClampPat;
 
@@ -822,19 +833,24 @@ multiclass IMAD32_Pats  {
 
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a 
normal mul.
 // We need to separate this because otherwise OtherPredicates would be 
overriden.
-class IMAD32_Mul24_Pat: GCNPat <
-(i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-(inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl : GCNPat <
+!if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, 
i32:$src1,
+(i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), 
i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats {
+  def : IMAD32_Mul24_Pats_Impl;
+  def : IMAD32_Mul24_Pats_Impl;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus 
in {
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in 
{
   defm : IMAD32_Pats;
-  def : IMAD32_Mul24_Pat;
+  defm : IMAD32_Mul24_Pats;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile, 
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll 
b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) 
%p) {
 
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_u64(ptr %base, i64 %voffset) {
-; GFX942_PTRADD-LABEL: select_v_lshl_add_u64:
-; GFX942_PTRADD:   ; %bb.0:
-; GFX942_PTRADD-NEXT:s_waitcnt vmcnt(0) e

[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)

2025-06-12 Thread Fabian Ritter via llvm-branch-commits

ritter-x2a wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack  href="https://app.graphite.dev/github/pr/llvm/llvm-project/143881?utm_source=stack-comment-downstack-mergeability-warning";
>  >on Graphite.
> https://graphite.dev/docs/merge-pull-requests";>Learn more

* **#143881** https://app.graphite.dev/github/pr/llvm/llvm-project/143881?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/> 👈 https://app.graphite.dev/github/pr/llvm/llvm-project/143881?utm_source=stack-comment-view-in-graphite";
 target="_blank">(View in Graphite)
* **#143880** https://app.graphite.dev/github/pr/llvm/llvm-project/143880?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#143673** https://app.graphite.dev/github/pr/llvm/llvm-project/143673?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#143672** https://app.graphite.dev/github/pr/llvm/llvm-project/143672?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#142778** https://app.graphite.dev/github/pr/llvm/llvm-project/142778?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#142777** https://app.graphite.dev/github/pr/llvm/llvm-project/142777?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#142739** https://app.graphite.dev/github/pr/llvm/llvm-project/142739?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#142738** https://app.graphite.dev/github/pr/llvm/llvm-project/142738?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#141725** https://app.graphite.dev/github/pr/llvm/llvm-project/141725?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* `main`




This stack of pull requests is managed by https://graphite.dev?utm-source=stack-comment";>Graphite. Learn 
more about https://stacking.dev/?utm_source=stack-comment";>stacking.


https://github.com/llvm/llvm-project/pull/143881
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits