[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
@@ -908,19 +919,24 @@ multiclass IMAD32_Pats {
// Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
// We need to separate this because otherwise OtherPredicates would be overridden.
-class IMAD32_Mul24_Pat<Instruction inst> : GCNPat <
-  (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-  (inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl<Instruction inst, SDPatternOperator AddOp, bit mulIsRight = 0> : GCNPat <
+  !if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)))),
+                  (i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2))),
ritter-x2a wrote:
What would be the behavior that we want from tablegen? Should the target be
able to specify "PTRADD should be considered commutative in tablegen'erated
ISel patterns"?
In general, PTRADD is not commutable, so treating it as commutable shouldn't be
the default. We can only treat it as commutable here because we know that we
are trying to lower it to an addition in this pattern.
We also don't want to treat PTRADD as commutable everywhere in the AMDGPU
backend since my goal with this effort is to check if to-be-folded immediate
offset additions are inbounds.
I'd prefer a solution that expresses that ptradds on AMDGPU should be folded
into the addressing mode, and if that's not possible, they should be replaced
by an ISD::ADD node and the ADD matching rules should be applied.
However, I haven't found a way to do that in the framework: Replacing
ISD::PTRADD with ISD::ADD sounds like a legalization or DAGCombine task, but
this shouldn't happen before the addressing mode is matched, which happens in
the proper selection phase.
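The operand-order issue can be made concrete with a toy matcher. The sketch below is hypothetical Python for illustration only, not LLVM's ISel machinery; all names (`match`, `COMMUTATIVE`) are invented. It shows why one written pattern suffices for a commutative opcode, while a non-commutative opcode like ptradd needs an explicitly mirrored pattern:

```python
# Toy DAG pattern matcher (hypothetical illustration, not LLVM code).
# Patterns and nodes are ("opcode", lhs, rhs) tuples; bare strings are
# placeholders that capture whatever node they are matched against.

COMMUTATIVE = {"add"}  # "ptradd" is deliberately not listed

def match(pattern, node):
    """Return captured bindings if `pattern` matches `node`, else None."""
    if isinstance(pattern, str):
        return {pattern: node}
    if not isinstance(node, tuple) or node[0] != pattern[0]:
        return None
    operand_orders = [node[1:]]
    if node[0] in COMMUTATIVE:
        # The matcher retries with swapped operands for commutative ops,
        # so one written pattern covers both operand orders.
        operand_orders.append(node[1:][::-1])
    for operands in operand_orders:
        bindings = {}
        for subpat, subnode in zip(pattern[1:], operands):
            sub = match(subpat, subnode)
            if sub is None:
                break
            bindings.update(sub)
        else:
            return bindings
    return None

mul = ("mul", "x", "y")
# One "add" pattern handles both (add (mul x, y), z) and (add z, (mul x, y)):
assert match(("add", mul, "z"), ("add", ("mul", 1, 2), 3))
assert match(("add", mul, "z"), ("add", 3, ("mul", 1, 2)))
# "ptradd" is never commuted, so the mirrored form needs its own pattern:
assert match(("ptradd", mul, "z"), ("ptradd", 3, ("mul", 1, 2))) is None
assert match(("ptradd", "z", mul), ("ptradd", 3, ("mul", 1, 2)))
```

This mirrors the patch's approach: instead of flagging ptradd commutative globally, it writes the swapped-operand pattern explicitly where lowering to an addition is known to be safe.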
https://github.com/llvm/llvm-project/pull/143881
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
@@ -908,19 +919,24 @@ multiclass IMAD32_Pats {
// Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
// We need to separate this because otherwise OtherPredicates would be overridden.
-class IMAD32_Mul24_Pat<Instruction inst> : GCNPat <
-  (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-  (inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl<Instruction inst, SDPatternOperator AddOp, bit mulIsRight = 0> : GCNPat <
+  !if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)))),
+                  (i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2))),
arsenm wrote:
We should really avoid this; commutable matching is supposed to be automatic. It may require a special case for ptradd in tablegen itself.
https://github.com/llvm/llvm-project/pull/143881
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
@@ -544,7 +545,10 @@ class ThreeOpFragSDAG : PatFrag<
let PredicateCodeUsesOperands = 1;
}
-class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2> : ThreeOpFragSDAG<op1, op2> {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1 x, y)) if op1IsRight = 1.
+class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : ThreeOpFragSDAG<op1, op2, op1IsRight> {
arsenm wrote:
You shouldn't need to explicitly commute the patterns; the pattern generator should do this for commutable nodes.
https://github.com/llvm/llvm-project/pull/143881
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
https://github.com/Pierre-vh approved this pull request.
https://github.com/llvm/llvm-project/pull/143881
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
https://github.com/ritter-x2a updated
https://github.com/llvm/llvm-project/pull/143881
>From 531b230f3a828d5f39cf0d2393d18d961d6be42d Mon Sep 17 00:00:00 2001
From: Fabian Ritter
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
For SWDEV-516125.
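The kind of address computation these patterns target can be sketched as follows. This is a hypothetical Python model of the v_lshl_add_u64 semantics, (x << y) + z; the names and values are illustrative, not taken from the patch:

```python
MASK64 = (1 << 64) - 1

def v_lshl_add_u64(x, y, z):
    # Models the fused instruction: (x << y) + z, wrapping at 64 bits.
    return ((x << y) + z) & MASK64

# A ptradd whose offset is a shift, e.g. indexing an array of 8-byte
# elements: ptr = ptradd base, (shl idx, 3). The shift sits on the
# ptradd's right operand (the offset), which is the case these
# patterns handle.
base, idx = 0x8000_0000, 10
assert v_lshl_add_u64(idx, 3, base) == base + idx * 8
```

Note that the base pointer lands in the addend operand (z) of the fused instruction, which is why only the offset-is-shift form is implemented here.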
---
llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
.../AMDGPU/ptradd-sdag-optimizations.ll | 41 ++
llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll | 42 +++
3 files changed, 52 insertions(+), 67 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 279de32a9cee8..4548beadf23ae 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -512,12 +512,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue
defm: Ternary_i16_Pats_gfx9;
} // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue16BitInsts
-class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
+class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : PatFrag<
(ops node:$x, node:$y, node:$z),
// When the inner operation is used multiple times, selecting 3-op
// instructions may still be beneficial -- if the other users can be
// combined similarly. Let's be conservative for now.
- (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+ !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+ (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
[{
// Only use VALU ops when the result is divergent.
if (!N->isDivergent())
@@ -544,7 +545,10 @@ class ThreeOpFragSDAG : PatFrag<
let PredicateCodeUsesOperands = 1;
}
-class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2> : ThreeOpFragSDAG<op1, op2> {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1 x, y)) if op1IsRight = 1.
+class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : ThreeOpFragSDAG<op1, op2, op1IsRight> {
// The divergence predicate is irrelevant in GlobalISel, as we have
// proper register bank checks. We just need to verify the constant
// bus restriction when all the sources are considered.
@@ -834,12 +838,19 @@ def : GCNPat<
(DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
(V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
def : GCNPat<
(ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
(V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
>;
+def : GCNPat <
+ // (ptradd z, (shl x, y)) -> ((x << y) + z)
+ (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
+ (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
def : VOPBinOpClampPat;
def : VOPBinOpClampPat;
@@ -908,19 +919,24 @@ multiclass IMAD32_Pats {
// Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
// We need to separate this because otherwise OtherPredicates would be overridden.
-class IMAD32_Mul24_Pat<Instruction inst> : GCNPat <
-  (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-  (inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl<Instruction inst, SDPatternOperator AddOp, bit mulIsRight = 0> : GCNPat <
+  !if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)))),
+                  (i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats<Instruction inst> {
+  def : IMAD32_Mul24_Pats_Impl<inst, add>;
+  def : IMAD32_Mul24_Pats_Impl<inst, ptradd, /*mulIsRight=*/1>;
+}
// exclude pre-GFX9 where it was slow
let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus in {
defm : IMAD32_Pats;
- def : IMAD32_Mul24_Pat;
+ defm : IMAD32_Mul24_Pats;
}
let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in {
defm : IMAD32_Pats;
- def : IMAD32_Mul24_Pat;
+ defm : IMAD32_Mul24_Pats;
}
def VOP3_PERMLANE_Profile : VOP3_Profile,
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) %p) {
; Use non-zero shift amounts in v_lshl_add_u64.
define ptr @select_v_lshl_add_
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
https://github.com/ritter-x2a updated
https://github.com/llvm/llvm-project/pull/143881
>From d2ca0b2c1b0d9bb3da82b3dfbf82b74ae2b3f978 Mon Sep 17 00:00:00 2001
From: Fabian Ritter
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
For SWDEV-516125.
---
llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
.../AMDGPU/ptradd-sdag-optimizations.ll | 41 ++
llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll | 42 +++
3 files changed, 52 insertions(+), 67 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 279de32a9cee8..4548beadf23ae 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -512,12 +512,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue
defm: Ternary_i16_Pats_gfx9;
} // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue16BitInsts
-class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
+class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : PatFrag<
(ops node:$x, node:$y, node:$z),
// When the inner operation is used multiple times, selecting 3-op
// instructions may still be beneficial -- if the other users can be
// combined similarly. Let's be conservative for now.
- (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+ !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+ (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
[{
// Only use VALU ops when the result is divergent.
if (!N->isDivergent())
@@ -544,7 +545,10 @@ class ThreeOpFragSDAG : PatFrag<
let PredicateCodeUsesOperands = 1;
}
-class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2> : ThreeOpFragSDAG<op1, op2> {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1 x, y)) if op1IsRight = 1.
+class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : ThreeOpFragSDAG<op1, op2, op1IsRight> {
// The divergence predicate is irrelevant in GlobalISel, as we have
// proper register bank checks. We just need to verify the constant
// bus restriction when all the sources are considered.
@@ -834,12 +838,19 @@ def : GCNPat<
(DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
(V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
def : GCNPat<
(ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
(V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
>;
+def : GCNPat <
+ // (ptradd z, (shl x, y)) -> ((x << y) + z)
+ (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
+ (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
def : VOPBinOpClampPat;
def : VOPBinOpClampPat;
@@ -908,19 +919,24 @@ multiclass IMAD32_Pats {
// Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
// We need to separate this because otherwise OtherPredicates would be overridden.
-class IMAD32_Mul24_Pat<Instruction inst> : GCNPat <
-  (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-  (inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl<Instruction inst, SDPatternOperator AddOp, bit mulIsRight = 0> : GCNPat <
+  !if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)))),
+                  (i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats<Instruction inst> {
+  def : IMAD32_Mul24_Pats_Impl<inst, add>;
+  def : IMAD32_Mul24_Pats_Impl<inst, ptradd, /*mulIsRight=*/1>;
+}
// exclude pre-GFX9 where it was slow
let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus in {
defm : IMAD32_Pats;
- def : IMAD32_Mul24_Pat;
+ defm : IMAD32_Mul24_Pats;
}
let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in {
defm : IMAD32_Pats;
- def : IMAD32_Mul24_Pat;
+ defm : IMAD32_Mul24_Pats;
}
def VOP3_PERMLANE_Profile : VOP3_Profile,
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) %p) {
; Use non-zero shift amounts in v_lshl_add_u64.
define ptr @select_v_lshl_add_
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
https://github.com/ritter-x2a updated
https://github.com/llvm/llvm-project/pull/143881
>From f8fbb5733c03f68f8ff12401e0ff3468bf392027 Mon Sep 17 00:00:00 2001
From: Fabian Ritter
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
For SWDEV-516125.
---
llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
.../AMDGPU/ptradd-sdag-optimizations.ll | 41 ++
llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll | 42 +++
3 files changed, 52 insertions(+), 67 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 9ed054449c264..df45c37aeec6c 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -512,12 +512,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue
defm: Ternary_i16_Pats_gfx9;
} // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue16BitInsts
-class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
+class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : PatFrag<
(ops node:$x, node:$y, node:$z),
// When the inner operation is used multiple times, selecting 3-op
// instructions may still be beneficial -- if the other users can be
// combined similarly. Let's be conservative for now.
- (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+ !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+ (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
[{
// Only use VALU ops when the result is divergent.
if (!N->isDivergent())
@@ -544,7 +545,10 @@ class ThreeOpFragSDAG : PatFrag<
let PredicateCodeUsesOperands = 1;
}
-class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2> : ThreeOpFragSDAG<op1, op2> {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1 x, y)) if op1IsRight = 1.
+class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : ThreeOpFragSDAG<op1, op2, op1IsRight> {
// The divergence predicate is irrelevant in GlobalISel, as we have
// proper register bank checks. We just need to verify the constant
// bus restriction when all the sources are considered.
@@ -834,12 +838,19 @@ def : GCNPat<
(DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
(V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
def : GCNPat<
(ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
(V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
>;
+def : GCNPat <
+ // (ptradd z, (shl x, y)) -> ((x << y) + z)
+ (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
+ (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
def : VOPBinOpClampPat;
def : VOPBinOpClampPat;
@@ -908,19 +919,24 @@ multiclass IMAD32_Pats {
// Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
// We need to separate this because otherwise OtherPredicates would be overridden.
-class IMAD32_Mul24_Pat<Instruction inst> : GCNPat <
-  (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-  (inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl<Instruction inst, SDPatternOperator AddOp, bit mulIsRight = 0> : GCNPat <
+  !if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)))),
+                  (i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats<Instruction inst> {
+  def : IMAD32_Mul24_Pats_Impl<inst, add>;
+  def : IMAD32_Mul24_Pats_Impl<inst, ptradd, /*mulIsRight=*/1>;
+}
// exclude pre-GFX9 where it was slow
let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus in {
defm : IMAD32_Pats;
- def : IMAD32_Mul24_Pat;
+ defm : IMAD32_Mul24_Pats;
}
let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in {
defm : IMAD32_Pats;
- def : IMAD32_Mul24_Pat;
+ defm : IMAD32_Mul24_Pats;
}
def VOP3_PERMLANE_Profile : VOP3_Profile,
VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) %p) {
; Use non-zero shift amounts in v_lshl_add_u64.
define ptr @select_v_lshl_add_
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
https://github.com/ritter-x2a updated
https://github.com/llvm/llvm-project/pull/143881
>From 71346e57396657e86898f1177339c0a7897422ac Mon Sep 17 00:00:00 2001
From: Fabian Ritter
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
For SWDEV-516125.
---
llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
.../AMDGPU/ptradd-sdag-optimizations.ll | 41 ++
llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll | 42 +++
3 files changed, 52 insertions(+), 67 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 89a9ecc27c6ed..86889e4fd6faf 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -508,12 +508,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue
defm: Ternary_i16_Pats_gfx9;
} // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue16BitInsts
-class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
+class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : PatFrag<
(ops node:$x, node:$y, node:$z),
// When the inner operation is used multiple times, selecting 3-op
// instructions may still be beneficial -- if the other users can be
// combined similarly. Let's be conservative for now.
- (op2 (HasOneUseBinOp node:$x, node:$y), node:$z),
+ !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp node:$x, node:$y)),
+ (op2 (HasOneUseBinOp node:$x, node:$y), node:$z)),
[{
// Only use VALU ops when the result is divergent.
if (!N->isDivergent())
@@ -540,7 +541,10 @@ class ThreeOpFragSDAG : PatFrag<
let PredicateCodeUsesOperands = 1;
}
-class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2> : ThreeOpFragSDAG<op1, op2> {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1 x, y)) if op1IsRight = 1.
+class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : ThreeOpFragSDAG<op1, op2, op1IsRight> {
// The divergence predicate is irrelevant in GlobalISel, as we have
// proper register bank checks. We just need to verify the constant
// bus restriction when all the sources are considered.
@@ -830,12 +834,19 @@ def : GCNPat<
(DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
(V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
def : GCNPat<
(ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
(V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
>;
+def : GCNPat <
+ // (ptradd z, (shl x, y)) -> ((x << y) + z)
+ (ThreeOpFrag i64:$src0, i32:$src1, i64:$src2),
+ (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
def : VOPBinOpClampPat;
def : VOPBinOpClampPat;
@@ -904,19 +915,24 @@ multiclass IMAD32_Pats {
// Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
// We need to separate this because otherwise OtherPredicates would be overridden.
-class IMAD32_Mul24_Pat<Instruction inst> : GCNPat <
-  (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-  (inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl<Instruction inst, SDPatternOperator AddOp, bit mulIsRight = 0> : GCNPat <
+  !if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)))),
+                  (i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2))),
+(inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats<Instruction inst> {
+  def : IMAD32_Mul24_Pats_Impl<inst, add>;
+  def : IMAD32_Mul24_Pats_Impl<inst, ptradd, /*mulIsRight=*/1>;
+}
// exclude pre-GFX9 where it was slow
let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus
in {
  defm : IMAD32_Pats<V_MAD_U64_U32_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_e64>;
+  defm : IMAD32_Mul24_Pats<V_MAD_U64_U32_e64>;
}
let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in
{
  defm : IMAD32_Pats<V_MAD_U64_U32_gfx11_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_gfx11_e64>;
+  defm : IMAD32_Mul24_Pats<V_MAD_U64_U32_gfx11_e64>;
}
def VOP3_PERMLANE_Profile : VOP3_Profile<VOPProfile<[i32, i32, i32, i32]>, VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) %p) {
; Use non-zero shift amounts in v_lshl_add_u64.
define ptr @select_v_lshl_add_
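As an aside for readers following the TableGen above: the effect of the new `op1IsRight` parameter can be sketched in plain Python. This is an illustrative model only, not LLVM code; the helper name and the tuple encoding of DAG nodes are invented here.

```python
def match_three_op(node, outer, inner, op1_is_right=False):
    """Match (outer (inner x, y), z), or (outer z, (inner x, y)) if op1_is_right.

    Nodes are tuples: (opcode, operand0, operand1).
    Returns (x, y, z) on success, None otherwise.
    """
    if node[0] != outer:
        return None
    lhs, rhs = node[1], node[2]
    # op1IsRight in the TableGen class selects which side the inner op sits on.
    inner_node, z = (rhs, lhs) if op1_is_right else (lhs, rhs)
    if not (isinstance(inner_node, tuple) and inner_node[0] == inner):
        return None
    return (inner_node[1], inner_node[2], z)

# (ptradd z, (shl x, y)) only matches with the inner op on the right,
# mirroring ThreeOpFrag's op1IsRight = 1 instantiation.
dag = ("ptradd", "z", ("shl", "x", "y"))
assert match_three_op(dag, "ptradd", "shl") is None
assert match_three_op(dag, "ptradd", "shl", op1_is_right=True) == ("x", "y", "z")
```

For a commutative opcode like `add`, the pattern importer canonicalizes both operand orders, which is why only the non-commutative `ptradd` needs this extra knob.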
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
https://github.com/ritter-x2a updated
https://github.com/llvm/llvm-project/pull/143881
>From 40319e7037d2057afb4d8814f1c897b85968532e Mon Sep 17 00:00:00 2001
From: Fabian Ritter
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
For SWDEV-516125.
---
llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
.../AMDGPU/ptradd-sdag-optimizations.ll | 41 ++
llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll | 42 +++
3 files changed, 52 insertions(+), 67 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 89a9ecc27c6ed..86889e4fd6faf 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -508,12 +508,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts],
True16Predicate = NotHasTrue
defm: Ternary_i16_Pats_gfx9;
} // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate =
NotHasTrue16BitInsts
-class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
+class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2,
+                      bit op1IsRight = 0> : PatFrag<
(ops node:$x, node:$y, node:$z),
// When the inner operation is used multiple times, selecting 3-op
// instructions may still be beneficial -- if the other users can be
// combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp<op1> node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp<op1> node:$x, node:$y)),
+                  (op2 (HasOneUseBinOp<op1> node:$x, node:$y), node:$z)),
[{
// Only use VALU ops when the result is divergent.
if (!N->isDivergent())
@@ -540,7 +541,10 @@ class ThreeOpFragSDAG : PatFrag<
let PredicateCodeUsesOperands = 1;
}
-class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2> :
-  ThreeOpFragSDAG<op1, op2> {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1 x, y)) if op1IsRight = 1.
+class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2,
+                  bit op1IsRight = 0> : ThreeOpFragSDAG<op1, op2, op1IsRight> {
// The divergence predicate is irrelevant in GlobalISel, as we have
// proper register bank checks. We just need to verify the constant
// bus restriction when all the sources are considered.
@@ -830,12 +834,19 @@ def : GCNPat<
(DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
(V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
-let SubtargetPredicate = HasLshlAddU64Inst in
+let SubtargetPredicate = HasLshlAddU64Inst in {
def : GCNPat<
(ThreeOpFrag<shl_0_31, add> i64:$src0, i32:$src1, i64:$src2),
(V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
>;
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag<shl_0_31, ptradd, /*op1IsRight=*/1> i64:$src0, i32:$src1, i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = HasLshlAddU64Inst
+
def : VOPBinOpClampPat;
def : VOPBinOpClampPat;
@@ -904,19 +915,24 @@ multiclass IMAD32_Pats {
// Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
// We need to separate this because otherwise OtherPredicates would be overridden.
-class IMAD32_Mul24_Pat<VOP3_Pseudo inst> : GCNPat <
-    (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-    (inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl<VOP3_Pseudo inst, SDPatternOperator AddOp,
+                             bit mulIsRight = 0> : GCNPat <
+    !if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)))),
+                    (i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2))),
+    (inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats<VOP3_Pseudo inst> {
+  def : IMAD32_Mul24_Pats_Impl<inst, add>;
+  def : IMAD32_Mul24_Pats_Impl<inst, ptradd, /*mulIsRight=*/1>;
+}
// exclude pre-GFX9 where it was slow
let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus
in {
  defm : IMAD32_Pats<V_MAD_U64_U32_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_e64>;
+  defm : IMAD32_Mul24_Pats<V_MAD_U64_U32_e64>;
}
let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in
{
  defm : IMAD32_Pats<V_MAD_U64_U32_gfx11_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_gfx11_e64>;
+  defm : IMAD32_Mul24_Pats<V_MAD_U64_U32_gfx11_e64>;
}
def VOP3_PERMLANE_Profile : VOP3_Profile<VOPProfile<[i32, i32, i32, i32]>, VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) %p) {
; Use non-zero shift amounts in v_lshl_add_u64.
define ptr @select_v_lshl_add_
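As an illustration of why both operand orders in IMAD32_Mul24_Pats can map to the same mad instruction: the arithmetic is identical, since integer addition is commutative even though ISD::PTRADD's operands have fixed roles (base on the left, offset on the right). A minimal Python sketch of the semantics, not the compiler's code:

```python
MASK64 = (1 << 64) - 1

def mul_u24(a, b):
    # AMDGPUmul_u24: multiply of the low 24 bits of each operand.
    return (a & 0xFFFFFF) * (b & 0xFFFFFF)

def add_pattern(x, y, z):     # (add (AMDGPUmul_u24 x, y), z)
    return (mul_u24(x, y) + z) & MASK64

def ptradd_pattern(x, y, z):  # (ptradd z, (AMDGPUmul_u24 x, y)): mul on the right
    return (z + mul_u24(x, y)) & MASK64

# Both patterns compute the same 64-bit value, so both can select the
# same V_MAD_U64_U32-style instruction.
x, y, z = 0x123456, 0xABCDEF, 0xDEAD_BEEF_CAFE
assert add_pattern(x, y, z) == ptradd_pattern(x, y, z)
```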
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
https://github.com/ritter-x2a updated
https://github.com/llvm/llvm-project/pull/143881
>From 46090a8031fde937a76268ce7adbbdc6f42911ad Mon Sep 17 00:00:00 2001
From: Fabian Ritter
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
For SWDEV-516125.
---
llvm/lib/Target/AMDGPU/VOP3Instructions.td| 36 +++-
.../AMDGPU/ptradd-sdag-optimizations.ll | 41 ++
llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll | 42 +++
3 files changed, 52 insertions(+), 67 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index a005e0245b8ff..8054e75782539 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts],
True16Predicate = NotHasTrue
defm: Ternary_i16_Pats_gfx9;
} // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate =
NotHasTrue16BitInsts
-class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
+class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2,
+                      bit op1IsRight = 0> : PatFrag<
(ops node:$x, node:$y, node:$z),
// When the inner operation is used multiple times, selecting 3-op
// instructions may still be beneficial -- if the other users can be
// combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp<op1> node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp<op1> node:$x, node:$y)),
+                  (op2 (HasOneUseBinOp<op1> node:$x, node:$y), node:$z)),
[{
// Only use VALU ops when the result is divergent.
if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG : PatFrag<
let PredicateCodeUsesOperands = 1;
}
-class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2> :
-  ThreeOpFragSDAG<op1, op2> {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1 x, y)) if op1IsRight = 1.
+class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2,
+                  bit op1IsRight = 0> : ThreeOpFragSDAG<op1, op2, op1IsRight> {
// The divergence predicate is irrelevant in GlobalISel, as we have
// proper register bank checks. We just need to verify the constant
// bus restriction when all the sources are considered.
@@ -806,12 +810,19 @@ def : GCNPat<
(DivergentBinFrag i32:$src0, IsPow2Plus1:$src1),
(V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
def : GCNPat<
(ThreeOpFrag<shl_0_31, add> i64:$src0, i32:$src1, i64:$src2),
(V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
>;
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag<shl_0_31, ptradd, /*op1IsRight=*/1> i64:$src0, i32:$src1, i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
def : VOPBinOpClampPat;
def : VOPBinOpClampPat;
@@ -880,19 +891,24 @@ multiclass IMAD32_Pats {
// Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
// We need to separate this because otherwise OtherPredicates would be overridden.
-class IMAD32_Mul24_Pat<VOP3_Pseudo inst> : GCNPat <
-    (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-    (inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl<VOP3_Pseudo inst, SDPatternOperator AddOp,
+                             bit mulIsRight = 0> : GCNPat <
+    !if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)))),
+                    (i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2))),
+    (inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats<VOP3_Pseudo inst> {
+  def : IMAD32_Mul24_Pats_Impl<inst, add>;
+  def : IMAD32_Mul24_Pats_Impl<inst, ptradd, /*mulIsRight=*/1>;
+}
// exclude pre-GFX9 where it was slow
let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus
in {
  defm : IMAD32_Pats<V_MAD_U64_U32_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_e64>;
+  defm : IMAD32_Mul24_Pats<V_MAD_U64_U32_e64>;
}
let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in
{
  defm : IMAD32_Pats<V_MAD_U64_U32_gfx11_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_gfx11_e64>;
+  defm : IMAD32_Mul24_Pats<V_MAD_U64_U32_gfx11_e64>;
}
def VOP3_PERMLANE_Profile : VOP3_Profile<VOPProfile<[i32, i32, i32, i32]>, VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) %p) {
; Use non-zero shift amounts in v_lshl_add_u64.
define ptr @select_v_lshl_add_u64(ptr %base,
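For reference, the arithmetic that the new `(ptradd z, (shl x, y))` pattern hands to `V_LSHL_ADD_U64` can be modeled as follows. This is a hedged sketch of the instruction's value computation, not hardware documentation; the shift is restricted to 0..31 by `shl_0_31` in the actual pattern.

```python
MASK64 = (1 << 64) - 1

def v_lshl_add_u64(src0, src1, src2):
    # Model: (src0 << src1) + src2, modulo 2^64. The shift amount is
    # masked here for safety; matched patterns only use amounts 0..31.
    return (((src0 << (src1 & 63)) & MASK64) + src2) & MASK64

# (ptradd base, (shl x, y)) -> ((x << y) + base): the ptradd base pointer
# ends up in the src2 (addend) operand of the instruction.
base, x, y = 0x1000, 5, 4
assert v_lshl_add_u64(x, y, base) == base + (x << y)
```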
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
https://github.com/ritter-x2a updated
https://github.com/llvm/llvm-project/pull/143881
>From b919098ad59718c1b5e642b458705e765c525d48 Mon Sep 17 00:00:00 2001
From: Fabian Ritter
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td            | 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll               | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll               | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index a005e0245b8ff..8054e75782539 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue
 defm: Ternary_i16_Pats_gfx9<mul, V_MAD_U16_gfx9_e64>;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
+class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp<op1> node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp<op1> node:$x, node:$y)),
+                  (op2 (HasOneUseBinOp<op1> node:$x, node:$y), node:$z)),
   [{
   // Only use VALU ops when the result is divergent.
   if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2> : ThreeOpFragSDAG<op1, op2> {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1 x, y)) if op1IsRight = 1.
+class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2,
+                  bit op1IsRight = 0> : ThreeOpFragSDAG<op1, op2, op1IsRight> {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -806,12 +810,19 @@ def : GCNPat<
   (DivergentBinFrag<mul> i32:$src0, IsPow2Plus1:$src1),
   (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag<shl_0_31, add> i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag<shl_0_31, ptradd, /*op1IsRight=*/1> i64:$src0, i32:$src1, i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat<saddsat, V_ADD_I32_e64, i32>;
 def : VOPBinOpClampPat<ssubsat, V_SUB_I32_e64, i32>;
@@ -880,19 +891,24 @@ multiclass IMAD32_Pats<VOP3_Pseudo inst> {
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
 // We need to separate this because otherwise OtherPredicates would be overriden.
-class IMAD32_Mul24_Pat<VOP3_Pseudo inst>: GCNPat <
-    (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-    (inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl<VOP3_Pseudo inst, SDPatternOperator AddOp,
+                             bit mulIsRight = 0> : GCNPat <
+    !if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)))),
+                    (i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2))),
+    (inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats<VOP3_Pseudo inst> {
+  def : IMAD32_Mul24_Pats_Impl<inst, add>;
+  def : IMAD32_Mul24_Pats_Impl<inst, ptradd, /*mulIsRight=*/1>;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus in {
   defm : IMAD32_Pats<V_MAD_U64_U32_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_e64>;
+  defm : IMAD32_Mul24_Pats<V_MAD_U64_U32_e64>;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in {
   defm : IMAD32_Pats<V_MAD_U64_U32_gfx11_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_gfx11_e64>;
+  defm : IMAD32_Mul24_Pats<V_MAD_U64_U32_gfx11_e64>;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile<VOPProfile<[i32, i32, i32, i32]>, VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) %p) {
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_u64(ptr %base,
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
https://github.com/ritter-x2a created
https://github.com/llvm/llvm-project/pull/143881
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
For SWDEV-516125.
>From 3d96a5833b46f0aad869a616feee2b5cfe947e61 Mon Sep 17 00:00:00 2001
From: Fabian Ritter
Date: Thu, 12 Jun 2025 07:44:37 -0400
Subject: [PATCH] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
For SWDEV-516125.
---
 llvm/lib/Target/AMDGPU/VOP3Instructions.td            | 36 +++-
 .../AMDGPU/ptradd-sdag-optimizations.ll               | 41 ++
 llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll               | 42 +++
 3 files changed, 52 insertions(+), 67 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 594b37bb6e21a..c3088d6bb1dca 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue
 defm: Ternary_i16_Pats_gfx9<mul, V_MAD_U16_gfx9_e64>;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
+class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp<op1> node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp<op1> node:$x, node:$y)),
+                  (op2 (HasOneUseBinOp<op1> node:$x, node:$y), node:$z)),
   [{
   // Only use VALU ops when the result is divergent.
   if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2> : ThreeOpFragSDAG<op1, op2> {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1 x, y)) if op1IsRight = 1.
+class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2,
+                  bit op1IsRight = 0> : ThreeOpFragSDAG<op1, op2, op1IsRight> {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -748,12 +752,19 @@ def : GCNPat<
   (DivergentBinFrag<mul> i32:$src0, IsPow2Plus1:$src1),
   (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag<shl_0_31, add> i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag<shl_0_31, ptradd, /*op1IsRight=*/1> i64:$src0, i32:$src1, i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat<saddsat, V_ADD_I32_e64, i32>;
 def : VOPBinOpClampPat<ssubsat, V_SUB_I32_e64, i32>;
@@ -822,19 +833,24 @@ multiclass IMAD32_Pats<VOP3_Pseudo inst> {
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
 // We need to separate this because otherwise OtherPredicates would be overriden.
-class IMAD32_Mul24_Pat<VOP3_Pseudo inst>: GCNPat <
-    (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-    (inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl<VOP3_Pseudo inst, SDPatternOperator AddOp,
+                             bit mulIsRight = 0> : GCNPat <
+    !if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)))),
+                    (i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2))),
+    (inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats<VOP3_Pseudo inst> {
+  def : IMAD32_Mul24_Pats_Impl<inst, add>;
+  def : IMAD32_Mul24_Pats_Impl<inst, ptradd, /*mulIsRight=*/1>;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus in {
   defm : IMAD32_Pats<V_MAD_U64_U32_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_e64>;
+  defm : IMAD32_Mul24_Pats<V_MAD_U64_U32_e64>;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in {
   defm : IMAD32_Pats<V_MAD_U64_U32_gfx11_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_gfx11_e64>;
+
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
ritter-x2a ready_for_review
https://github.com/llvm/llvm-project/pull/143881
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Fabian Ritter (ritter-x2a)
Changes
This patch mirrors similar patterns for ISD::ADD. The main difference is
that ISD::ADD is commutative, so that a pattern definition for, e.g.,
(add (mul x, y), z), automatically also handles (add z, (mul x, y)).
ISD::PTRADD is not commutative, so we would need to handle these cases
explicitly. This patch only implements (ptradd z, (op x, y)) patterns,
where the nested operation (shift or multiply) is the offset of the
ptradd (i.e., the right operand), since base pointers that are the
result of a shift or multiply seem less likely.
For SWDEV-516125.
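The `op1IsRight` / `mulIsRight` flag introduced by the patch can be sketched as a parameterized pattern template: one definition yields either operand order, and the driver instantiates the left-hand form for the commutative `add` while pinning the multiply to the right-hand (offset) side for `ptradd`. A minimal Python sketch of that idea (illustrative only, not TableGen semantics):

```python
# Sketch of the IMAD32_Mul24_Pats idea: one template, two instantiations.
# Patterns are plain tuples; "$srcN" names mirror the TableGen operands.

def make_mul24_add_pattern(add_op, mul_is_right):
    """Build (add_op (mul24 x, y), z) or, if mul_is_right,
    (add_op z, (mul24 x, y)) -- the !if(mulIsRight, ...) switch."""
    mul = ("mul24", "$src0", "$src1")
    return (add_op, "$src2", mul) if mul_is_right else (add_op, mul, "$src2")

# Mirrors the multiclass: add keeps the multiply on the left (the DAG
# matcher commutes add for free), ptradd pins it on the right, since a
# base pointer is unlikely to be the result of a multiply.
patterns = [
    make_mul24_add_pattern("add", mul_is_right=False),
    make_mul24_add_pattern("ptradd", mul_is_right=True),
]
assert patterns[0] == ("add", ("mul24", "$src0", "$src1"), "$src2")
assert patterns[1] == ("ptradd", "$src2", ("mul24", "$src0", "$src1"))
```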
---
Full diff: https://github.com/llvm/llvm-project/pull/143881.diff
3 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+26-10)
- (modified) llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll (+12-29)
- (modified) llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll (+14-28)
```diff
diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
index 594b37bb6e21a..c3088d6bb1dca 100644
--- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td
@@ -484,12 +484,13 @@ let OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue
 defm: Ternary_i16_Pats_gfx9<mul, V_MAD_U16_gfx9_e64>;
 } // End OtherPredicates = [isGFX10Plus, Has16BitInsts], True16Predicate = NotHasTrue16BitInsts
 
-class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
+class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2, bit op1IsRight = 0> : PatFrag<
   (ops node:$x, node:$y, node:$z),
   // When the inner operation is used multiple times, selecting 3-op
   // instructions may still be beneficial -- if the other users can be
   // combined similarly. Let's be conservative for now.
-  (op2 (HasOneUseBinOp<op1> node:$x, node:$y), node:$z),
+  !if(op1IsRight, (op2 node:$z, (HasOneUseBinOp<op1> node:$x, node:$y)),
+                  (op2 (HasOneUseBinOp<op1> node:$x, node:$y), node:$z)),
   [{
   // Only use VALU ops when the result is divergent.
   if (!N->isDivergent())
@@ -516,7 +517,10 @@ class ThreeOpFragSDAG<SDPatternOperator op1, SDPatternOperator op2> : PatFrag<
   let PredicateCodeUsesOperands = 1;
 }
 
-class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2> : ThreeOpFragSDAG<op1, op2> {
+// Matches (op2 (op1 x, y), z) if op1IsRight = 0 and
+// matches (op2 z, (op1 x, y)) if op1IsRight = 1.
+class ThreeOpFrag<SDPatternOperator op1, SDPatternOperator op2,
+                  bit op1IsRight = 0> : ThreeOpFragSDAG<op1, op2, op1IsRight> {
   // The divergence predicate is irrelevant in GlobalISel, as we have
   // proper register bank checks. We just need to verify the constant
   // bus restriction when all the sources are considered.
@@ -748,12 +752,19 @@ def : GCNPat<
   (DivergentBinFrag<mul> i32:$src0, IsPow2Plus1:$src1),
   (V_LSHL_ADD_U32_e64 i32:$src0, (i32 (Log2_32 imm:$src1)), i32:$src0)>;
 
-let SubtargetPredicate = isGFX940Plus in
+let SubtargetPredicate = isGFX940Plus in {
 def : GCNPat<
   (ThreeOpFrag<shl_0_31, add> i64:$src0, i32:$src1, i64:$src2),
   (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
 >;
+def : GCNPat <
+  // (ptradd z, (shl x, y)) -> ((x << y) + z)
+  (ThreeOpFrag<shl_0_31, ptradd, /*op1IsRight=*/1> i64:$src0, i32:$src1, i64:$src2),
+  (V_LSHL_ADD_U64_e64 VSrc_b64:$src0, VSrc_b32:$src1, VSrc_b64:$src2)
+>;
+} // End SubtargetPredicate = isGFX940Plus
+
 def : VOPBinOpClampPat<saddsat, V_ADD_I32_e64, i32>;
 def : VOPBinOpClampPat<ssubsat, V_SUB_I32_e64, i32>;
@@ -822,19 +833,24 @@ multiclass IMAD32_Pats<VOP3_Pseudo inst> {
 // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
 // We need to separate this because otherwise OtherPredicates would be overriden.
-class IMAD32_Mul24_Pat<VOP3_Pseudo inst>: GCNPat <
-    (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
-    (inst $src0, $src1, $src2, 0 /* clamp */)
->;
+class IMAD32_Mul24_Pats_Impl<VOP3_Pseudo inst, SDPatternOperator AddOp,
+                             bit mulIsRight = 0> : GCNPat <
+    !if(mulIsRight, (i64 (AddOp i64:$src2, (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)))),
+                    (i64 (AddOp (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2))),
+    (inst $src0, $src1, $src2, 0 /* clamp */)>;
+
+multiclass IMAD32_Mul24_Pats<VOP3_Pseudo inst> {
+  def : IMAD32_Mul24_Pats_Impl<inst, add>;
+  def : IMAD32_Mul24_Pats_Impl<inst, ptradd, /*mulIsRight=*/1>;
+}
 
 // exclude pre-GFX9 where it was slow
 let OtherPredicates = [HasNotMADIntraFwdBug], SubtargetPredicate = isGFX9Plus in {
   defm : IMAD32_Pats<V_MAD_U64_U32_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_e64>;
+  defm : IMAD32_Mul24_Pats<V_MAD_U64_U32_e64>;
 }
 let OtherPredicates = [HasMADIntraFwdBug], SubtargetPredicate = isGFX11Only in {
   defm : IMAD32_Pats<V_MAD_U64_U32_gfx11_e64>;
-  def : IMAD32_Mul24_Pat<V_MAD_U64_U32_gfx11_e64>;
+  defm : IMAD32_Mul24_Pats<V_MAD_U64_U32_gfx11_e64>;
 }
 
 def VOP3_PERMLANE_Profile : VOP3_Profile<VOPProfile<[i32, i32, i32, i32]>, VOP3_OPSEL> {
diff --git a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
index d48bfe0bb7f21..34bb98550de04 100644
--- a/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
+++ b/llvm/test/CodeGen/AMDGPU/ptradd-sdag-optimizations.ll
@@ -266,18 +266,11 @@ define amdgpu_kernel void @fold_mad64(ptr addrspace(1) %p) {
 ; Use non-zero shift amounts in v_lshl_add_u64.
 define ptr @select_v_lshl_add_u64(ptr %base, i64 %voffset) {
-; GFX942_PTRADD-LABEL: select_v_lshl_add_u64:
-; GFX942_PTRADD: ; %bb.0:
-; GFX942_PTRADD-NEXT:    s_waitcnt vmcnt(0) e
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Handle ISD::PTRADD in VOP3 patterns (PR #143881)
ritter-x2a wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/143881).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#143881** 👈 (this PR; [view in Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/143881))
* **#143880**
* **#143673**
* **#143672**
* **#142778**
* **#142777**
* **#142739**
* **#142738**
* **#141725**
* `main`

This stack of pull requests is managed by
[Graphite](https://graphite.dev?utm-source=stack-comment). Learn more about
[stacking](https://stacking.dev/?utm_source=stack-comment).

https://github.com/llvm/llvm-project/pull/143881
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
