[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)
https://github.com/cdevadas approved this pull request.
https://github.com/llvm/llvm-project/pull/149292
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/149292
>From f46e89e232948948cc6646a7e6d8adab5c278f94 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Thu, 17 Jul 2025 15:50:43 +0900
Subject: [PATCH 1/2] AMDGPU: Add pass to replace constant materialize with AV
pseudos
If we have a v_mov_b32 or v_accvgpr_write_b32 with an inline immediate,
replace it with a pseudo which writes to the combined AV_* class. This
relaxes the operand constraints, which will allow the allocator to
inflate the register class to AV_* to potentially avoid spilling.
The allocator does not know how to replace an instruction to enable
the change of register class. I originally tried to do this by changing
all of the places we introduce v_mov_b32 with immediate, but it's a long
tail of niche cases that require manual updating. Plus we can restrict
this to only run on functions where we know we will be allocating AGPRs.
---
llvm/lib/Target/AMDGPU/AMDGPU.h | 3 +
llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def | 1 +
.../Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp | 108 ++
.../Target/AMDGPU/AMDGPUPrepareAGPRAlloc.h| 23
.../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 13 +++
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h | 2 +
llvm/lib/Target/AMDGPU/CMakeLists.txt | 1 +
llvm/lib/Target/AMDGPU/SIInstrInfo.h | 1 -
llvm/test/CodeGen/AMDGPU/agpr-remat.ll| 18 +--
.../AMDGPU/amdgpu-prepare-agpr-alloc.mir | 95 +++
.../branch-folding-implicit-def-subreg.ll | 46
llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 4 +-
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll | 4 +
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll | 4 +-
.../CodeGen/AMDGPU/no-fold-accvgpr-mov.ll | 10 +-
.../CodeGen/AMDGPU/no-fold-accvgpr-mov.mir| 28 ++---
.../CodeGen/AMDGPU/no-fold-accvgpr-read.mir | 26 ++---
...al-regcopy-and-spill-missed-at-regalloc.ll | 20 ++--
.../CodeGen/AMDGPU/spill-vector-superclass.ll | 6 +-
19 files changed, 330 insertions(+), 83 deletions(-)
create mode 100644 llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
create mode 100644 llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.h
create mode 100644 llvm/test/CodeGen/AMDGPU/amdgpu-prepare-agpr-alloc.mir
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 23f106a9c1d4d..007b481f84960 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -153,6 +153,9 @@ struct AMDGPULowerBufferFatPointersPass
const TargetMachine &TM;
};
+void initializeAMDGPUPrepareAGPRAllocLegacyPass(PassRegistry &);
+extern char &AMDGPUPrepareAGPRAllocLegacyID;
+
void initializeAMDGPUReserveWWMRegsLegacyPass(PassRegistry &);
extern char &AMDGPUReserveWWMRegsLegacyID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 250547acb1ee7..b6c6d927d0e89 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -114,6 +114,7 @@ MACHINE_FUNCTION_PASS("amdgpu-rewrite-partial-reg-uses", GCNRewritePartialRegUse
MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass())
MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass())
MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", AMDGPUPreloadKernArgPrologPass())
+MACHINE_FUNCTION_PASS("amdgpu-prepare-agpr-alloc", AMDGPUPrepareAGPRAllocPass())
MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass())
MACHINE_FUNCTION_PASS("amdgpu-wait-sgpr-hazards", AMDGPUWaitSGPRHazardsPass())
MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
new file mode 100644
index 0000000000000..63a21f8cdba4c
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
@@ -0,0 +1,108 @@
+//===-- AMDGPUPrepareAGPRAlloc.cpp ----------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Make simple transformations to relax register constraints for cases which can
+// allocate to AGPRs or VGPRs. Replace materialize of inline immediates into
+// AGPR or VGPR with a pseudo with an AV_* class register constraint. This
+// allows later passes to inflate the register class if necessary. The register
+// allocator does not know to replace instructions to relax constraints.
+//
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUPrepareAGPRAlloc.h"
+#include "AMDGPU.h"
+#include "GCNSubtarget.h"
+#include "SIMachineFunctionInfo.h"
+#include "SIRegist
[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)
https://github.com/rampitec approved this pull request.
LGTM
https://github.com/llvm/llvm-project/pull/149292
[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)
https://github.com/arsenm created
https://github.com/llvm/llvm-project/pull/149292
If we have a v_mov_b32 or v_accvgpr_write_b32 with an inline immediate,
replace it with a pseudo which writes to the combined AV_* class. This
relaxes the operand constraints, which will allow the allocator to
inflate the register class to AV_* to potentially avoid spilling.
The allocator does not know how to replace an instruction to enable
the change of register class. I originally tried to do this by changing
all of the places we introduce v_mov_b32 with immediate, but it's a long
tail of niche cases that require manual updating. Plus we can restrict
this to only run on functions where we know we will be allocating AGPRs.
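
To make the description above concrete, here is a toy, self-contained model
of the rewrite. It is only a sketch and does not use LLVM's MachineInstr API:
the Opcode enum, the Inst struct, the AVMovImmPseudo name, and the
FunctionMayUseAGPRs flag are all hypothetical stand-ins. It also assumes only
the integer inline-constant range (-16..64) and ignores the floating-point
inline constants.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-ins for illustration only; none of these are LLVM types.
enum class Opcode { V_MOV_B32, V_ACCVGPR_WRITE_B32, AVMovImmPseudo, Other };

struct Inst {
  Opcode Op;
  bool SrcIsImm; // true if the source operand is an immediate
  int64_t Imm;   // immediate value, meaningful only if SrcIsImm
};

// Which register banks an opcode's destination operand may be assigned to.
struct DstConstraint {
  bool AllowsVGPR;
  bool AllowsAGPR;
};

DstConstraint dstConstraint(Opcode Op) {
  switch (Op) {
  case Opcode::V_MOV_B32:
    return {true, false}; // VGPR destination only
  case Opcode::V_ACCVGPR_WRITE_B32:
    return {false, true}; // AGPR destination only
  case Opcode::AVMovImmPseudo:
    return {true, true};  // combined AV_* class: either bank
  case Opcode::Other:
    break;
  }
  return {false, false};
}

// Assumption: only the integer inline constants are modelled here.
bool isInlineIntImm(int64_t V) { return V >= -16 && V <= 64; }

// Swap qualifying constant materializations for the relaxed pseudo.
// Returns true if any instruction was changed.
bool relaxConstantMaterialize(std::vector<Inst> &Insts,
                              bool FunctionMayUseAGPRs) {
  if (!FunctionMayUseAGPRs)
    return false; // nothing to gain if AGPRs will not be allocated

  bool Changed = false;
  for (Inst &I : Insts) {
    bool IsMaterialize = I.Op == Opcode::V_MOV_B32 ||
                         I.Op == Opcode::V_ACCVGPR_WRITE_B32;
    if (!IsMaterialize || !I.SrcIsImm || !isInlineIntImm(I.Imm))
      continue;
    // Only the opcode changes; the destination's constraint is relaxed so
    // a later allocator step is free to pick either bank.
    I.Op = Opcode::AVMovImmPseudo;
    Changed = true;
  }
  return Changed;
}
```

In this model the rewrite never picks a register bank itself. It only widens
what dstConstraint() reports for the instruction, which mirrors the point
that the allocator cannot swap instructions on its own but can inflate a
register class once the constraint permits it.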
[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
If we have a v_mov_b32 or v_accvgpr_write_b32 with an inline immediate,
replace it with a pseudo which writes to the combined AV_* class. This
relaxes the operand constraints, which will allow the allocator to
inflate the register class to AV_* to potentially avoid spilling.
The allocator does not know how to replace an instruction to enable
the change of register class. I originally tried to do this by changing
all of the places we introduce v_mov_b32 with immediate, but it's a long
tail of niche cases that require manual updating. Plus we can restrict
this to only run on functions where we know we will be allocating AGPRs.
---
Patch is 69.33 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/149292.diff
19 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+3)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def (+1)
- (added) llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp (+108)
- (added) llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.h (+23)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+13)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h (+2)
- (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1)
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (-1)
- (modified) llvm/test/CodeGen/AMDGPU/agpr-remat.ll (+9-9)
- (added) llvm/test/CodeGen/AMDGPU/amdgpu-prepare-agpr-alloc.mir (+95)
- (modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll (+23-23)
- (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+4)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/no-fold-accvgpr-mov.ll (+4-6)
- (modified) llvm/test/CodeGen/AMDGPU/no-fold-accvgpr-mov.mir (+14-14)
- (modified) llvm/test/CodeGen/AMDGPU/no-fold-accvgpr-read.mir (+13-13)
- (modified) llvm/test/CodeGen/AMDGPU/partial-regcopy-and-spill-missed-at-regalloc.ll (+10-10)
- (modified) llvm/test/CodeGen/AMDGPU/spill-vector-superclass.ll (+3-3)
``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 23f106a9c1d4d..007b481f84960 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -153,6 +153,9 @@ struct AMDGPULowerBufferFatPointersPass
const TargetMachine &TM;
};
+void initializeAMDGPUPrepareAGPRAllocLegacyPass(PassRegistry &);
+extern char &AMDGPUPrepareAGPRAllocLegacyID;
+
void initializeAMDGPUReserveWWMRegsLegacyPass(PassRegistry &);
extern char &AMDGPUReserveWWMRegsLegacyID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 250547acb1ee7..b6c6d927d0e89 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -114,6 +114,7 @@ MACHINE_FUNCTION_PASS("amdgpu-rewrite-partial-reg-uses", GCNRewritePartialRegUse
MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass())
MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass())
MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", AMDGPUPreloadKernArgPrologPass())
+MACHINE_FUNCTION_PASS("amdgpu-prepare-agpr-alloc", AMDGPUPrepareAGPRAllocPass())
MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass())
MACHINE_FUNCTION_PASS("amdgpu-wait-sgpr-hazards", AMDGPUWaitSGPRHazardsPass())
MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
new file mode 100644
index 0000000000000..63a21f8cdba4c
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
@@ -0,0 +1,108 @@
+//===-- AMDGPUPrepareAGPRAlloc.cpp ----------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Make simple transformations to relax register constraints for cases which can
+// allocate to AGPRs or VGPRs. Replace materialize of inline immediates into
+// AGPR or VGPR with a pseudo with an AV_* class register constraint. This
+// allows later passes to inflate the register class if necessary. The register
+// allocator does not know to replace instructions to relax constraints.
+//
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUPrepareAGPRAlloc.h"
+#include "AMDGPU.h"
+#include "GCNSubtarget.h"
+#include "SIMachineFunctionInfo.h"
+#include "SIRegisterInfo.h"
+#include "llvm/CodeGen/LiveIntervals.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/InitializePasses
[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)
https://github.com/arsenm ready_for_review
https://github.com/llvm/llvm-project/pull/149292
[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite.

* **#149292** 👈 (View in Graphite)
* **#149291**
* **#149099**
* `main`

This stack of pull requests is managed by Graphite.
https://github.com/llvm/llvm-project/pull/149292
