[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)

2025-07-17 Thread Christudasan Devadasan via llvm-branch-commits

https://github.com/cdevadas approved this pull request.


https://github.com/llvm/llvm-project/pull/149292
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)

2025-07-17 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/149292

>From f46e89e232948948cc6646a7e6d8adab5c278f94 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 17 Jul 2025 15:50:43 +0900
Subject: [PATCH 1/2] AMDGPU: Add pass to replace constant materialize with AV
 pseudos

If we have a v_mov_b32 or v_accvgpr_write_b32 with an inline immediate,
replace it with a pseudo which writes to the combined AV_* class. This
relaxes the operand constraints, which will allow the allocator to
inflate the register class to AV_* to potentially avoid spilling.

The allocator does not know how to replace an instruction to enable
the change of register class. I originally tried to do this by changing
all of the places we introduce v_mov_b32 with immediate, but it's along
tail of niche cases that require manual updating. Plus we can restrict
this to only run on functions where we know we will be allocating AGPRs.
---
 llvm/lib/Target/AMDGPU/AMDGPU.h   |   3 +
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def |   1 +
 .../Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp  | 108 ++
 .../Target/AMDGPU/AMDGPUPrepareAGPRAlloc.h|  23 
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  13 +++
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h  |   2 +
 llvm/lib/Target/AMDGPU/CMakeLists.txt |   1 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.h  |   1 -
 llvm/test/CodeGen/AMDGPU/agpr-remat.ll|  18 +--
 .../AMDGPU/amdgpu-prepare-agpr-alloc.mir  |  95 +++
 .../branch-folding-implicit-def-subreg.ll |  46 
 llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll  |   4 +-
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll  |   4 +
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll  |   4 +-
 .../CodeGen/AMDGPU/no-fold-accvgpr-mov.ll |  10 +-
 .../CodeGen/AMDGPU/no-fold-accvgpr-mov.mir|  28 ++---
 .../CodeGen/AMDGPU/no-fold-accvgpr-read.mir   |  26 ++---
 ...al-regcopy-and-spill-missed-at-regalloc.ll |  20 ++--
 .../CodeGen/AMDGPU/spill-vector-superclass.ll |   6 +-
 19 files changed, 330 insertions(+), 83 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
 create mode 100644 llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.h
 create mode 100644 llvm/test/CodeGen/AMDGPU/amdgpu-prepare-agpr-alloc.mir

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 23f106a9c1d4d..007b481f84960 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -153,6 +153,9 @@ struct AMDGPULowerBufferFatPointersPass
   const TargetMachine &TM;
 };
 
+void initializeAMDGPUPrepareAGPRAllocLegacyPass(PassRegistry &);
+extern char &AMDGPUPrepareAGPRAllocLegacyID;
+
 void initializeAMDGPUReserveWWMRegsLegacyPass(PassRegistry &);
 extern char &AMDGPUReserveWWMRegsLegacyID;
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def 
b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 250547acb1ee7..b6c6d927d0e89 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -114,6 +114,7 @@ MACHINE_FUNCTION_PASS("amdgpu-rewrite-partial-reg-uses", 
GCNRewritePartialRegUse
 MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass())
 MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", 
GCNPreRAOptimizationsPass())
 MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", 
AMDGPUPreloadKernArgPrologPass())
+MACHINE_FUNCTION_PASS("amdgpu-prepare-agpr-alloc", 
AMDGPUPrepareAGPRAllocPass())
 MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass())
 MACHINE_FUNCTION_PASS("amdgpu-wait-sgpr-hazards", AMDGPUWaitSGPRHazardsPass())
 MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
new file mode 100644
index 0..63a21f8cdba4c
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
@@ -0,0 +1,108 @@
+//===-- AMDGPUPrepareAGPRAlloc.cpp 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Make simple transformations to relax register constraints for cases which 
can
+// allocate to AGPRs or VGPRs. Replace materialize of inline immediates into
+// AGPR or VGPR with a pseudo with an AV_* class register constraint. This
+// allows later passes to inflate the register class if necessary. The register
+// allocator does not know to replace instructions to relax constraints.
+//
+//===--===//
+
+#include "AMDGPUPrepareAGPRAlloc.h"
+#include "AMDGPU.h"
+#include "GCNSubtarget.h"
+#include "SIMachineFunctionInfo.h"
+#include "SIRegist

[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)

2025-07-17 Thread Stanislav Mekhanoshin via llvm-branch-commits

https://github.com/rampitec approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/149292
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)

2025-07-17 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/149292

If we have a v_mov_b32 or v_accvgpr_write_b32 with an inline immediate,
replace it with a pseudo which writes to the combined AV_* class. This
relaxes the operand constraints, which will allow the allocator to
inflate the register class to AV_* to potentially avoid spilling.

The allocator does not know how to replace an instruction to enable
the change of register class. I originally tried to do this by changing
all of the places we introduce v_mov_b32 with immediate, but it's along
tail of niche cases that require manual updating. Plus we can restrict
this to only run on functions where we know we will be allocating AGPRs.

>From f46e89e232948948cc6646a7e6d8adab5c278f94 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 17 Jul 2025 15:50:43 +0900
Subject: [PATCH] AMDGPU: Add pass to replace constant materialize with AV
 pseudos

If we have a v_mov_b32 or v_accvgpr_write_b32 with an inline immediate,
replace it with a pseudo which writes to the combined AV_* class. This
relaxes the operand constraints, which will allow the allocator to
inflate the register class to AV_* to potentially avoid spilling.

The allocator does not know how to replace an instruction to enable
the change of register class. I originally tried to do this by changing
all of the places we introduce v_mov_b32 with immediate, but it's along
tail of niche cases that require manual updating. Plus we can restrict
this to only run on functions where we know we will be allocating AGPRs.
---
 llvm/lib/Target/AMDGPU/AMDGPU.h   |   3 +
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def |   1 +
 .../Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp  | 108 ++
 .../Target/AMDGPU/AMDGPUPrepareAGPRAlloc.h|  23 
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  13 +++
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h  |   2 +
 llvm/lib/Target/AMDGPU/CMakeLists.txt |   1 +
 llvm/lib/Target/AMDGPU/SIInstrInfo.h  |   1 -
 llvm/test/CodeGen/AMDGPU/agpr-remat.ll|  18 +--
 .../AMDGPU/amdgpu-prepare-agpr-alloc.mir  |  95 +++
 .../branch-folding-implicit-def-subreg.ll |  46 
 llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll  |   4 +-
 llvm/test/CodeGen/AMDGPU/llc-pipeline.ll  |   4 +
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll  |   4 +-
 .../CodeGen/AMDGPU/no-fold-accvgpr-mov.ll |  10 +-
 .../CodeGen/AMDGPU/no-fold-accvgpr-mov.mir|  28 ++---
 .../CodeGen/AMDGPU/no-fold-accvgpr-read.mir   |  26 ++---
 ...al-regcopy-and-spill-missed-at-regalloc.ll |  20 ++--
 .../CodeGen/AMDGPU/spill-vector-superclass.ll |   6 +-
 19 files changed, 330 insertions(+), 83 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
 create mode 100644 llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.h
 create mode 100644 llvm/test/CodeGen/AMDGPU/amdgpu-prepare-agpr-alloc.mir

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 23f106a9c1d4d..007b481f84960 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -153,6 +153,9 @@ struct AMDGPULowerBufferFatPointersPass
   const TargetMachine &TM;
 };
 
+void initializeAMDGPUPrepareAGPRAllocLegacyPass(PassRegistry &);
+extern char &AMDGPUPrepareAGPRAllocLegacyID;
+
 void initializeAMDGPUReserveWWMRegsLegacyPass(PassRegistry &);
 extern char &AMDGPUReserveWWMRegsLegacyID;
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def 
b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 250547acb1ee7..b6c6d927d0e89 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -114,6 +114,7 @@ MACHINE_FUNCTION_PASS("amdgpu-rewrite-partial-reg-uses", 
GCNRewritePartialRegUse
 MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass())
 MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", 
GCNPreRAOptimizationsPass())
 MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", 
AMDGPUPreloadKernArgPrologPass())
+MACHINE_FUNCTION_PASS("amdgpu-prepare-agpr-alloc", 
AMDGPUPrepareAGPRAllocPass())
 MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass())
 MACHINE_FUNCTION_PASS("amdgpu-wait-sgpr-hazards", AMDGPUWaitSGPRHazardsPass())
 MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
new file mode 100644
index 0..63a21f8cdba4c
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
@@ -0,0 +1,108 @@
+//===-- AMDGPUPrepareAGPRAlloc.cpp 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--

[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)

2025-07-17 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

If we have a v_mov_b32 or v_accvgpr_write_b32 with an inline immediate,
replace it with a pseudo which writes to the combined AV_* class. This
relaxes the operand constraints, which will allow the allocator to
inflate the register class to AV_* to potentially avoid spilling.

The allocator does not know how to replace an instruction to enable
the change of register class. I originally tried to do this by changing
all of the places we introduce v_mov_b32 with immediate, but it's along
tail of niche cases that require manual updating. Plus we can restrict
this to only run on functions where we know we will be allocating AGPRs.

---

Patch is 69.33 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/149292.diff


19 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+3) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def (+1) 
- (added) llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp (+108) 
- (added) llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.h (+23) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+13) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h (+2) 
- (modified) llvm/lib/Target/AMDGPU/CMakeLists.txt (+1) 
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (-1) 
- (modified) llvm/test/CodeGen/AMDGPU/agpr-remat.ll (+9-9) 
- (added) llvm/test/CodeGen/AMDGPU/amdgpu-prepare-agpr-alloc.mir (+95) 
- (modified) llvm/test/CodeGen/AMDGPU/branch-folding-implicit-def-subreg.ll 
(+23-23) 
- (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline.ll (+4) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/no-fold-accvgpr-mov.ll (+4-6) 
- (modified) llvm/test/CodeGen/AMDGPU/no-fold-accvgpr-mov.mir (+14-14) 
- (modified) llvm/test/CodeGen/AMDGPU/no-fold-accvgpr-read.mir (+13-13) 
- (modified) 
llvm/test/CodeGen/AMDGPU/partial-regcopy-and-spill-missed-at-regalloc.ll 
(+10-10) 
- (modified) llvm/test/CodeGen/AMDGPU/spill-vector-superclass.ll (+3-3) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 23f106a9c1d4d..007b481f84960 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -153,6 +153,9 @@ struct AMDGPULowerBufferFatPointersPass
   const TargetMachine &TM;
 };
 
+void initializeAMDGPUPrepareAGPRAllocLegacyPass(PassRegistry &);
+extern char &AMDGPUPrepareAGPRAllocLegacyID;
+
 void initializeAMDGPUReserveWWMRegsLegacyPass(PassRegistry &);
 extern char &AMDGPUReserveWWMRegsLegacyID;
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def 
b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 250547acb1ee7..b6c6d927d0e89 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -114,6 +114,7 @@ MACHINE_FUNCTION_PASS("amdgpu-rewrite-partial-reg-uses", 
GCNRewritePartialRegUse
 MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass())
 MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", 
GCNPreRAOptimizationsPass())
 MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", 
AMDGPUPreloadKernArgPrologPass())
+MACHINE_FUNCTION_PASS("amdgpu-prepare-agpr-alloc", 
AMDGPUPrepareAGPRAllocPass())
 MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass())
 MACHINE_FUNCTION_PASS("amdgpu-wait-sgpr-hazards", AMDGPUWaitSGPRHazardsPass())
 MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
new file mode 100644
index 0..63a21f8cdba4c
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPrepareAGPRAlloc.cpp
@@ -0,0 +1,108 @@
+//===-- AMDGPUPrepareAGPRAlloc.cpp 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// Make simple transformations to relax register constraints for cases which 
can
+// allocate to AGPRs or VGPRs. Replace materialize of inline immediates into
+// AGPR or VGPR with a pseudo with an AV_* class register constraint. This
+// allows later passes to inflate the register class if necessary. The register
+// allocator does not know to replace instructions to relax constraints.
+//
+//===--===//
+
+#include "AMDGPUPrepareAGPRAlloc.h"
+#include "AMDGPU.h"
+#include "GCNSubtarget.h"
+#include "SIMachineFunctionInfo.h"
+#include "SIRegisterInfo.h"
+#include "llvm/CodeGen/LiveIntervals.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/InitializePasses

[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)

2025-07-17 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/149292
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Add pass to replace constant materialize with AV pseudos (PR #149292)

2025-07-17 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack  href="https://app.graphite.dev/github/pr/llvm/llvm-project/149292?utm_source=stack-comment-downstack-mergeability-warning";
>  >on Graphite.
> https://graphite.dev/docs/merge-pull-requests";>Learn more

* **#149292** https://app.graphite.dev/github/pr/llvm/llvm-project/149292?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/> 👈 https://app.graphite.dev/github/pr/llvm/llvm-project/149292?utm_source=stack-comment-view-in-graphite";
 target="_blank">(View in Graphite)
* **#149291** https://app.graphite.dev/github/pr/llvm/llvm-project/149291?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* **#149099** https://app.graphite.dev/github/pr/llvm/llvm-project/149099?utm_source=stack-comment-icon";
 target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" 
width="10px" height="10px"/>
* `main`




This stack of pull requests is managed by https://graphite.dev?utm-source=stack-comment";>Graphite. Learn 
more about https://stacking.dev/?utm_source=stack-comment";>stacking.


https://github.com/llvm/llvm-project/pull/149292
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits