[llvm-branch-commits] [llvm] AMDGPU: Fix cost model for 16-bit operations on gfx8 (PR #141943)
arsenm wrote:

### Merge activity

* **Jun 17, 10:54 PM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/141943).

https://github.com/llvm/llvm-project/pull/141943
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Fix cost model for 16-bit operations on gfx8 (PR #141943)
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/141943
From 7fbe4e233098676cc2af8aaad48a1eb5f8cb360f Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Thu, 29 May 2025 14:41:33 +0200
Subject: [PATCH] AMDGPU: Fix cost model for 16-bit operations on gfx8
We should only divide the number of pieces to fit into packed instructions
if we actually have packed (pk) instructions. This increases the cost of
copysign, but is closer to the current codegen output. The codegen itself
could be much cheaper than it is now.
---
.../AMDGPU/AMDGPUTargetTransformInfo.cpp | 2 +-
.../Analysis/CostModel/AMDGPU/canonicalize.ll | 24
.../Analysis/CostModel/AMDGPU/copysign.ll | 28 +--
.../SLPVectorizer/AMDGPU/slp-v2f16.ll | 12
4 files changed, 34 insertions(+), 32 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index 58bfc0b80b24f..b2b25ac66677e 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -721,7 +721,7 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
     if (SLT == MVT::f64)
       return LT.first * NElts * get64BitInstrCost(CostKind);

-    if ((ST->has16BitInsts() && (SLT == MVT::f16 || SLT == MVT::i16)) ||
+    if ((ST->hasVOP3PInsts() && (SLT == MVT::f16 || SLT == MVT::i16)) ||
         (ST->hasPackedFP32Ops() && SLT == MVT::f32))
       NElts = (NElts + 1) / 2;
diff --git a/llvm/test/Analysis/CostModel/AMDGPU/canonicalize.ll b/llvm/test/Analysis/CostModel/AMDGPU/canonicalize.ll
index e162edbf611e2..7ac4db3119210 100644
--- a/llvm/test/Analysis/CostModel/AMDGPU/canonicalize.ll
+++ b/llvm/test/Analysis/CostModel/AMDGPU/canonicalize.ll
@@ -22,12 +22,12 @@ define void @canonicalize_f16() {
 ;
 ; GFX8-LABEL: 'canonicalize_f16'
 ; GFX8-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.canonicalize.f16(half undef)
-; GFX8-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f16 = call <2 x half> @llvm.canonicalize.v2f16(<2 x half> undef)
-; GFX8-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v3f16 = call <3 x half> @llvm.canonicalize.v3f16(<3 x half> undef)
-; GFX8-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4f16 = call <4 x half> @llvm.canonicalize.v4f16(<4 x half> undef)
-; GFX8-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v5f16 = call <5 x half> @llvm.canonicalize.v5f16(<5 x half> undef)
-; GFX8-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v16f16 = call <16 x half> @llvm.canonicalize.v16f16(<16 x half> undef)
-; GFX8-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %v17f16 = call <17 x half> @llvm.canonicalize.v17f16(<17 x half> undef)
+; GFX8-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2f16 = call <2 x half> @llvm.canonicalize.v2f16(<2 x half> undef)
+; GFX8-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v3f16 = call <3 x half> @llvm.canonicalize.v3f16(<3 x half> undef)
+; GFX8-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v4f16 = call <4 x half> @llvm.canonicalize.v4f16(<4 x half> undef)
+; GFX8-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v5f16 = call <5 x half> @llvm.canonicalize.v5f16(<5 x half> undef)
+; GFX8-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v16f16 = call <16 x half> @llvm.canonicalize.v16f16(<16 x half> undef)
+; GFX8-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %v17f16 = call <17 x half> @llvm.canonicalize.v17f16(<17 x half> undef)
 ; GFX8-NEXT: Cost Model: Found an estimated cost of 10 for instruction: ret void
 ;
 ; GFX9-LABEL: 'canonicalize_f16'
@@ -62,12 +62,12 @@ define void @canonicalize_f16() {
 ;
 ; GFX8-SIZE-LABEL: 'canonicalize_f16'
 ; GFX8-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f16 = call half @llvm.canonicalize.f16(half undef)
-; GFX8-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f16 = call <2 x half> @llvm.canonicalize.v2f16(<2 x half> undef)
-; GFX8-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v3f16 = call <3 x half> @llvm.canonicalize.v3f16(<3 x half> undef)
-; GFX8-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4f16 = call <4 x half> @llvm.canonicalize.v4f16(<4 x half> undef)
-; GFX8-SIZE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v5f16 = call <5 x half> @llvm.canonicalize.v5f16(<5 x half> undef)
-; GFX8-SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v16f16 = call <16 x half> @llvm.canonicalize.v16f16(<16 x half> undef)
-; GFX8-SIZE-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %v17f16 = call <17 x half> @llvm.canonicalize.v17f16(<17 x half> undef)
+; GFX8-SIZE-NEXT:
[llvm-branch-commits] [llvm] AMDGPU: Fix cost model for 16-bit operations on gfx8 (PR #141943)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
We should only divide the number of pieces to fit into packed instructions
if we actually have packed (pk) instructions. This increases the cost of
copysign, but is closer to the current codegen output. The codegen itself
could be much cheaper than it is now.
---
Patch is 137.25 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/141943.diff
6 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+1-1)
- (modified) llvm/test/Analysis/CostModel/AMDGPU/canonicalize.ll (+12-12)
- (modified) llvm/test/Analysis/CostModel/AMDGPU/copysign.ll (+14-14)
- (modified) llvm/test/Analysis/CostModel/AMDGPU/maximumnum.ll (+244-136)
- (modified) llvm/test/Analysis/CostModel/AMDGPU/minimumnum.ll (+244-136)
- (modified) llvm/test/Transforms/SLPVectorizer/AMDGPU/slp-v2f16.ll (+7-5)
[llvm-branch-commits] [llvm] AMDGPU: Fix cost model for 16-bit operations on gfx8 (PR #141943)
llvmbot wrote:
@llvm/pr-subscribers-llvm-analysis
Author: Matt Arsenault (arsenm)
Changes
We should only divide the number of pieces to fit into packed instructions
if we actually have packed (pk) instructions. This increases the cost of
copysign, but is closer to the current codegen output. The codegen itself
could be much cheaper than it is now.
---
Patch is 137.25 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/141943.diff
[llvm-branch-commits] [llvm] AMDGPU: Fix cost model for 16-bit operations on gfx8 (PR #141943)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/141943
[llvm-branch-commits] [llvm] AMDGPU: Fix cost model for 16-bit operations on gfx8 (PR #141943)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is > open. Once all requirements are satisfied, merge this PR as a stack href="https://app.graphite.dev/github/pr/llvm/llvm-project/141943?utm_source=stack-comment-downstack-mergeability-warning"; > >on Graphite. > https://graphite.dev/docs/merge-pull-requests";>Learn more * **#141948** https://app.graphite.dev/github/pr/llvm/llvm-project/141948?utm_source=stack-comment-icon"; target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" width="10px" height="10px"/> * **#141947** https://app.graphite.dev/github/pr/llvm/llvm-project/141947?utm_source=stack-comment-icon"; target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" width="10px" height="10px"/> * **#141946** https://app.graphite.dev/github/pr/llvm/llvm-project/141946?utm_source=stack-comment-icon"; target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" width="10px" height="10px"/> * **#141945** https://app.graphite.dev/github/pr/llvm/llvm-project/141945?utm_source=stack-comment-icon"; target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" width="10px" height="10px"/> * **#141944** https://app.graphite.dev/github/pr/llvm/llvm-project/141944?utm_source=stack-comment-icon"; target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" width="10px" height="10px"/> * **#141943** https://app.graphite.dev/github/pr/llvm/llvm-project/141943?utm_source=stack-comment-icon"; target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" width="10px" height="10px"/> 👈 https://app.graphite.dev/github/pr/llvm/llvm-project/141943?utm_source=stack-comment-view-in-graphite"; target="_blank">(View in Graphite) * **#141904** https://app.graphite.dev/github/pr/llvm/llvm-project/141904?utm_source=stack-comment-icon"; 
target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" width="10px" height="10px"/> * **#141903** https://app.graphite.dev/github/pr/llvm/llvm-project/141903?utm_source=stack-comment-icon"; target="_blank">https://static.graphite.dev/graphite-32x32-black.png"; alt="Graphite" width="10px" height="10px"/> * `main` This stack of pull requests is managed by https://graphite.dev?utm-source=stack-comment";>Graphite. Learn more about https://stacking.dev/?utm_source=stack-comment";>stacking. https://github.com/llvm/llvm-project/pull/141943 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
