[llvm-branch-commits] [llvm] AMDGPU: Reduce cost of f64 copysign (PR #141944)
arsenm wrote:
### Merge activity
* **Jun 17, 10:54 PM UTC**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/141944).
https://github.com/llvm/llvm-project/pull/141944
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Reduce cost of f64 copysign (PR #141944)
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/141944
From 641ab37922230a88206b08d07b76df77c9d82512 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Thu, 29 May 2025 15:20:50 +0200
Subject: [PATCH] AMDGPU: Reduce cost of f64 copysign
The real implementation is 1 real instruction plus a constant
materialize. Call that a 1, it's not a real f64 operation.
---
.../AMDGPU/AMDGPUTargetTransformInfo.cpp | 12 ---
.../Analysis/CostModel/AMDGPU/copysign.ll | 32 +--
2 files changed, 23 insertions(+), 21 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index b2b25ac66677e..b79c9be3eac93 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -718,9 +718,6 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
   MVT::SimpleValueType SLT = LT.second.getScalarType().SimpleTy;
-  if (SLT == MVT::f64)
-    return LT.first * NElts * get64BitInstrCost(CostKind);
-
   if ((ST->hasVOP3PInsts() && (SLT == MVT::f16 || SLT == MVT::i16)) ||
       (ST->hasPackedFP32Ops() && SLT == MVT::f32))
     NElts = (NElts + 1) / 2;
@@ -731,6 +728,11 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
   switch (ICA.getID()) {
   case Intrinsic::fma:
   case Intrinsic::fmuladd:
+    if (SLT == MVT::f64) {
+      InstRate = get64BitInstrCost(CostKind);
+      break;
+    }
+
     if ((SLT == MVT::f32 && ST->hasFastFMAF32()) || SLT == MVT::f16)
       InstRate = getFullRateInstrCost();
     else {
@@ -741,8 +743,8 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
   case Intrinsic::copysign:
     return NElts * getFullRateInstrCost();
   case Intrinsic::canonicalize: {
-    assert(SLT != MVT::f64);
-    InstRate = getFullRateInstrCost();
+    InstRate =
+        SLT == MVT::f64 ? get64BitInstrCost(CostKind) : getFullRateInstrCost();
     break;
   }
   case Intrinsic::uadd_sat:
diff --git a/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll b/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll
index 334bb341a3c3e..5b042a8a04603 100644
--- a/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll
+++ b/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll
@@ -245,25 +245,25 @@ define void @copysign_bf16() {
 define void @copysign_f64() {
 ; ALL-LABEL: 'copysign_f64'
-; ALL-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %f64 = call double @llvm.copysign.f64(double undef, double undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v2f64 = call <2 x double> @llvm.copysign.v2f64(<2 x double> undef, <2 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v3f64 = call <3 x double> @llvm.copysign.v3f64(<3 x double> undef, <3 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4f64 = call <4 x double> @llvm.copysign.v4f64(<4 x double> undef, <4 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %v5f64 = call <5 x double> @llvm.copysign.v5f64(<5 x double> undef, <5 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %v8f64 = call <8 x double> @llvm.copysign.v8f64(<8 x double> undef, <8 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 256 for instruction: %v9f64 = call <9 x double> @llvm.copysign.v9f64(<9 x double> undef, <9 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 320 for instruction: %v16f64 = call <16 x double> @llvm.copysign.v16f64(<16 x double> undef, <16 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f64 = call double @llvm.copysign.f64(double undef, double undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2f64 = call <2 x double> @llvm.copysign.v2f64(<2 x double> undef, <2 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v3f64 = call <3 x double> @llvm.copysign.v3f64(<3 x double> undef, <3 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v4f64 = call <4 x double> @llvm.copysign.v4f64(<4 x double> undef, <4 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v5f64 = call <5 x double> @llvm.copysign.v5f64(<5 x double> undef, <5 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v8f64 = call <8 x double> @llvm.copysign.v8f64(<8 x double> undef, <8 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v9f64 = call <9 x double> @llvm.copysign.v9f64(<9 x double> undef, <9 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v16f64 = call <16 x double> @llvm.copysign.v16f64(<16 x double> undef, <16 x double> undef)
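As a rough check on the updated CHECK lines, the old and new formulas can be replayed in a few lines of C++. This is only a sketch under stated assumptions: `kRate64 = 4` is inferred from the old scalar cost of 4, `kFullRate = 1` is assumed, and the function names are made up for illustration. In the old formula the wider vectors (v5 and up) also scaled with `LT.first`, which is taken as a parameter here rather than derived.

```cpp
#include <cassert>

// Hypothetical stand-ins for the values the real cost model queries.
// kRate64 is inferred from the old scalar f64 CHECK line (cost 4);
// kFullRate is assumed to be the basic cost of one instruction.
constexpr unsigned kFullRate = 1;
constexpr unsigned kRate64 = 4;

// Old behaviour: f64 took the early return, LT.first * NElts * 64-bit rate.
constexpr unsigned oldCopysignCost(unsigned LTFirst, unsigned NElts) {
  return LTFirst * NElts * kRate64;
}

// New behaviour: f64 reaches the copysign case, NElts * full rate.
constexpr unsigned newCopysignCost(unsigned NElts) {
  return NElts * kFullRate;
}

// Scalar through v4: LT.first is assumed to be 1 (no splitting).
static_assert(oldCopysignCost(1, 1) == 4, "f64 old");
static_assert(oldCopysignCost(1, 2) == 8, "v2f64 old");
static_assert(oldCopysignCost(1, 3) == 12, "v3f64 old");
static_assert(oldCopysignCost(1, 4) == 16, "v4f64 old");

static_assert(newCopysignCost(1) == 1, "f64 new");
static_assert(newCopysignCost(4) == 4, "v4f64 new");
// v5/v8 report 8 and v9/v16 report 16, consistent with NElts being the
// element count of the legalized type (assumed 8 and 16 respectively).
static_assert(newCopysignCost(8) == 8, "v5f64/v8f64 new");
static_assert(newCopysignCost(16) == 16, "v9f64/v16f64 new");
```

The old costs of 96, 256 and 320 for the wider vectors are not reproduced here because they depend on the legalization factor, which this sketch does not model.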
[llvm-branch-commits] [llvm] AMDGPU: Reduce cost of f64 copysign (PR #141944)
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/141944
From 0ddc81d117497e6caea3334f7e62ff1aa62f0e3a Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Thu, 29 May 2025 15:20:50 +0200
Subject: [PATCH] AMDGPU: Reduce cost of f64 copysign
The real implementation is 1 real instruction plus a constant
materialize. Call that a 1, it's not a real f64 operation.
---
.../AMDGPU/AMDGPUTargetTransformInfo.cpp | 12 ---
.../Analysis/CostModel/AMDGPU/copysign.ll | 32 +--
2 files changed, 23 insertions(+), 21 deletions(-)
[llvm-branch-commits] [llvm] AMDGPU: Reduce cost of f64 copysign (PR #141944)
@@ -741,8 +743,8 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
   case Intrinsic::copysign:
     return NElts * getFullRateInstrCost();
   case Intrinsic::canonicalize: {
-    assert(SLT != MVT::f64);
-    InstRate = getFullRateInstrCost();
+    InstRate =
+        SLT == MVT::f64 ? get64BitInstrCost(CostKind) : getFullRateInstrCost();
     break;
   }
   case Intrinsic::uadd_sat:
arsenm wrote:
They are only integer intrinsics
https://github.com/llvm/llvm-project/pull/141944
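The question in this thread arises because, with the early f64 return removed, f64 now falls through into the switch. A toy model of the dispatch in the quoted hunks may help; it is a sketch only, and the enum names, the rate constants, `kSlowFmaRate`, and the final `ltFirst * nElts * rate` return are assumptions rather than the LLVM code. The saturating-arithmetic cases are left out since, per the reply, they are integer-only.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; these are not the LLVM types.
enum class Scalar { F16, F32, F64 };
enum class Op { Fma, Copysign, Canonicalize };

constexpr unsigned kFullRate = 1;    // assumed basic instruction cost
constexpr unsigned kRate64 = 4;      // assumed 64-bit FP instruction cost
constexpr unsigned kSlowFmaRate = 4; // placeholder for the elided else branch

// Mirrors the shape of the patched switch: each FP case now decides
// for itself what an f64 operand costs.
inline unsigned instrCost(Op op, Scalar ty, unsigned ltFirst, unsigned nElts,
                          bool fastFmaF32) {
  unsigned rate = kFullRate;
  switch (op) {
  case Op::Fma:
    if (ty == Scalar::F64) {
      rate = kRate64; // added by the patch
      break;
    }
    if ((ty == Scalar::F32 && fastFmaF32) || ty == Scalar::F16)
      rate = kFullRate;
    else
      rate = kSlowFmaRate;
    break;
  case Op::Copysign:
    // Returns directly: full rate per element, regardless of scalar type.
    return nElts * kFullRate;
  case Op::Canonicalize:
    // The assert on f64 is replaced by a rate selection.
    rate = ty == Scalar::F64 ? kRate64 : kFullRate;
    break;
  }
  return ltFirst * nElts * rate; // assumed common tail
}
```

With these placeholder rates, `instrCost(Op::Copysign, Scalar::F64, 1, 4, true)` is 4, while `instrCost(Op::Canonicalize, Scalar::F64, 1, 1, true)` is also 4 because canonicalize still counts as a real 64-bit operation.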
[llvm-branch-commits] [llvm] AMDGPU: Reduce cost of f64 copysign (PR #141944)
@@ -741,8 +743,8 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
   case Intrinsic::copysign:
     return NElts * getFullRateInstrCost();
   case Intrinsic::canonicalize: {
-    assert(SLT != MVT::f64);
-    InstRate = getFullRateInstrCost();
+    InstRate =
+        SLT == MVT::f64 ? get64BitInstrCost(CostKind) : getFullRateInstrCost();
     break;
   }
   case Intrinsic::uadd_sat:
Pierre-vh wrote:
are those cases below fine with handling f64 now?
https://github.com/llvm/llvm-project/pull/141944
[llvm-branch-commits] [llvm] AMDGPU: Reduce cost of f64 copysign (PR #141944)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
The real implementation is 1 real instruction plus a constant
materialize. Call that a 1, it's not a real f64 operation.
---
Full diff: https://github.com/llvm/llvm-project/pull/141944.diff
2 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp (+7-5)
- (modified) llvm/test/Analysis/CostModel/AMDGPU/copysign.ll (+16-16)
```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index 0dbaf7c548f89..c1ccc8f6798a6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -718,9 +718,6 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
   MVT::SimpleValueType SLT = LT.second.getScalarType().SimpleTy;
-  if (SLT == MVT::f64)
-    return LT.first * NElts * get64BitInstrCost(CostKind);
-
   if ((ST->hasVOP3PInsts() && (SLT == MVT::f16 || SLT == MVT::i16)) ||
       (ST->hasPackedFP32Ops() && SLT == MVT::f32))
     NElts = (NElts + 1) / 2;
@@ -731,6 +728,11 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
   switch (ICA.getID()) {
   case Intrinsic::fma:
   case Intrinsic::fmuladd:
+    if (SLT == MVT::f64) {
+      InstRate = get64BitInstrCost(CostKind);
+      break;
+    }
+
     if ((SLT == MVT::f32 && ST->hasFastFMAF32()) || SLT == MVT::f16)
       InstRate = getFullRateInstrCost();
     else {
@@ -741,8 +743,8 @@ GCNTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
   case Intrinsic::copysign:
     return NElts * getFullRateInstrCost();
   case Intrinsic::canonicalize: {
-    assert(SLT != MVT::f64);
-    InstRate = getFullRateInstrCost();
+    InstRate =
+        SLT == MVT::f64 ? get64BitInstrCost(CostKind) : getFullRateInstrCost();
     break;
   }
   case Intrinsic::uadd_sat:
diff --git a/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll b/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll
index 334bb341a3c3e..5b042a8a04603 100644
--- a/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll
+++ b/llvm/test/Analysis/CostModel/AMDGPU/copysign.ll
@@ -245,25 +245,25 @@ define void @copysign_bf16() {
 define void @copysign_f64() {
 ; ALL-LABEL: 'copysign_f64'
-; ALL-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %f64 = call double @llvm.copysign.f64(double undef, double undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v2f64 = call <2 x double> @llvm.copysign.v2f64(<2 x double> undef, <2 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %v3f64 = call <3 x double> @llvm.copysign.v3f64(<3 x double> undef, <3 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v4f64 = call <4 x double> @llvm.copysign.v4f64(<4 x double> undef, <4 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %v5f64 = call <5 x double> @llvm.copysign.v5f64(<5 x double> undef, <5 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 96 for instruction: %v8f64 = call <8 x double> @llvm.copysign.v8f64(<8 x double> undef, <8 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 256 for instruction: %v9f64 = call <9 x double> @llvm.copysign.v9f64(<9 x double> undef, <9 x double> undef)
-; ALL-NEXT: Cost Model: Found an estimated cost of 320 for instruction: %v16f64 = call <16 x double> @llvm.copysign.v16f64(<16 x double> undef, <16 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %f64 = call double @llvm.copysign.f64(double undef, double undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2f64 = call <2 x double> @llvm.copysign.v2f64(<2 x double> undef, <2 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %v3f64 = call <3 x double> @llvm.copysign.v3f64(<3 x double> undef, <3 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %v4f64 = call <4 x double> @llvm.copysign.v4f64(<4 x double> undef, <4 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v5f64 = call <5 x double> @llvm.copysign.v5f64(<5 x double> undef, <5 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %v8f64 = call <8 x double> @llvm.copysign.v8f64(<8 x double> undef, <8 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v9f64 = call <9 x double> @llvm.copysign.v9f64(<9 x double> undef, <9 x double> undef)
+; ALL-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %v16f64 = call <16 x double> @llvm.copysign.v16f64(<16 x double> undef, <16 x double> undef)
 ; ALL-NEXT: Cost Model: Found an estimated cost of 10 for instruction: ret void
 ;
 ; ALL-SIZE-LABEL: '
```
[llvm-branch-commits] [llvm] AMDGPU: Reduce cost of f64 copysign (PR #141944)
https://github.com/arsenm ready_for_review
https://github.com/llvm/llvm-project/pull/141944
[llvm-branch-commits] [llvm] AMDGPU: Reduce cost of f64 copysign (PR #141944)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/141944?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests
* **#141948** https://app.graphite.dev/github/pr/llvm/llvm-project/141948?utm_source=stack-comment-icon
* **#141947** https://app.graphite.dev/github/pr/llvm/llvm-project/141947?utm_source=stack-comment-icon
* **#141946** https://app.graphite.dev/github/pr/llvm/llvm-project/141946?utm_source=stack-comment-icon
* **#141945** https://app.graphite.dev/github/pr/llvm/llvm-project/141945?utm_source=stack-comment-icon
* **#141944** https://app.graphite.dev/github/pr/llvm/llvm-project/141944?utm_source=stack-comment-icon 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/141944?utm_source=stack-comment-view-in-graphite)
* **#141943** https://app.graphite.dev/github/pr/llvm/llvm-project/141943?utm_source=stack-comment-icon
* **#141904** https://app.graphite.dev/github/pr/llvm/llvm-project/141904?utm_source=stack-comment-icon
* **#141903** https://app.graphite.dev/github/pr/llvm/llvm-project/141903?utm_source=stack-comment-icon
* `main`
This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment). Learn more about stacking: https://stacking.dev/?utm_source=stack-comment
https://github.com/llvm/llvm-project/pull/141944
[llvm-branch-commits] [llvm] AMDGPU: Reduce cost of f64 copysign (PR #141944)
https://github.com/arsenm created
https://github.com/llvm/llvm-project/pull/141944
The real implementation is 1 real instruction plus a constant
materialize. Call that a 1, it's not a real f64 operation.
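To illustrate why this is not a real f64 operation, here is a minimal host-side sketch of copysign done entirely on the high 32 bits of the double. It assumes the "1 real instruction" is a 32-bit bitfield-insert style select and the "constant materialize" is the `0x7fffffff` mask; the description above names neither, so treat both as assumptions. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Bitfield insert: take bits from a where mask is 1, from b where mask is 0.
// Stands in for the single 32-bit instruction (assumption).
inline uint32_t bitfieldInsert(uint32_t mask, uint32_t a, uint32_t b) {
  return (mask & a) | (~mask & b);
}

// copysign for double that only touches the high dword, where the sign lives.
inline double copysignViaHighWord(double mag, double sgn) {
  uint64_t m, s;
  std::memcpy(&m, &mag, sizeof m);
  std::memcpy(&s, &sgn, sizeof s);

  // The materialized constant: every bit of the high dword except the sign.
  const uint32_t kMagnitudeMask = 0x7fffffffu;
  const uint32_t hi = bitfieldInsert(kMagnitudeMask,
                                     static_cast<uint32_t>(m >> 32),
                                     static_cast<uint32_t>(s >> 32));

  // The low dword of the magnitude passes through unchanged.
  const uint64_t r = (static_cast<uint64_t>(hi) << 32) | (m & 0xffffffffu);
  double out;
  std::memcpy(&out, &r, sizeof out);
  return out;
}
```

No 64-bit floating-point arithmetic is involved, which is consistent with pricing the operation at the full 32-bit rate.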
From 19ab42a4fdba866aa40da8e2cc24967a72f6f482 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Thu, 29 May 2025 15:20:50 +0200
Subject: [PATCH] AMDGPU: Reduce cost of f64 copysign
The real implementation is 1 real instruction plus a constant
materialize. Call that a 1, it's not a real f64 operation.
---
.../AMDGPU/AMDGPUTargetTransformInfo.cpp | 12 ---
.../Analysis/CostModel/AMDGPU/copysign.ll | 32 +--
2 files changed, 23 insertions(+), 21 deletions(-)
