macurtis-amd wrote:
> To make the question more precise, would it make sense to change AMDGPUs
> unaligned access hook such that https://godbolt.org/z/Gv1j4vjqE will look the
> same as on X86?
>
> That should also fix the motivating problem here.
I'll investigate this as an alternative. Thank
nikic wrote:
To make the question more precise, would it make sense to change AMDGPUs
unaligned access hook such that https://godbolt.org/z/Gv1j4vjqE will look the
same as on X86?
That should also fix the motivating problem here.
https://github.com/llvm/llvm-project/pull/133301
__
arsenm wrote:
> and that's why you want to have single element loads that get inserted into a
> vector and then perform sub-vector extracts on it?
We care about eliminate stack usage at any cost. Big load are always better,
even if not naturally aligned. When we do care about alignment, we onl
arsenm wrote:
> We do support unaligned access IIUC, and it is actually required by HSA ABI.
> CC @arsenm
Unaligned access has been supported since almost always. There's just a mode
control for it, you can disable it in the driver though
https://github.com/llvm/llvm-project/pull/133301
__
shiltian wrote:
> I assume AMDGPU does not support unaligned loads
We do support unaligned access IIUC, and it is actually required by HSA ABI. CC
@arsenm
https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@list
nikic wrote:
On x86, what we actually end up doing is to combine those to unaligned i64
loads (see https://godbolt.org/z/P5z674x4r), which is probably the best outcome
if they are supported. I assume AMDGPU does not support unaligned loads, and
that's why you want to have single element loads
https://github.com/ronlieb approved this pull request.
LGTM, probably needs a rebase
https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
macurtis-amd wrote:
@nikic ping
https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
macurtis-amd wrote:
@nikic ping
or if anyone else (@davemgreen @kazutakahirata?) has time, a review would be
much appreciated
https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.o
macurtis-amd wrote:
@nikic ping ping ping
https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
macurtis-amd wrote:
@nikic ping ping
Are you okay with the latest revision?
(Thanks for reviewing this. Apologies if I'm being annoying).
https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://l
macurtis-amd wrote:
@nikic ping
https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
macurtis-amd wrote:
@nikic Latest revision only promotes memsets where the region being set is
within an existing vector in the underlying alloca (as determined by
getTypePartition).
It is much more constrained as you can see by the much smaller number of tests
that are now affected, while st
macurtis-amd wrote:
I should preface this by mentioning that I'm not all that familiar with SROA,
so thank you for your patience.
> So if I understand correctly, you are marking memsets as unsplittable,
> lowering them to vector zero and smaller accesses to inserts/extracts.
>
Yes. As a naive
nikic wrote:
> > I don't think your general approach here is going to work. We need to be
> > careful about introducing vector operations out of thin air, because LLVM
> > is not going to second guess them. If you convert a memset to <32768 x i8>
> > ops here, LLVM is going to carry those all
@@ -1170,10 +1204,17 @@ class AllocaSlices::SliceBuilder : public
PtrUseVisitor {
if (!IsOffsetKnown)
return PI.setAborted(&II);
+bool Splittable;
+
+if (getVectorTypeFor(II, DL))
+ Splittable = isSplittableMemOp(AS.AI.getAllocatedType(),
II.isVolatile
https://github.com/nikic requested changes to this pull request.
So if I understand correctly, you are marking memsets as unsplittable, lowering
them to vector zero and smaller accesses to inserts/extracts.
I don't think your general approach here is going to work. We need to be
careful about
https://github.com/nikic edited https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
macurtis-amd wrote:
@nikic ping
https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -1011,6 +1011,31 @@ static Value *foldPHINodeOrSelectInst(Instruction &I) {
return foldSelectInst(cast(I));
}
+/// Returns a fixed vector type equivalent to the memory set by II or nullptr
if
+/// unable to do so.
+static FixedVectorType *getVectorTypeFor(const MemSetIns
https://github.com/macurtis-amd updated
https://github.com/llvm/llvm-project/pull/133301
>From 30b4ca3cc90ab043db7abb336d028e8bb8b71dcf Mon Sep 17 00:00:00 2001
From: Matthew Curtis
Date: Thu, 27 Mar 2025 14:04:40 -0500
Subject: [PATCH 1/4] [SROA] Vector promote some memsets
---
clang/test/Co
macurtis-amd wrote:
> The tests also don't make any sense to me, seeing as they compile down to a
> constant, unused insertelement.
@nikic I cleaned up the test function a bit, again starting from the original
example. They exercise the new code paths, though they still result in an
unused in
https://github.com/macurtis-amd updated
https://github.com/llvm/llvm-project/pull/133301
>From 51dd5f27b9a09851c746466f02821087305ad93c Mon Sep 17 00:00:00 2001
From: Matthew Curtis
Date: Thu, 27 Mar 2025 14:04:40 -0500
Subject: [PATCH 1/5] [SROA] Vector promote some memsets
---
clang/test/Co
macurtis-amd wrote:
Thanks Nikita for taking a look at this.
> Missing PR description.
I've updated the PR description.
> The tests also don't make any sense to me, seeing as they compile down to a
> constant, unused insertelement.
Yes. Tests are very synthetic. Reduced from the already red
https://github.com/macurtis-amd edited
https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
https://github.com/nikic requested changes to this pull request.
Missing PR description. The tests also don't make any sense to me, seeing as
they compile down to a constant, unused insertelement.
https://github.com/llvm/llvm-project/pull/133301
___
c
macurtis-amd wrote:
@arsenm Are you okay with the latest revision?
https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
arsenm wrote:
> @arsenm Any recommendations for appeasing the [undef
> deprecator](https://github.com/llvm/llvm-project/pull/133301#issuecomment-2759145421)?
I don't think you did anything other than update existing tests, I would ignore
it for the purposes of this change
https://github.com/
macurtis-amd wrote:
@arsenm Any recommendations for appeasing the [undef
deprecator](https://github.com/llvm/llvm-project/pull/133301#issuecomment-2759145421)?
https://github.com/llvm/llvm-project/pull/133301
___
cfe-commits mailing list
cfe-commits@l
@@ -1011,6 +1011,31 @@ static Value *foldPHINodeOrSelectInst(Instruction &I) {
return foldSelectInst(cast(I));
}
+/// Returns a fixed vector type equivalent to the memory set by II or nullptr
if
+/// unable to do so.
+static FixedVectorType *getVectorTypeFor(const MemSetIns
@@ -0,0 +1,124 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes='sroa' -S | FileCheck %s
+target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n8:16:3
@@ -1011,6 +1011,31 @@ static Value *foldPHINodeOrSelectInst(Instruction &I) {
return foldSelectInst(cast(I));
}
+/// Returns a fixed vector type equivalent to the memory set by II or nullptr
if
+/// unable to do so.
+static FixedVectorType *getVectorTypeFor(const MemSetIns
@@ -0,0 +1,124 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes='sroa' -S | FileCheck %s
+target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n8:16:3
@@ -1011,6 +1011,31 @@ static Value *foldPHINodeOrSelectInst(Instruction &I) {
return foldSelectInst(cast(I));
}
+/// Returns a fixed vector type equivalent to the memory set by II or nullptr
if
+/// unable to do so.
+static FixedVectorType *getVectorTypeFor(const MemSetIns
@@ -1011,6 +1011,31 @@ static Value *foldPHINodeOrSelectInst(Instruction &I) {
return foldSelectInst(cast(I));
}
+/// Returns a fixed vector type equivalent to the memory set by II or nullptr
if
+/// unable to do so.
+static FixedVectorType *getVectorTypeFor(const MemSetIns
@@ -1011,6 +1011,31 @@ static Value *foldPHINodeOrSelectInst(Instruction &I) {
return foldSelectInst(cast(I));
}
+/// Returns a fixed vector type equivalent to the memory set by II or nullptr
if
+/// unable to do so.
+static FixedVectorType *getVectorTypeFor(const MemSetIns
@@ -0,0 +1,124 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes='sroa' -S | FileCheck %s
+target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n8:16:3
@@ -1011,6 +1011,31 @@ static Value *foldPHINodeOrSelectInst(Instruction &I) {
return foldSelectInst(cast(I));
}
+/// Returns a fixed vector type equivalent to the memory set by II or nullptr
if
+/// unable to do so.
+static FixedVectorType *getVectorTypeFor(const MemSetIns
@@ -0,0 +1,124 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes='sroa' -S | FileCheck %s
+target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n8:16:3
@@ -1011,6 +1011,31 @@ static Value *foldPHINodeOrSelectInst(Instruction &I) {
return foldSelectInst(cast(I));
}
+/// Returns a fixed vector type equivalent to the memory set by II or nullptr
if
+/// unable to do so.
+static FixedVectorType *getVectorTypeFor(const MemSetIns
@@ -1170,10 +1191,23 @@ class AllocaSlices::SliceBuilder : public
PtrUseVisitor {
if (!IsOffsetKnown)
return PI.setAborted(&II);
+auto IsSplittable = [&]() {
macurtis-amd wrote:
Cleaned up code removes the helper altogether. Hopefully better no
macurtis-amd wrote:
> Missing new tests? I'd expect to see a few new targeted tests stressing
> different vector sizes and alignments, and not just updates of existing tests
Added a new test.
Thanks for the review!
https://github.com/llvm/llvm-project/pull/133301
_
@@ -2316,12 +2362,15 @@ static VectorType *isVectorPromotionViable(Partition
&P, const DataLayout &DL) {
// Put load and store types into a set for de-duplication.
for (const Slice &S : P) {
-Type *Ty;
+Type *Ty = nullptr;
if (auto *LI = dyn_cast(S.getUse()->
@@ -1170,10 +1191,23 @@ class AllocaSlices::SliceBuilder : public
PtrUseVisitor {
if (!IsOffsetKnown)
return PI.setAborted(&II);
+auto IsSplittable = [&]() {
+ FixedVectorType *VTy = getVectorTypeFor(II, DL);
+ Type *ATy = AS.AI.getAllocatedType();
+
+
@@ -1011,6 +1011,26 @@ static Value *foldPHINodeOrSelectInst(Instruction &I) {
return foldSelectInst(cast(I));
}
+/// Returns a fixed vector type equivalent to the memory set by II or nullptr
if
+/// unable to do so.
+static FixedVectorType *getVectorTypeFor(const MemSetIns
https://github.com/macurtis-amd updated
https://github.com/llvm/llvm-project/pull/133301
>From c0525fd7bd2c740b5b969e8e2913a878792a377c Mon Sep 17 00:00:00 2001
From: Matthew Curtis
Date: Thu, 27 Mar 2025 14:04:40 -0500
Subject: [PATCH 1/3] [SROA] Vector promote some memsets
---
clang/test/Co
https://github.com/macurtis-amd updated
https://github.com/llvm/llvm-project/pull/133301
>From c0525fd7bd2c740b5b969e8e2913a878792a377c Mon Sep 17 00:00:00 2001
From: Matthew Curtis
Date: Thu, 27 Mar 2025 14:04:40 -0500
Subject: [PATCH 1/2] [SROA] Vector promote some memsets
---
clang/test/Co
llvmbot wrote:
@llvm/pr-subscribers-clang
Author: None (macurtis-amd)
Changes
---
Patch is 32.26 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/133301.diff
7 Files Affected:
- (modified) clang/test/CodeGenOpenCL/amdgpu-nullptr.cl (+4-6)
-
https://github.com/macurtis-amd created
https://github.com/llvm/llvm-project/pull/133301
None
>From 096fde6a6eda7ba8d09444a0558f04c3c4531017 Mon Sep 17 00:00:00 2001
From: Matthew Curtis
Date: Thu, 27 Mar 2025 14:04:40 -0500
Subject: [PATCH] [SROA] Vector promote some memsets
---
clang/test/
github-actions[bot] wrote:
:warning: undef deprecator found issues in your code. :warning:
You can test this locally with the following command:
``bash
git diff -U0 --pickaxe-regex -S
'([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)'
08aedf7201e296af532575685372bb5ff7ed8b01
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: None (macurtis-amd)
Changes
---
Patch is 32.26 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/133301.diff
7 Files Affected:
- (modified) clang/test/CodeGenOpenCL/amdgpu-nullptr.cl
51 matches
Mail list logo