[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-07-14 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> 1. Have what is proposed here as an initial step, with the addition that we 
> issue warnings on unguarded uses of builtins / ASM (similar to what 
> `__builtin_available` / `@available` do), and we clean-up non-extern 
> functions that become unreachable as a consequence of predicate expansion 
> (i.e. `foo` can only be called from within this module, and it was only being 
> called from a predicate guarded block, which was removed);

Gentle ping given that the above has been added, with the caveat that warning 
on unguarded ASM hasn't been added yet, and the diagnostics are conservative & 
optimistic (we assume that if a guard exists it is correct, and do not do a 
feature check). This will be addressed in a subsequent patch which builds on 
what is now there, because we need some additional internal discussion on the 
AMD side around the shape of these diagnostics, and because it would add some 
more girth to what is already a pretty large change.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-07-03 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> @efriedma-quic was kind enough to have a call where we discussed this a bit 
> more. I'll update tomorrow with a potential way forward, for the group's 
> consideration.

Following up, here's a possible approach to making progress, broken down into 
phases (@efriedma-quic can correct me if I am misrepresenting any of these):

1. Have what is proposed here as an initial step, with the addition that we 
issue warnings on unguarded uses of builtins / ASM (similar to what 
`__builtin_available` / `@available` do), and we clean-up non-extern functions 
that become unreachable as a consequence of predicate expansion (i.e. `foo` can 
only be called from within this module, and it was only being called from a 
predicate guarded block, which was removed);
2. Add attribute based checking for predicate guarded areas:
 - Functions can be annotated either with the existing `target` attribute 
or with a new `target_can_invoke` (name up for bike-shedding) attribute;
 - Within a predicate guarded scope, if we encounter contradictions, e.g. 
we call a `target("gfx9000")` function, or a 
`target_can_invoke(builtin_only_on_gfx9000)`, within a 
`__builtin_amdgcn_processor_is("gfx8999")`, that is an error
 - This should reward users that go through the effort of annotating their 
functions, making it much harder to write bugs
 - I'm not entirely sure how to do this well yet (nested guarded regions, 
where to track the currently active guard etc.), and it probably needs a bit 
more design, hence why it's a different phase
 - It is a pre-requisite for any attempt at making these general, rather 
than target specific
3. In relation to generalisation, if we go in that direction (i.e. other 
targets are interested / we think there's merit in hoisting these into 
generic Clang builtins), we will have to look at whether or not we want a 
different IR representation (possibly / probably along the lines of what has 
been discussed here), for cases where a target must run some potentially 
disruptive optimisations before and cannot just do the expansion right after 
Clang.
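
To make phase 2 slightly more concrete, here is a hedged sketch of the kind of 
contradiction the attribute checking could flag; `target_can_invoke` is the 
provisional spelling from above, and `builtin_only_on_gfx9000` is a made-up 
builtin used purely for illustration (none of this is implemented today):

```
// Caller promises it is only invoked where builtin_only_on_gfx9000 is legal.
__attribute__((target_can_invoke(builtin_only_on_gfx9000)))
void fast_path(void);

void dispatch(void) {
  if (__builtin_amdgcn_processor_is("gfx8999")) {
    fast_path();   // error: the guard contradicts the callee's requirement
  }
  if (__builtin_amdgcn_is_invocable(builtin_only_on_gfx9000)) {
    fast_path();   // OK: the guard implies the callee's requirement
  }
}
```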

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-07-02 Thread Eli Friedman via cfe-commits

efriedma-quic wrote:

> we only know the concrete target when we are finalising, which happens at a 
> completely different time-point, on possibly a different machine;

This is precisely why we want the frontend diagnostic: if we diagnose the bad 
cases in the frontend, later stages are guaranteed to lower correctly.  If we 
diagnose later, you don't know you messed up until you get crash reports from 
your users.

> conversely, for the abstract case we are targeting a generic target which has 
> all the features, so at most we could be somewhat spammy and say "be sure to 
> wrap this in a `__builtin_amdgcn_is_invocable` call";

I prefer to think of it as a generic target which has none of the features.

Yes, you might have to take some time to annotate your code, but once you have 
the annotations, it catches a lot of potential mistakes.



In case nobody else has brought it up, we currently do the following on Arm, 
which is conceptually similar:

```
#include <arm_sve.h>
__attribute((target("sve"))) void f() {
  svbool_t x = svptrue_b8(); // Allowed
}
void f2() {
  svbool_t x = svptrue_b8(); // Error: SVE isn't enabled.
}
```

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-07-02 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> So your users today are building for generic AMDGPU but using builtins that 
> are only available on a specific processor release? Presumably that code is 
> protected _somehow_ and their programs are not simply crashing at runtime. Is 
> that something you'd be able to leverage at all, or is completely ad hoc?

They do crash at run time, except not in the way one would expect - they fail 
when finalising / JIT-ing from SPIRV, which is still a compiler failure, except 
it's a BE / ISEL one. But yes, this is a current problem (which this is 
addressing). Here's an example (there are others):

- client code uses some builtins that are only available on RDNA (GFX10+), `#if 
__has_builtin(something_something)`;
- when targeting AMDGCN-flavoured SPIRV (`amdgcnspirv`), the union of builtins 
is available, since we don't know what the concrete target will end up being, 
and we want to maximally leverage features, so the check is true and the RDNA 
builtin ends up in SPIRV;
- the compiled library / executable gets executed on a CDNA machine;
- depending on the nature of the intrinsic, a JIT-time error ensues.

What we would like to do is to allow people to handle these cases with a linear 
translation from the above into 
`if (__builtin_amdgcn_is_invocable(something_something))`, which would then lead to 
having code that works everywhere with maximum capability (we don't have to 
reduce things to a common subset), without having to be linear in targets. I 
mention the latter because our device libs (which are not upstream), deal with 
this via a different convoluted mechanism, since there was no amdgcnspirv / 
generic target at the time, which requires generating separate bitcode per 
target, which is not long term viable as we get more and more targets.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-07-02 Thread John McCall via cfe-commits

rjmccall wrote:

Right, but the code still contains some kind of check to decide which version 
of the library function to call. Maybe it's implicit by multiversioned 
functions or something, but there's *something* in the source code, because if 
the user just writes a kernel that does nothing but call a builtin that's not 
always available, you've got a problem. My question is just whether that's 
something we can leverage.

And if these checks are all done in the library, the library can of course just 
be annotated.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-07-02 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> So your users today are building for generic AMDGPU but using builtins that 
> are only available on a specific processor release? Presumably that code is 
> protected _somehow_ and their programs are not simply crashing at runtime. Is 
> that something you'd be able to leverage at all, or is completely ad hoc?

This is basically how the NVIDIA device library and ROCm device library already 
work. In the latter case we just accept that `globalopt,dce` is required to 
clean that up after injecting the library code into the user's application. 
Part of this formalizes that.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-07-02 Thread John McCall via cfe-commits

rjmccall wrote:

So your users today are building for generic AMDGPU but using builtins that are 
only available on a specific processor release? Presumably that code is 
protected *somehow* and their programs are not simply crashing at runtime. Is 
that something you'd be able to leverage at all, or is completely ad hoc?

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-07-02 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> I think Eli is suggesting something like the rule for 
> [@available](https://clang.llvm.org/docs/LanguageExtensions.html#objective-c-available):
> 
> * If a builtin is unconditionally available, you can always use it without 
> any diagnostics.
> * If a builtin is only available on specific subtargets, you can only use it 
> in a block that's guarded by an `if (some_intrinsic(...))` condition that 
> will only pass if you're running on one of those subtargets.
> 
> So it's not that adding a check for the builtin will suddenly cause 
> un-checked calls to it to fail, it's that you have to have such a check to 
> use it in the first place, which makes sense because it's not always 
> available on the target.
> 

This is interesting, and I had / have looked at `@available` (albeit I am far 
from being a scholar on the topic). It makes logical sense, but I expect users 
will simply ignore it since it is very restrictive if we go with **you have to 
have a check to use a builtin**. It's not as if all builtin uses today which 
are present in user code are guarded by `__has_builtin` or by a check against 
an architecture macro. I will also note that, as far as I can see, 
`__builtin_available`, which we also provide for C & C++, at most warns, 
with the warning being opt-in. It also 
does not appear to generate any sort of special IR construct, it's just sugar 
over a call to a runtime provided interface, AFAICT. Furthermore, unlike for 
`__builtin_available`, there's no immediately apparent way to provide a useful 
warning here:

- if we're compiling for concrete we already know which builtins are available 
/ what target is present, so whether something is legal or not is fully 
determined;
- conversely, for the abstract case we are targeting a generic target which has 
all the features, so at most we could be somewhat spammy and say "be sure to 
wrap this in a `__builtin_amdgcn_is_invocable` call";
- this only covers a subset of cases, since there are also e.g. per-target 
resource allocation choices, so we would have to hoist even more architecture 
details into Clang, such as the size of shared memory, in order to warn;
- this'd probably balloon into a non-trivial amount of checking (think the Sema 
footprint for `@available` is not exactly petite), we'd still at most get to 
warn, and would still have to run the IR pass, which is actually in a position 
to correctly diagnose an error state.

If the added warning is considered desirable, I can look at adding it, but I 
think that should be a separate patch / conversation, since it'd mess with 
established 
builtin behaviour (as mentioned, one can reach for an unguarded builtin today 
without any additional diagnostics / invitation to touch `__has_builtin`, and 
there are examples where builtins that the FE believes work are actually not 
available on a target, see, for example, the math ones).

> Note that the `@available` model also includes an attribute for marking your 
> own functions as conditionally available, in which case (1) the entire body 
> of the function is considered guarded but (2) you have to guard calls to it. 
> That might be a necessary enhancement for your feature, too.

Unless I am missing something obvious, this brings us dangerously close to 
re-inventing language subsetting / restrictions that are already present in 
single-source(-ish) offload languages. It ends up being `__device__` / 
`restrict(amp)` / `omp declare target` in slightly different clothes. I don't 
think that is a desirable effect here. At least for our use case (which is what 
these are trying to support), we need a mechanism that just works with what is 
already there, and can be directly used from C / C++, with existing C / C++ 
codebases i.e. has to work with what our library authors have, using idioms 
they are familiar with.


https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-07-02 Thread John McCall via cfe-commits

rjmccall wrote:

I think Eli is suggesting something like the rule for 
[@available](https://clang.llvm.org/docs/LanguageExtensions.html#objective-c-available):
- If a builtin is unconditionally available, you can always use it without any 
diagnostics.
- If a builtin is only available on specific subtargets, you can only use it in 
a block that's guarded by an `if (some_intrinsic(...))` condition that will 
only pass if you're running on one of those subtargets.

So it's not that adding a check for the builtin will suddenly cause un-checked 
calls to it to fail, it's that you have to have such a check to use it in the 
first place, which makes sense because it's not always available on the target.

Note that the `@available` model also includes an attribute for marking your 
own functions as conditionally available, in which case (1) the entire body of 
the function is considered guarded but (2) you have to guard calls to it. That 
might be a necessary enhancement for your feature, too.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-07-02 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> > Definitely, more than happy to have a 1-on-1 (2-on-1 even, since I think 
> > @AaronBallman also suggested something along these lines as well :) ).
> 
> Please email me with some times that will work for you.
> 
> > We've just made the call to foo() illegal on anything that is not gfx9000
> 
> I... don't think I'm suggesting this? The fact that a call to foo() from a 
> __builtin_amdgcn_processor_is block shouldn't imply anything about other 
> calls to foo().
> 

Perhaps I am misunderstanding, in which case I apologise. I started from: "We 
can tell, statically, that the first call is correctly guarded by an if 
statement: it's guaranteed it will never run on a non-gfx9000 processor. The 
second call, on the other hand, is not. So we can add a frontend rule: **the 
first call is legal, the second is not**." I'm saying we cannot really infer 
anything about the legality of a naked call to a builtin either, at this point. 
Because the builtin might be available on many processors / processors other 
than gfx9000. We can develop the argument to say "well, fine, what we actually 
meant here is `is_invocable`, rather than `processor_is`, and then things work 
out", but the corollary to that appears to be that if you ever use the 
predicate on a builtin, you must touch every other use of that builtin within 
at least the same function, and relate it to the predicate evaluation.

> What I'm basically suggesting is just exposing SPIR-V specialization 
> constants as a C construct. Your example SPIR-V was something like:
> 
> ```
> %cmp = OpIEqual %bool %runtime_known_hw_id %hw_id_that_supports_feature
> if (%cmp = true) {
> /* some feature */
> } else {
> /* other feature */
> }
> ```
> 
> We want to come up with a corresponding C construct that's guaranteed to 
> compile to valid SPIR-V. My suggestion is something like:
> 
> ```
> if (__runtime_known_hw_id_eq("hw_id_that_supports_feature")) {
>   /* some feature */
> }
> ```
>

I'm confused as to what is different versus what this PR does, which does 
generate valid SPIRV / LLVM IR. Perhaps there is an underlying assumption that 
there is some construct that makes the otherwise dead block still contain valid 
code, and there really isn't. There's an example I provided above where what is 
guarded is (static) finite resource allocation, not just the use of an 
intrinsic; we'd not know in the FE which is correct, and we cannot allocate 
both until we know the target at JIT / finalisation time (so before executing 
the code), and we cannot generate executable code with both allocation requests 
live, as the finite resource gets exhausted. So the only place where we can 
meaningfully deal with this is in the ME / over IR, before hitting the BE. We 
should be careful to avoid focusing on the `processor_is` / `hw_id` aspect, 
this leads to brittle code that has to constantly grow additional identity 
checks via `||` disjunction.

> In the body of the if statement, you can use whatever intrinsics are legal on 
> hw_id_that_supports_feature.
> 
> > we're just sliding in immediately after Clang, before optimisation
> 
> Isn't doing checks immediately after IR generation basically the same as 
> checking the AST, just on a slightly different representation?

Not in this case. There are at least two aspects that make a difference:

- linking in bitcode, which can allow more extensive analysis than what you can 
do per TU in the AST - this is minor, however please note the conversation 
above about having to be conservative around external symbols, and the risks of 
leaving them around;
- lack of information when generating the AST, when dealing with abstract 
targets like SPIRV (more specifically, AMDGCN flavoured SPIRV, for the purposes 
of this PR)
  - the FE targets `amdgcnspirv`, which is generic across all concrete AMDGPU 
targets (union of features);
  - the predicates proposed here offer customisation points for which the 
resolution is deferred to the point where the target is known;
  - we only know the concrete target when we are finalising, which happens at a 
completely different time-point, on possibly a different machine;
  - we cannot time-travel to inform the AST about this, but we can compose 
generic IR with target IR, and lower it as target IR (this is already how 
various flavours of device / offload libs work, so it's hardly novel).

None of the above matters for concrete targets, where we just resolve 
everything in the AST already, because we have full information in the FE.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread Eli Friedman via cfe-commits

efriedma-quic wrote:

> Whilst I am thankful for the feedback I think it is somewhat unfortunate that 
> we could not have a shared discussion about this, since I think that there 
> are some core misunderstandings that seem to recur, which makes forward 
> progress one way or the other difficult.

We didn't really say much on the call itself; we just spent a minute while we 
were going through controversial RFCs/PRs, to call this out as something that 
needed attention.

If you think this topic would benefit from a meeting, we can organize one... 
but maybe a 1-on-1 chat would be better to start with, just to make sure we're 
on the same page.

> The front-end cannot generate accurate diagnostics for the actual interesting 
> case where the target is abstract (AMDGCNSPIRV, or the generic target 
> @jhuber6 uses in GPU libc, if we extend things in that direction), because 
> there isn't enough information - we only know what the concrete target is, 
> and hence what features are available, at a point in time which is sequenced 
> after the front-end has finished processing (could be run-time JIT for 
> SPIR-V, could be bit code linking in a completely different compilation for 
> GPU libc etc.);

If you have a construct like the following:

```
if (__builtin_amdgcn_processor_is("gfx9000")) {
  some_gfx9000_specific_intrinsic();
}

some_gfx9000_specific_intrinsic();
```

We can tell, statically, that the first call is correctly guarded by an if 
statement: it's guaranteed it will never run on a non-gfx9000 processor.  The 
second call, on the other hand, is not.  So we can add a frontend rule: the 
first call is legal, the second is not.  Obviously the error has false 
positives, in the sense that we can't actually prove the second call is 
incorrect at runtime... but that's fine, probably.

What I don't want is that we end up with, essentially, the same constraint, but 
enforced by the backend.

> There is no watertight mechanism here in the presence of indirect function 
> calls / pointers to functions

Sure; we can't stop people from calling arbitrary pointers.

> We do have a theoretical problem with guaranteeing that non-matching code 
> isn't emitted, because LLVM IR doesn't promise to leave a code sequence like 
> this alone:

There are ways to solve this: for example, we can make 
`llvm.compiler.supports` produce a token, and staple that token onto the 
intrinsics using a bundle.  Making this work requires that IRGen knows which 
intrinsic calls are actually impacted...

I care less about exactly how we solve this because we can adjust the solution 
later.  Whatever we expose in the frontend is much harder to change later.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> An intrinsic seems like the right IR model for CPU recognition, even for 
> targets that don't specifically need to late-resolve it. That should be much 
> easier for passes to optimize based on CPU settings than directly emitting 
> the compiler-rt reference in the frontend. I know that generating IR with 
> conservative target options and then bumping the target CPU in a pass is 
> something various people have been interested in, so late optimization is 
> specifically worth planning for here.
> 
> We do have a theoretical problem with guaranteeing that non-matching code 
> isn't emitted, because LLVM IR doesn't promise to leave a code sequence like 
> this alone:
> 
> ```
>   %0 = call i1 @llvm.compiler_supports(...)
>   br i1 %0, label %foo, label %bar
> ```
> 
> LLVM could theoretically complicate this by e.g. introducing a PHI or an 
> `or`. But that's a general LLVM problem that any lowering would have to deal 
> with.

The solution we went with here (for our use case) is to just run the predicate 
expansion pass over pristine Clang-generated IR, before any other optimisation. 
I think that @nikic suggested an alternative based on `callbr`, but that would 
be somewhat challenging to represent in SPIRV, which is important to us; then 
again, this could just be an implementation detail of how `cpu_is` gets 
lowered, I guess? I.e., since we know we're only ever going to deal with this 
early, we could just leave the call in place, since we know no optimisation 
will complicate things; conversely, other targets could go with `callbr` etc.
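
To illustrate the expansion (the intrinsic name and IR shapes here are mine, 
not necessarily what the pass actually emits): over pristine Clang output, once 
the concrete target is known, the predicate folds to a constant and ordinary 
simplification removes the dead arm:

```
; before expansion (pristine Clang-generated IR)
  %p = call i1 @llvm.amdgcn.processor.is(...)   ; unresolved predicate
  br i1 %p, label %guarded, label %fallback

; after expansion, when the concrete target matches
  br i1 true, label %guarded, label %fallback   ; %fallback is now dead and a
                                                ; follow-up simplification /
                                                ; DCE deletes it
```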

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread John McCall via cfe-commits

rjmccall wrote:

An intrinsic seems like the right IR model for CPU recognition, even for 
targets that don't specifically need to late-resolve it. That should be much 
easier for passes to optimize based on CPU settings than directly emitting the 
compiler-rt reference in the frontend. I know that generating IR with 
conservative target options and then bumping the target CPU in a pass is 
something various people have been interested in, so late optimization is 
specifically worth planning for here.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

High likelihood that I'll need something similar for my GPU libraries, so I'd 
prefer something not explicitly tied to SPIR-V.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> Let me add my few cents here.
> 
> > In the case where the target features are known during clang codegen, 
> > lowering is easy: you just skip generating the bodies of the if statements 
> > that don't match. If you want to some kind of "runtime" (actual runtime, or 
> > SPIR-V compilation-time) detection, it's not clear what the LLVM IR should 
> > look like: we only support specifying target features on a per-function 
> > level. But we can look at that separately.
> 
> Let me attempt to answer this question without introducing a new 
> builtin in clang (at first). In SPIR-V there is [specialization 
> constant](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#SpecializationSection)
>  which AFAIK doesn't have a direct LLVM IR counterpart. Some pseudo-code in 
> SPIR-V would look like this:
> 
> ```
> %int = OpTypeInt 32 1
> %runtime_known_hw_id = OpSpecConstant %int 0 // global var
> %hw_id_that_supports_feature = OpConstant %int 42
> 
> kernel void foo(...) {
> /* ... */
> %cmp = OpIEqual %bool %runtime_known_hw_id %hw_id_that_supports_feature
> if (%cmp = true) {
> /* some feature */
> } else {
> /* other feature */
> }
> }
> ```
> 
> At runtime, when such a SPIR-V module is JIT compiled, OpSpecConstant 
> materializes, so DCE (or, rather, some variation of DCE that is enforced to 
> work with optnone) will be able to reason about the %cmp result, removing the 
> dead branch, so we won't get an unsupported feature at codegen.
> 
> Problem is: how to generate such SPIR-V from clang. So my understanding is 
> that the new builtin should eventually be lowered (by the SPIR-V backend?) to 
> a construct like the one in the pseudo-code, though that is not what is 
> currently happening. And I believe that the existing `__builtin_cpu_supports` 
> is not a good match for such lowering.

This is one possible implementation indeed, for a workflow that goes from 
SPIR-V to ISA, or chooses to do the DCE in SPIR-V. Due to having to compose 
with an existing mature toolchain, rather than starting fresh, we have a 
slightly different flow where we reverse translate to LLVM IR and "resume" 
compilation from that point. Hence, the implicitly inserted never to be emitted 
globals, which play the role the spec constants play in your example, when 
coupled with the dedicated predicate expansion pass. Something similar could be 
added to e.g. `spirv-opt`. Thank you for the example, it is helpful.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread Dmitry Sidorov via cfe-commits

MrSidims wrote:

Let me add my few cents here.

> In the case where the target features are known during clang codegen, 
> lowering is easy: you just skip generating the bodies of the if statements 
> that don't match. If you want to some kind of "runtime" (actual runtime, or 
> SPIR-V compilation-time) detection, it's not clear what the LLVM IR should 
> look like: we only support specifying target features on a per-function 
> level. But we can look at that separately.

Let me attempt to answer this question without introducing a new builtin 
in clang (at first). In SPIR-V there is [specialization 
constant](https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#SpecializationSection)
 which AFAIK doesn't have a direct LLVM IR counterpart.
Some pseudo-code in SPIR-V would look like this:
```
%int = OpTypeInt 32 1
%runtime_known_hw_id = OpSpecConstant %int 0 // global var
%hw_id_that_supports_feature = OpConstant %int 42

kernel void foo(...) {
/* ... */
%cmp = OpIEqual %bool %runtime_known_hw_id %hw_id_that_supports_feature
if (%cmp = true) {
/* some feature */
} else {
/* other feature */
}
}
```
At runtime, when such a SPIR-V module is JIT compiled, OpSpecConstant 
materializes, so DCE (or, rather, some variation of DCE that is enforced to 
work with optnone) will be able to reason about the %cmp result, removing the 
dead branch, so we won't get an unsupported feature at codegen.

Problem is: how to generate such SPIR-V from clang. So my understanding is that 
the new builtin should eventually be lowered (by the SPIR-V backend?) to a 
construct like the one in the pseudo-code, though that is not what is currently 
happening. And I believe that the existing `__builtin_cpu_supports` is not a 
good match for such lowering.


https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> Alex, can you talk about why your design decides to check for specific 
> builtins rather than building out the set of features supported by 
> `__builtin_cpu_supports`?

I went into it a bit above without having seen your question (race condition I 
guess:) ), but to have it in one spot:

- AMDGPU features are a bit volatile and subject to disruptive change, sadly 
(we should be better about this but it's going to be a marathon, and it's not 
entirely under our - LLVM compiler team - control);
- We don't really document the features / they are formulated in a way that 
makes sense for the BE, and maybe for a compiler dev, but would be extremely 
confusing for a user - for example, note that we have about a dozen 
`DOT`-related features, which aren't always inclusive of each other, so you cannot 
actually infer that `DOTn` implies `DOTn-1`; 
- Conversely, the builtins devs reach for most often implement some specific 
capability i.e. just mirror an ISA instruction that they want to use (e.g. 
`mfma` / `wmma`), and these are documented via the ISA docs we publish, so 
having a per-builtin check seemed to match common usage and benefited from what 
is already in place as opposed to depending on hypothetical long-pole changes.
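As a toy illustration of the granularity point (the processor and feature names 
below are invented, not the real AMDGPU feature sets): a single 
capability-style check keeps working as new processors gain the instruction, 
whereas a processor check has to enumerate them:

```python
# Toy model: a builtin is usable on any processor exposing the feature
# that implements it, so one is_invocable-style guard adapts to new
# processors, while a processor_is guard must list each one.

FEATURES = {  # invented data, not the real AMDGPU feature sets
    "gfx900": {"dpp"},
    "gfx90a": {"dpp", "mfma"},
    "gfx942": {"dpp", "mfma"},
}
BUILTIN_NEEDS = {"__builtin_amdgcn_mfma_f32": "mfma"}  # hypothetical mapping

def is_invocable(builtin, processor):
    return BUILTIN_NEEDS[builtin] in FEATURES[processor]

# One capability check covers every processor with the instruction...
assert [p for p in FEATURES if is_invocable("__builtin_amdgcn_mfma_f32", p)] \
    == ["gfx90a", "gfx942"]
# ...whereas processor_is("gfx90a") would silently exclude gfx942.
```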

Now, this is specific to AMDGPU, I don't want to speculate too much about how 
other targets deal with this - which is another reason for which these are 
target builtins rather than going for something more generic.



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread John McCall via cfe-commits

rjmccall wrote:

> Slightly independently, cpu_supports might turn out a bit difficult to use, 
> at least for us (and I suspect other targets), as the feature definitions are 
> often ad hoc and sadly mutable.

Hmm. Well, you get to define what feature names you recognize in 
`__builtin_cpu_supports`, so there's no reason to use marketing names, and if 
they're unstable you probably shouldn't. But I would imagine there's *some* 
stable technical grouping that's coarser-grained than whether an individual 
builtin is available.



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> Right, I don't see any semantic reason why `__builtin_cpu_is` or 
> `__builtin_cpu_supports` shouldn't be resolved statically if we have that 
> information on hand. `-mcpu` / `-march` are not usually sufficient for 
> folding `__builtin_cpu_is`, since those attributes just specify a minimum 
> architecture and the builtin is doing an exact check, but that's emergent and 
> shouldn't be taken as an inherent limitation of the builtin.

This would be up to the target to evaluate, as it'd have knowledge of whether 
the argument to the call is sufficient. When I initially implemented this, I 
added additional `supportsConstEvalCpuIs` and a `tryToFoldCpuIs` interfaces 
(name is up for bikeshedding) to the `TargetInfo` base, and then a target would 
be in a position to choose whether to implement those and what to do about it. 
For example, in our case, the entire ExpandPredicate stuff from this patch 
would've ended up as the implementation for the fold attempt. As I said, if 
folks would prefer that (or if they just want `__builtin_cpu_is` to gain the 
abilities), I can either switch `__builtin_amdgcn_processor_is` back to that or 
create a separate PR.

Slightly independently, `cpu_supports` might turn out a bit difficult to use, 
at least for us (and I suspect other targets), as the feature definitions are 
often ad hoc and sadly mutable. Hence, telling users to check for this or that 
particular feature would be brittle and intrusive (we define features in a way 
that makes sense for the BE primarily, and don't really document them). 
Conversely, builtin names are somewhat more stable and easier to reason about.



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread John McCall via cfe-commits

rjmccall wrote:

Alex, can you talk about why your design decides to check for specific builtins 
rather than building out the set of features supported by 
`__builtin_cpu_supports`?



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread John McCall via cfe-commits

rjmccall wrote:

> > On a different point: I don't think this builtin is actually semantically 
> > different from `__builtin_cpu_is`. As long as we're not treating it as 
> > `constexpr`, the fact that it's lowered by the compiler and doesn't need a 
> > runtime check is just a happy property of GPU targeting rather than a 
> > fundamental difference. You could certainly imagine targets that _do_ 
> > simply do this with a runtime switch. And the behavior of allowing 
> additional builtins to be used within the guarded block seems like a nice 
> > feature that other targets would probably like to take advantage of.
> > We could allow `__builtin_processor_is` as an alternative name for that 
> > builtin if folks feel weird about having "cpu" in the name for a GPU target.
> 
> The `processor_is` interface initially did not exist, but rather 
> `__builtin_cpu_is` gained the ability to be statically resolved in the FE in 
> certain cases / generate no run time code. There was strong opposition from 
> some of my colleagues (some of which are on this thread) claiming that the 
> semantics of `__builtin_cpu_is` mandate the existence of a run time check. 
> The "cpu" bit wasn't really a problem:)
> 
> If you / other Clang owners are happy with extending `__builtin_cpu_is`, 
> personally I would prefer that since I believe that it can be beneficial for 
> targets other than ours / GPUs in general. For example, even for x86, there's 
> a difference between e.g. `x86_64-v2` and `znver5`, which could be resolved 
> in the FE and remove the need to do a cpuid check at run time, and then go 
> via a function call rather than direct inline code.

Right, I don't see any semantic reason why `__builtin_cpu_is` or 
`__builtin_cpu_supports` shouldn't be resolved statically if we have that 
information on hand. `-mcpu` / `-march` are not usually sufficient for folding 
`__builtin_cpu_is`, since those attributes just specify a minimum architecture 
and the builtin is doing an exact check, but that's emergent and shouldn't be 
taken as an inherent limitation of the builtin.
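A toy model of that distinction (illustrative architecture levels, not the real 
x86 feature sets): support for a feature is monotone at or above a minimum 
architecture, so `__builtin_cpu_supports` can fold given only `-march`, while 
an exact `__builtin_cpu_is` cannot:

```python
# Toy model: -march gives a *minimum* architecture. Feature support is
# monotone across that minimum, so cpu_supports can fold to True; an
# exact cpu_is check cannot fold unless the precise CPU is pinned down.

LEVELS = ["x86_64", "x86_64-v2", "x86_64-v3"]  # ordered, illustrative
FEATURES = {
    "x86_64": set(),
    "x86_64-v2": {"sse4.2"},
    "x86_64-v3": {"sse4.2", "avx2"},  # cumulative, per the toy model
}

def fold_cpu_supports(min_arch, feature):
    # Folds to True if every CPU at or above min_arch has the feature.
    at_least = LEVELS[LEVELS.index(min_arch):]
    if all(feature in FEATURES[a] for a in at_least):
        return True
    return None  # unknown: must defer to a runtime check

def fold_cpu_is(min_arch, name):
    # A minimum bound never pins down the exact CPU by itself.
    return None

assert fold_cpu_supports("x86_64-v2", "sse4.2") is True
assert fold_cpu_supports("x86_64", "sse4.2") is None
assert fold_cpu_is("x86_64-v2", "znver5") is None
```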



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> We briefly discussed this in the clang area team meeting, and we weren't 
> really happy with the design as-is. The underlying idea behind the feature 
> makes sense, but semantics of the actual builtin is ugly: there's a loose 
> connection between the condition check, and the region of code it's guarding.
>

Whilst I am thankful for the feedback I think it is somewhat unfortunate that 
we could not have a shared discussion about this, since I think that there are 
some core misunderstandings that seem to recur, which makes forward progress 
one way or the other difficult.
 
> I spent a bit more time thinking about it after the meeting. Here's a 
> potential alternative design: we add a new kind of if statement, a 
> "processor-feature-if", spelled something like `if 
> __amdgcn_processor_is("gfx900") {}`. In the body of the if statement, you're 
> allowed to use builtins that would otherwise be illegal. This ensures a 
> direct connection between the feature check and the corresponding builtins, 
> so the frontend can analyze your usage and generate accurate diagnostics.
> 

This has been considered, and doesn't quite address the use case (without 
ending up where the currently proposed design already is). Whilst this would 
have been significantly easier to discuss directly, I will try to enumerate the 
issues here:

1. we should not be touching the HLL at such an intrinsic (pun purely 
accidental) level, i.e. we should not inject bespoke keywords / control 
structures etc. - this is extremely risky and can easily be weaponised into 
arguments about what should be standardised ("Clang / LLVM have implemented 
this, so it is clearly the way"), when there isn't nearly enough experience to 
warrant that; we'd also be preventing meaningful cross-compiler single source / 
forcing other compilers to implement the exact same novel control structure; 
it's easier to detect a builtin than to detect whether a FE supports a novel 
keyword / control structure;
2. In relation to the above, I think that part of the confusion here is the 
assumption that the use case is a mechanism like `__device__`, i.e. just a way 
to inline-specify a blob of "device" code in some "host" source, which is 
essentially ad-hoc language subsetting / dialect generation - that is 
absolutely not the case;
3. Whilst the discussion is using `processor_is` examples, that is for ease of 
parsing, arguably the `is_invocable` check is significantly more useful, as it 
operates at the right granularity; a particular builtin might be available 
across many architectures, so checking for that rather than a particular 
processor ensures that the code will tightly adapt, without changes; we are not 
just trying to find a way to mirror `-mcpu`;
4. The front-end cannot generate accurate diagnostics for the actual 
interesting case where the target is abstract (AMDGCNSPIRV, or the `generic` 
target @jhuber6 uses in GPU libc, if we extend things in that direction), 
because there isn't enough information - we only know what the concrete target 
is, and hence what features are available, at a point in time which is 
sequenced after the front-end has finished processing (could be run-time JIT 
for SPIR-V, could be bit code linking in a completely different compilation for 
GPU libc etc.);
5. There is no watertight mechanism here in the presence of indirect function 
calls / pointers to function, unless we start infecting function types 
(otherwise stated, the ABI) with their feature requirements, which would be 
extremely unfortunate, and probably intractable (because now things like the 
dynamic linker have to start caring); there is no "safe-design with accurate 
diagnostics" that prevents some user from checking for a predicate, then 
calling, via pointer, a function they imported from a library that is utterly 
incompatible with the invariants established by the predicate; these are 
specialist tools, with sharp edges, of which we do have quite a few already.
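A toy sketch of point 5 (pure illustration, no real API): the guard verifies 
the predicate the user wrote, not the requirements of whatever lands behind a 
function pointer, so nothing stops incompatible code from slipping through it:

```python
# Toy illustration: a predicate guard establishes an invariant only for
# what the compiler can see; a call through an opaque function pointer
# can smuggle in code built with incompatible assumptions.

def make_kernel(required_feature):
    def kernel(available_features):
        if required_feature not in available_features:
            raise RuntimeError(f"uses unsupported feature {required_feature}")
        return "ok"
    return kernel

def guarded_call(fn, available_features):
    # The guard checks the *predicate*, not the callee's requirements:
    # nothing ties fn's needs to the feature that was checked.
    if "mfma" in available_features:  # stand-in for a predicate guard
        return fn(available_features)
    return "skipped"

good = make_kernel("mfma")
rogue = make_kernel("exotic-feature")  # e.g. imported from a library

assert guarded_call(good, {"mfma"}) == "ok"
try:
    guarded_call(rogue, {"mfma"})  # passes the guard, still blows up
    raise AssertionError("rogue call should have failed")
except RuntimeError:
    pass
```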

> In the case where the target features are known during clang codegen, 
> lowering is easy: you just skip generating the bodies of the if statements 
> that don't match. If you want some kind of "runtime" (actual runtime, or 
> SPIR-V compilation-time) detection, it's not clear what the LLVM IR should 
> look like: we only support specifying target features on a per-function 
> level. But we can look at that separately.

As I have already mentioned below, this is just duplicating existing 
functionality, possibly in a more verbose and roundabout way. It is also 
already handled by what is being proposed, hence the awareness of it was 
present when the currently proposed design was put together. The interesting 
case is the second one, so, sadly, we cannot just look at that separately (and, 
IMHO, should not come up with novel IR constructs to solve this).

If the core of the objection here is that Clang really doesn't like that we're 
doing semantic c

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> On a different point: I don't think this builtin is actually semantically 
> different from `__builtin_cpu_is`. As long as we're not treating it as 
> `constexpr`, the fact that it's lowered by the compiler and doesn't need a 
> runtime check is just a happy property of GPU targeting rather than a 
> fundamental difference. You could certainly imagine targets that _do_ simply 
> do this with a runtime switch. And the behavior of allowing additional 
builtins to be used within the guarded block seems like a nice feature that 
> other targets would probably like to take advantage of.
> 
> We could allow `__builtin_processor_is` as an alternative name for that 
> builtin if folks feel weird about having "cpu" in the name for a GPU target.

The `processor_is` interface initially did not exist, but rather 
`__builtin_cpu_is` gained the ability to be statically resolved in the FE in 
certain cases / generate no run time code. There was strong opposition from 
some of my colleagues (some of which are on this thread) claiming that the 
semantics of `__builtin_cpu_is` mandate the existence of a run time check. The 
"cpu" bit wasn't really a problem:) 

If you / other Clang owners are happy with extending `__builtin_cpu_is`, 
personally I would prefer that since I believe that it can be beneficial for 
targets other than ours / GPUs in general. For example, even for x86, there's a 
difference between e.g. `x86_64-v2` and `znver5`, which could be resolved in 
the FE and remove the need to do a cpuid check at run time, and then go via a 
function call rather than direct inline code.



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-30 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> I mean, I'm not particularly attached to the syntax of the "if". I guess we 
> could designate `if (__builtin_amdgcn_processor_is("gfx900")) {}` as a 
> "processor-feature-if". The point is that we need to know at the AST level 
> which processor features are available for each statement.

I don't quite see how to parse this statement to make it address the actual use 
case. These are useful because we cannot know, at the AST level (in the FE) 
which processor features are available. If we knew that we don't really need 
any additional mechanism, so this is just a different way to type `#if defined` 
/ `__has_builtin`, which is not what is desired.



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-27 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> We could allow `__builtin_processor_is` as an alternative name for that 
> builtin if folks feel weird about having "cpu" in the name for a GPU target.

We already use `-mcpu=gfx942` for targeting the GPU processor so I don't think 
it makes a huge difference. I've never heard of `__builtin_cpu_is`; it doesn't 
seem to have a single test in `clang`. Realistically, what the GPU use-case 
needs is a way to avoid the normal builtin feature checks and guarantee that 
any children of that check get trimmed at O0 before the backend runs but after 
clang code generation.



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-27 Thread John McCall via cfe-commits

rjmccall wrote:

Yeah, I agree with the other parts of your design, enabling the builtins within 
the guarded statements is a great way to handle it.

On a different point: I don't think this builtin is actually semantically 
different from `__builtin_cpu_is`.  As long as we're not treating it as 
`constexpr`, the fact that it's lowered by the compiler and doesn't need a 
runtime check is just a happy property of GPU targeting rather than a 
fundamental difference. You could certainly imagine targets that *do* simply do 
this with a runtime switch. And the behavior of allowing additional builtin to 
be used within the guarded block seems like a nice feature that other targets 
would probably like to take advantage of.

We could allow `__builtin_processor_is`  as an alternative name for that 
builtin if folks feel weird about having "cpu" in the name for a GPU target.



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-27 Thread Eli Friedman via cfe-commits

efriedma-quic wrote:

I mean, I'm not particularly attached to the syntax of the "if".  I guess we 
could designate `if (__builtin_amdgcn_processor_is("gfx900")) {}` as a 
"processor-feature-if". The point is that we need to know at the AST level 
which processor features are available for each statement.



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-27 Thread John McCall via cfe-commits

rjmccall wrote:

Recognizing when the `if` condition is just a call to the builtin (possibly 
parenthesized or `&&`ed) seems totally sufficient to me. You could check that 
the builtin isn't used in any other position if you want, but I don't think 
that's really necessary.



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-27 Thread Eli Friedman via cfe-commits

efriedma-quic wrote:

We briefly discussed this in the clang area team meeting, and we weren't really 
happy with the design as-is.  The underlying idea behind the feature makes 
sense, but semantics of the actual builtin is ugly: there's a loose connection 
between the condition check, and the region of code it's guarding.

I spent a bit more time thinking about it after the meeting.  Here's a 
potential alternative design: we add a new kind of if statement, a 
"processor-feature-if", spelled something like `if 
__amdgcn_processor_is("gfx900") {}`.  In the body of the if statement, you're 
allowed to use builtins that would otherwise be illegal.  This ensures a direct 
connection between the feature check and the corresponding builtins, so the 
frontend can analyze your usage and generate accurate diagnostics.

In the case where the target features are known during clang codegen, lowering 
is easy: you just skip generating the bodies of the if statements that don't 
match. If you want some kind of "runtime" (actual runtime, or SPIR-V 
compilation-time) detection, it's not clear what the LLVM IR should look like: 
we only support specifying target features on a per-function level. But we can 
look at that separately.



[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-24 Thread Alex Voicu via cfe-commits


@@ -0,0 +1,157 @@
+//===- AMDGPUExpandFeaturePredicates.cpp - Feature Predicate Expander Pass 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+// This file implements a pass that deals with expanding AMDGCN generic feature
+// predicates into target specific quantities / sequences. In this context, a
+// generic feature predicate is an implementation detail global variable that
+// is inserted by the FE as a consequence of using either the __builtin_cpu_is
+// or the __builtin_amdgcn_is_invocable special builtins on an abstract target
+// (AMDGCNSPIRV). These placeholder globals are used to guide target specific
+// lowering, once the concrete target is known, by way of constant folding 
their
+// value all the way into a terminator (i.e. a controlled block) or into a no
+// live use scenario. The pass makes a best effort attempt to look through
+// calls, i.e. a constant evaluatable passthrough of a predicate value will
+// generally work, however we hard fail if the folding fails, to avoid obtuse
+// BE errors or opaque run time errors. This pass should run as early as
+// possible / immediately after Clang CodeGen, so that the optimisation 
pipeline
+// and the BE operate with concrete target data.
+//===--===//
+
+#include "AMDGPU.h"
+#include "AMDGPUTargetMachine.h"
+#include "GCNSubtarget.h"
+
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Analysis/ConstantFolding.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Pass.h"
+#include "llvm/Transforms/Utils/Local.h"
+
+#include <string>
+#include <utility>
+
+using namespace llvm;
+
+namespace {
+template <typename C> void collectUsers(Value *V, C &Container) {
+  assert(V && "Must pass an existing Value!");
+
+  for (auto &&U : V->users())
+if (auto *I = dyn_cast<Instruction>(U))
+  Container.insert(Container.end(), I);
+}
+
+inline void setPredicate(const GCNSubtarget &ST, GlobalVariable *P) {
+  const auto IsFeature = P->getName().starts_with("llvm.amdgcn.has");
+  const auto Offset =
+  IsFeature ? sizeof("llvm.amdgcn.has") : sizeof("llvm.amdgcn.is");
+
+  auto PV = P->getName().substr(Offset).str();
+  if (IsFeature) {
+auto Dx = PV.find(',');
+while (Dx != std::string::npos) {
+  PV.insert(++Dx, {'+'});
+
+  Dx = PV.find(',', Dx);
+}
+PV.insert(PV.cbegin(), '+');
+  }
+
+  auto *PTy = P->getValueType();
+  P->setLinkage(GlobalValue::PrivateLinkage);
+  P->setExternallyInitialized(false);
+
+  if (IsFeature)
+P->setInitializer(ConstantInt::getBool(PTy, ST.checkFeatures(PV)));
+  else
+P->setInitializer(ConstantInt::getBool(PTy, PV == ST.getCPU()));
+}
+
+std::pair<PreservedAnalyses, bool>
+unfoldableFound(Function *Caller, GlobalVariable *P, Instruction *NoFold) {
+  std::string W;
+  raw_string_ostream OS(W);
+
+  OS << "Impossible to constant fold feature predicate: " << *P << " used by "
+ << *NoFold << ", please simplify.\n";
+
+  Caller->getContext().diagnose(
+  DiagnosticInfoUnsupported(*Caller, W, NoFold->getDebugLoc(), DS_Error));
+
+  return {PreservedAnalyses::none(), false};
+}
+
+std::pair<PreservedAnalyses, bool> handlePredicate(const GCNSubtarget &ST,
+   GlobalVariable *P) {
+  setPredicate(ST, P);
+
+  SmallPtrSet<Instruction *, 8> ToFold;
+  collectUsers(P, ToFold);
+
+  if (ToFold.empty())
+return {PreservedAnalyses::all(), true};
+
+  do {
+auto *I = *ToFold.begin();
+ToFold.erase(I);
+
+if (auto *C = ConstantFoldInstruction(I, P->getDataLayout())) {

AlexVlx wrote:

In what regards unreachable BBs, this looks like so because I hadn't fully 
considered the implications, and because my understanding is that we (LLVM, not 
AMDGPU) unconditionally run 
['UnreachableBlockElimPass'](https://github.com/llvm/llvm-project/blob/e175ecff936287823b5443d7b2d443fc6569f31f/llvm/include/llvm/Passes/CodeGenPassBuilder.h#L717),
 irrespective of optimisation level. I *think* the latter is not incorrect, and 
that there is at least one other transform ('LowerInvokePass') that creates 
unreachable BBs and leaves them around. Having said that, it's not very 
hygienic and I will add cleanup for unreachable BBs.
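For reference, the unreachable-block cleanup being discussed amounts to a 
reachability prune over the CFG; a toy Python sketch (invented representation, 
not the LLVM API):

```python
# Toy CFG cleanup: after a predicate folds a conditional branch into an
# unconditional one, blocks no longer reachable from entry can be erased.

def prune_unreachable(cfg, entry="entry"):
    # cfg: block name -> list of successor block names
    seen, work = set(), [entry]
    while work:
        b = work.pop()
        if b in seen:
            continue
        seen.add(b)
        work.extend(cfg.get(b, []))
    return {b: s for b, s in cfg.items() if b in seen}

# Folding the predicate rewired entry straight to 'then'; 'else' dies.
cfg = {"entry": ["then"], "then": ["exit"], "else": ["exit"], "exit": []}
assert set(prune_unreachable(cfg)) == {"entry", "then", "exit"}
```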

With functions it's a bit trickier, and can actually get into somewhat 
convoluted use cases, which these predicates, as low-level target specific 
things, are not meant for. To be more specific, with normal use one would 
expect that for any and all functions the user would've applied predicates 
locally at the finest possible granularity i.e.

```cpp
// THIS
void foo() {
if (__builtin_processor_is("gfx900"))
do_something();
else if (__built

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-24 Thread Alex Voicu via cfe-commits


@@ -0,0 +1,157 @@
+//===- AMDGPUExpandFeaturePredicates.cpp - Feature Predicate Expander Pass 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+// This file implements a pass that deals with expanding AMDGCN generic feature
+// predicates into target specific quantities / sequences. In this context, a
+// generic feature predicate is an implementation detail global variable that
+// is inserted by the FE as a consequence of using either the __builtin_cpu_is
+// or the __builtin_amdgcn_is_invocable special builtins on an abstract target
+// (AMDGCNSPIRV). These placeholder globals are used to guide target specific
+// lowering, once the concrete target is known, by way of constant folding 
their
+// value all the way into a terminator (i.e. a controlled block) or into a no
+// live use scenario. The pass makes a best effort attempt to look through
+// calls, i.e. a constant evaluatable passthrough of a predicate value will
+// generally work, however we hard fail if the folding fails, to avoid obtuse
+// BE errors or opaque run time errors. This pass should run as early as
+// possible / immediately after Clang CodeGen, so that the optimisation 
pipeline
+// and the BE operate with concrete target data.
+//===--===//
+
+#include "AMDGPU.h"
+#include "AMDGPUTargetMachine.h"
+#include "GCNSubtarget.h"
+
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Analysis/ConstantFolding.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Pass.h"
+#include "llvm/Transforms/Utils/Local.h"
+
+#include <string>
+#include <utility>
+
+using namespace llvm;
+
+namespace {
+template <typename C> void collectUsers(Value *V, C &Container) {
+  assert(V && "Must pass an existing Value!");
+
+  for (auto &&U : V->users())
+if (auto *I = dyn_cast<Instruction>(U))
+  Container.insert(Container.end(), I);
+}
+
+inline void setPredicate(const GCNSubtarget &ST, GlobalVariable *P) {
+  const auto IsFeature = P->getName().starts_with("llvm.amdgcn.has");
+  const auto Offset =
+  IsFeature ? sizeof("llvm.amdgcn.has") : sizeof("llvm.amdgcn.is");
+
+  auto PV = P->getName().substr(Offset).str();
+  if (IsFeature) {
+auto Dx = PV.find(',');
+while (Dx != std::string::npos) {
+  PV.insert(++Dx, {'+'});
+
+  Dx = PV.find(',', Dx);
+}
+PV.insert(PV.cbegin(), '+');
+  }
+
+  auto *PTy = P->getValueType();
+  P->setLinkage(GlobalValue::PrivateLinkage);
+  P->setExternallyInitialized(false);
+
+  if (IsFeature)
+P->setInitializer(ConstantInt::getBool(PTy, ST.checkFeatures(PV)));
+  else
+P->setInitializer(ConstantInt::getBool(PTy, PV == ST.getCPU()));
+}
+
+std::pair<PreservedAnalyses, bool>
+unfoldableFound(Function *Caller, GlobalVariable *P, Instruction *NoFold) {
+  std::string W;
+  raw_string_ostream OS(W);
+
+  OS << "Impossible to constant fold feature predicate: " << *P << " used by "
+ << *NoFold << ", please simplify.\n";
+
+  Caller->getContext().diagnose(
+  DiagnosticInfoUnsupported(*Caller, W, NoFold->getDebugLoc(), DS_Error));
+
+  return {PreservedAnalyses::none(), false};
+}
+
+std::pair<PreservedAnalyses, bool> handlePredicate(const GCNSubtarget &ST,
+   GlobalVariable *P) {

AlexVlx wrote:

Oh, this is a good question, it's probably gotten lost in the lengthy 
conversation. We have two cases, let me try to clarify:

1. We are targeting a concrete `gfx###` target, for which the features and 
capabilities are fully known at compile time / we know what we are lowering for 
-> the predicates get expanded and resolved in the FE, they never reach codegen 
/ get emitted in IR;
2. We are targeting `amdgcnspirv`, which is abstract and for which the actual 
concrete target is only known at run time i.e. there's a lack of information / 
temporal decoupling:
 - the predicates allow one to write code that adapts to the capabilities 
of the actual target that the code will execute on;
 - we only know the target once we resume compilation for the concrete 
target, hence the need to emit them in IR, and then expand.

The ultimate state of affairs (not there yet due to historical issues / ongoing 
work) is that for the 2nd case the IR we generate SPIRV from is directly the 
pristine Clang output (+transforms needed for SPIRV, which do not impact 
these), so when we resume compilation at run time, it's on un-optimised 
FE-output IR. Furthermore, the expansion pass runs unconditionally, and is 
independent from optimisation level (which also implies it needs to be better 
about cleaning after itself, which I still owe an answer for). Hopefully that 
helps 
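As an aside for readers following the quoted hunk, the name-decoding that 
`setPredicate` performs can be sketched as follows (behaviour inferred from the 
code above; the feature names are merely illustrative):

```python
# Toy re-implementation of how the quoted setPredicate decodes the
# placeholder global's name into either a processor name or a feature
# string in the "+feat1,+feat2" form that checkFeatures consumes.

def decode(name):
    if name.startswith("llvm.amdgcn.has"):
        feats = name[len("llvm.amdgcn.has."):]
        # "f1,f2" -> "+f1,+f2"
        return ("features", "+" + feats.replace(",", ",+"))
    assert name.startswith("llvm.amdgcn.is")
    return ("processor", name[len("llvm.amdgcn.is."):])

assert decode("llvm.amdgcn.is.gfx900") == ("processor", "gfx900")
assert decode("llvm.amdgcn.has.dot4-insts,dot5-insts") == \
    ("features", "+dot4-insts,+dot5-insts")
```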

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-24 Thread Alex Voicu via cfe-commits


@@ -0,0 +1,157 @@
+//===- AMDGPUExpandFeaturePredicates.cpp - Feature Predicate Expander Pass 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+// This file implements a pass that deals with expanding AMDGCN generic feature
+// predicates into target specific quantities / sequences. In this context, a
+// generic feature predicate is an implementation detail global variable that
+// is inserted by the FE as a consequence of using either the __builtin_cpu_is
+// or the __builtin_amdgcn_is_invocable special builtins on an abstract target
+// (AMDGCNSPIRV). These placeholder globals are used to guide target specific
+// lowering, once the concrete target is known, by way of constant folding 
their
+// value all the way into a terminator (i.e. a controlled block) or into a no
+// live use scenario. The pass makes a best effort attempt to look through
+// calls, i.e. a constant evaluatable passthrough of a predicate value will
+// generally work, however we hard fail if the folding fails, to avoid obtuse
+// BE errors or opaque run time errors. This pass should run as early as
+// possible / immediately after Clang CodeGen, so that the optimisation 
pipeline
+// and the BE operate with concrete target data.
+//===--===//
+
+#include "AMDGPU.h"
+#include "AMDGPUTargetMachine.h"
+#include "GCNSubtarget.h"
+
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Analysis/ConstantFolding.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Pass.h"
+#include "llvm/Transforms/Utils/Local.h"
+
+#include <string>
+#include <utility>
+
+using namespace llvm;
+
+namespace {
+template <typename C> void collectUsers(Value *V, C &Container) {
+  assert(V && "Must pass an existing Value!");
+
+  for (auto &&U : V->users())
+if (auto *I = dyn_cast<Instruction>(U))
+  Container.insert(Container.end(), I);
+}
+
+inline void setPredicate(const GCNSubtarget &ST, GlobalVariable *P) {
+  const auto IsFeature = P->getName().starts_with("llvm.amdgcn.has");
+  const auto Offset =
+      IsFeature ? sizeof("llvm.amdgcn.has") : sizeof("llvm.amdgcn.is");
+
+  auto PV = P->getName().substr(Offset).str();
+  if (IsFeature) {
+    auto Dx = PV.find(',');
+    while (Dx != std::string::npos) {
+      PV.insert(++Dx, {'+'});
+
+      Dx = PV.find(',', Dx);
+    }
+    PV.insert(PV.cbegin(), '+');
+  }
+
+  auto *PTy = P->getValueType();
+  P->setLinkage(GlobalValue::PrivateLinkage);
+  P->setExternallyInitialized(false);
+
+  if (IsFeature)
+    P->setInitializer(ConstantInt::getBool(PTy, ST.checkFeatures(PV)));
+  else
+    P->setInitializer(ConstantInt::getBool(PTy, PV == ST.getCPU()));
+}
+
+std::pair<PreservedAnalyses, bool>
+unfoldableFound(Function *Caller, GlobalVariable *P, Instruction *NoFold) {
+  std::string W;
+  raw_string_ostream OS(W);
+
+  OS << "Impossible to constant fold feature predicate: " << *P << " used by "
+     << *NoFold << ", please simplify.\n";
+
+  Caller->getContext().diagnose(
+      DiagnosticInfoUnsupported(*Caller, W, NoFold->getDebugLoc(), DS_Error));
+
+  return {PreservedAnalyses::none(), false};
+}
+
+std::pair<PreservedAnalyses, bool> handlePredicate(const GCNSubtarget &ST,
+                                                   GlobalVariable *P) {

AlexVlx wrote:

> So to clarify, optimizations will never be applied during the compilation to 
> amdgcnspirv? If that's the case, I guess it's not likely that IR will be 
> transformed in problematic ways.
>

Yes, this is the intention; it is still ongoing work. Empirically we are not 
running into any of the potential issues you brought up, which is why I went 
ahead with upstreaming this part, which is fairly important for library work 
(it is hard to author high-performance generic libraries without this sort of 
mechanism). By the end of this year we should end up generating SPIRV from 
Clang's LLVM IR output, with no optimisations applied.

> It did occur to me that a way to guarantee that the folding works is by using 
> a callbr intrinsic, something like this:
> 
> ```llvm
> callbr void @llvm.amdgcn.processor.is(metadata "gfx803") to label %unsupported [label %supported]
> ```
> 
> This would make the check fundamentally inseparable from the control flow.
> 
> But I guess you'd have trouble round-tripping that via SPIRV...

Ah, I actually hadn't thought of that, but having had a glance, yes, it's 
difficult to round-trip. Something to consider in the future, if/when we try 
to make this generic rather than target-specific, should there be interest.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-com

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-24 Thread Nikita Popov via cfe-commits


@@ -0,0 +1,157 @@
+std::pair<PreservedAnalyses, bool> handlePredicate(const GCNSubtarget &ST,
+                                                   GlobalVariable *P) {

nikic wrote:

So to clarify, optimizations will never be applied during the compilation to 
amdgcnspirv? If that's the case, I guess it's not likely that IR will be 
transformed in problematic ways.

It did occur to me that a way to guarantee that the folding works is by using a 
callbr intrinsic, something like this:
```llvm
callbr void @llvm.amdgcn.processor.is(metadata "gfx803") to label %unsupported [label %supported]
```
This would make the check fundamentally inseparable from the control flow.

But I guess you'd have trouble round-tripping that via SPIRV...

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-24 Thread Nikita Popov via cfe-commits

https://github.com/nikic edited https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-24 Thread Nikita Popov via cfe-commits


@@ -0,0 +1,157 @@
+std::pair<PreservedAnalyses, bool> handlePredicate(const GCNSubtarget &ST,
+                                                   GlobalVariable *P) {

nikic wrote:

Possibly I'm misunderstanding what the pipeline here looks like. My assumption 
was that you have something like this going on:
```
clang generates IR -> compilation 1 without known target -> compilation 2 with 
known target
```
Where the predicates are expanded at the start of compilation 2, but 
compilation 1 could have arbitrarily optimized the IR.

If the resolution always happens immediately on the clang-generated IR, then I 
don't understand the purpose of the feature (as compared to always resolving in 
the frontend, that is).

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
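To make the pipeline question concrete, the pattern under discussion looks roughly like this. This is a hypothetical sketch; the predicate's exact type, linkage, and naming are assumptions for illustration, not the PR's verbatim output:

```llvm
; Hypothetical FE output: the predicate is an external, externally
; initialized placeholder global. Once the concrete target is known, the
; expansion pass seeds it with a constant and folds the branch away, so
; the guarded block is either kept or deleted before any BE work happens.
@llvm.amdgcn.is.gfx906 = external externally_initialized constant i1

define void @kernel() {
entry:
  %p = load i1, ptr @llvm.amdgcn.is.gfx906
  br i1 %p, label %guarded, label %cont

guarded:                                ; only survives when ST is gfx906
  ; target-specific builtins / ASM would live here
  br label %cont

cont:
  ret void
}
```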


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-23 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,157 @@
+std::pair<PreservedAnalyses, bool> handlePredicate(const GCNSubtarget &ST,
+                                                   GlobalVariable *P) {
+  setPredicate(ST, P);
+
+  SmallPtrSet<Instruction *, 8> ToFold;
+  collectUsers(P, ToFold);
+
+  if (ToFold.empty())
+    return {PreservedAnalyses::all(), true};
+
+  do {
+    auto *I = *ToFold.begin();
+    ToFold.erase(I);
+
+    if (auto *C = ConstantFoldInstruction(I, P->getDataLayout())) {
+      collectUsers(I, ToFold);
+      I->replaceAllUsesWith(C);
+      I->eraseFromParent();
+      continue;
+    } else if (I->isTerminator() && ConstantFoldTerminator(I->getParent())) {
+      continue;
+    } else if (I->users().empty()) {
+      continue;
+    }
+
+    return unfoldableFound(I->getParent()->getParent(), P, I);
+  } while (!ToFold.empty());
+
+  return {PreservedAnalyses::none(), true};
+}
+} // Unnamed namespace.
+
+PreservedAnalyses
+AMDGPUExpandFeaturePredicatesPass::run(Module &M, ModuleAnalysisManager &MAM) {
+  if (M.empty())
+return PreservedAnalyses::all();
+
+  SmallVector<GlobalVariable *> Predicates;
+  for (auto &&G : M.globals()) {
+    if (!G.isDeclaration() || !G.hasName())
+      continue;
+    if (G.getName().starts_with("llvm.amdgcn."))
+      Predicates.push_back(&G);
+  }
+
+  if (Predicates.empty())
+return PreservedAnalyses::all();
+
+  const auto &ST = TM.getSubtarget<GCNSubtarget>(
+  *find_if(M, [](auto &&F) { return !F.isIntrinsic(); }));

arsenm wrote:

in the real world the subtarget features for xnack may still differ between 
functions in a module 

https://github.co
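Abstracting away the LLVM types, the worklist discipline in the `handlePredicate` hunk quoted above can be modelled as follows. This is a sketch under simplifying assumptions (instructions expose a `users` list, an `is_terminator` flag, and pluggable fold callbacks), not the pass itself:

```python
# Model of the fold-to-fixpoint loop: every transitive user of a predicate
# must either constant fold, be a foldable terminator, or be dead. Anything
# else is a hard error, mirroring the pass's DiagnosticInfoUnsupported path.

def expand_predicate(users, try_fold, fold_terminator):
    worklist = set(users)
    while worklist:
        inst = worklist.pop()
        const = try_fold(inst)
        if const is not None:
            # Folding an instruction may make its own users foldable too.
            worklist.update(inst.users)
            inst.replace_all_uses_with(const)
            continue
        if inst.is_terminator and fold_terminator(inst):
            continue
        if not inst.users:
            continue  # dead value, safe to ignore
        raise RuntimeError(f"impossible to constant fold through {inst!r}")
```

This makes the design trade-off visible: the pass does not try to be a general optimiser, it only chases the predicate's use chain and bails out loudly the moment something opaque interrupts it.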

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-23 Thread Matt Arsenault via cfe-commits


@@ -0,0 +1,157 @@
+  const auto &ST = TM.getSubtarget<GCNSubtarget>(
+      *find_if(M, [](auto &&F) { return !F.isIntrinsic(); }));

arsenm wrote:

It's not convenient, but you should evaluate this in each individual function 
context. Really most of the targets sho

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-23 Thread Alex Voicu via cfe-commits


@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
 QualType FromType = OnlyArg->getType();
+// __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+// type, although this is almost always an error and we advise against it

AlexVlx wrote:

From the bottom up, anything but Friday should be good, including today 
starting from now to now + 6 hours:) I'm in the UK, so the delta is not so 
large anyway, pick something that fits your schedule and I'll probably be able 
to make it work.

For your example at the bottom, the ASM is non-problematic in that it goes 
through. Now substitute it with a builtin that is only there iff SSE3 is 
available, or try to bind registers from the extended x86_64 set and compile 
for an x86 target, and it'll go back to failing at compile time. It's that 
latter part that is problematic even for the user's experience.

I suspect that part of the issue here is that something like X86 hides a lot of 
this stuff under normal circumstances because folks don't really normally grab 
for special functionality, or feel the need for it. But if we have a look at 
the many SIMD extension sets, as well as the attempts at defining various 
levels of capability (the v1, v2, v3 things) I think the same challenge exists 
there, it's just not an immediate concern. What we're trying to solve is that 
whilst it makes perfect sense for a BE, any BE, to bind very tightly to the 
target, it is sometimes beneficial for the IR coming out of the FE to be 
generic and usable by many targets, without loss of capability. Without a 
mechanism as the one here one is either degraded to lowest common denominator 
capability, or has to play games trying to define capability levels, which 
generally end up being too coarse.

Also, please note that, in spite of me mentioning x86, at this point we are not 
proposing this for general use, but rather as a target specific BI, which 
hopefully reduces risk / contains any perceived novelty to parts where it's 
already been found to be useful:) 

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-23 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> I'm generally very unhappy about any kind of functionality that can cause 
> compilation failures either because the optimizer did not optimize enough 
> (including at O0) or because it optimized too much (producing code patterns 
> that are no longer recognized as trivially dead).

Fortunately, that wouldn't be the case here, I don't think, unless you have 
something specific in mind (asides from the inquiry about what happens with now 
inaccessible blocks / dead functions, which I'll address where it was asked).

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-23 Thread Alex Voicu via cfe-commits


@@ -0,0 +1,157 @@
+// lowering, once the concrete target is known, by way of constant folding their
+// live use scenario. The pass makes a best effort attempt to look through
+// calls, i.e. a constant evaluatable passthrough of a predicate value will

AlexVlx wrote:

The comment is vestigial from a prior iteration of this, thank you for catching 
that, I have to correct it.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-23 Thread Alex Voicu via cfe-commits


@@ -0,0 +1,157 @@
+//===- AMDGPUExpandFeaturePredicates.cpp - Feature Predicate Expander Pass 
===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+// This file implements a pass that deals with expanding AMDGCN generic feature
+// predicates into target specific quantities / sequences. In this context, a
+// generic feature predicate is an implementation detail global variable that
+// is inserted by the FE as a consequence of using either the __builtin_cpu_is
+// or the __builtin_amdgcn_is_invocable special builtins on an abstract target
+// (AMDGCNSPIRV). These placeholder globals are used to guide target specific
+// lowering, once the concrete target is known, by way of constant folding 
their
+// value all the way into a terminator (i.e. a controlled block) or into a no
+// live use scenario. The pass makes a best effort attempt to look through
+// calls, i.e. a constant evaluatable passthrough of a predicate value will
+// generally work, however we hard fail if the folding fails, to avoid obtuse
+// BE errors or opaque run time errors. This pass should run as early as
+// possible / immediately after Clang CodeGen, so that the optimisation 
pipeline
+// and the BE operate with concrete target data.
+//===--===//
+
+#include "AMDGPU.h"
+#include "AMDGPUTargetMachine.h"
+#include "GCNSubtarget.h"
+
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Analysis/ConstantFolding.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Pass.h"
+#include "llvm/Transforms/Utils/Local.h"
+
+#include 
+#include 
+
+using namespace llvm;
+
+namespace {
+template  void collectUsers(Value *V, C &Container) {
+  assert(V && "Must pass an existing Value!");
+
+  for (auto &&U : V->users())
+if (auto *I = dyn_cast(U))
+  Container.insert(Container.end(), I);
+}
+
+inline void setPredicate(const GCNSubtarget &ST, GlobalVariable *P) {
+  const auto IsFeature = P->getName().starts_with("llvm.amdgcn.has");
+  const auto Offset =
+  IsFeature ? sizeof("llvm.amdgcn.has") : sizeof("llvm.amdgcn.is");
+
+  auto PV = P->getName().substr(Offset).str();
+  if (IsFeature) {
+auto Dx = PV.find(',');
+while (Dx != std::string::npos) {
+  PV.insert(++Dx, {'+'});
+
+  Dx = PV.find(',', Dx);
+}
+PV.insert(PV.cbegin(), '+');
+  }
+
+  auto *PTy = P->getValueType();
+  P->setLinkage(GlobalValue::PrivateLinkage);
+  P->setExternallyInitialized(false);
+
+  if (IsFeature)
+P->setInitializer(ConstantInt::getBool(PTy, ST.checkFeatures(PV)));
+  else
+P->setInitializer(ConstantInt::getBool(PTy, PV == ST.getCPU()));
+}
+
+std::pair
+unfoldableFound(Function *Caller, GlobalVariable *P, Instruction *NoFold) {
+  std::string W;
+  raw_string_ostream OS(W);
+
+  OS << "Impossible to constant fold feature predicate: " << *P << " used by "
+ << *NoFold << ", please simplify.\n";
+
+  Caller->getContext().diagnose(
+  DiagnosticInfoUnsupported(*Caller, W, NoFold->getDebugLoc(), DS_Error));
+
+  return {PreservedAnalyses::none(), false};
+}
+
+std::pair handlePredicate(const GCNSubtarget &ST,
+   GlobalVariable *P) {
+  setPredicate(ST, P);
+
+  SmallPtrSet ToFold;
+  collectUsers(P, ToFold);
+
+  if (ToFold.empty())
+return {PreservedAnalyses::all(), true};
+
+  do {
+auto *I = *ToFold.begin();
+ToFold.erase(I);
+
+if (auto *C = ConstantFoldInstruction(I, P->getDataLayout())) {
+  collectUsers(I, ToFold);
+  I->replaceAllUsesWith(C);
+  I->eraseFromParent();
+  continue;
+} else if (I->isTerminator() && ConstantFoldTerminator(I->getParent())) {
+  continue;
+} else if (I->users().empty()) {
+  continue;
+}
+
+return unfoldableFound(I->getParent()->getParent(), P, I);
+  } while (!ToFold.empty());
+
+  return {PreservedAnalyses::none(), true};
+}
+} // Unnamed namespace.
+
+PreservedAnalyses
+AMDGPUExpandFeaturePredicatesPass::run(Module &M, ModuleAnalysisManager &MAM) {
+  if (M.empty())
+return PreservedAnalyses::all();
+
+  SmallVector Predicates;
+  for (auto &&G : M.globals()) {
+if (!G.isDeclaration() || !G.hasName())
+  continue;
+if (G.getName().starts_with("llvm.amdgcn."))
+  Predicates.push_back(&G);
+  }
+
+  if (Predicates.empty())
+return PreservedAnalyses::all();
+
+  const auto &ST = TM.getSubtarget(
+  *find_if(M, [](auto &&F) { return !F.isIntrinsic(); }));

AlexVlx wrote:

It does, but the (`gfxSMTH`) target is uniform per compilation. The mechanism 
is roundabout, but there's no other con

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-23 Thread Aaron Ballman via cfe-commits


@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
 QualType FromType = OnlyArg->getType();
+// __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+// type, although this is almost always an error and we advise against it

AaronBallman wrote:

> > > The __has_builtin counter-example actually does not work and cannot work, 
> > > please see: https://gcc.godbolt.org/z/7G5Y1d85b.
> > 
> > 
> > I cannot imagine a situation in which this isn't indicative of a bug, but 
> > perhaps this situation is the same one that necessitated [this 
> > PR](https://github.com/llvm/llvm-project/pull/126324#issuecomment-2706655366)
> >  which eventually concluded that we should change the behavior of 
> > `__has_builtin` rather than introduce a new builtin.
> 
> This is not actually a bug, it's intended behaviour. To obtain what you 
> expect the `b` would have to be `constexpr`, and then the `if` itself would 
> have to be `if constexpr`. Otherwise there's no binding commitment to 
> evaluate this at compile time (and, in effect, if this gets trivially 
> evaluated / removed in the FE, it induces dependence on optimisation level).

I... am an idiot. :-D Sorry, I think I must have been braindead when I wrote 
that because you're exactly correct. Sorry for the noise!

> > Backing up a step.. my expectation is that this eventually lowers down to a 
> > test and jump which jumps past the target code if the test fails. e.g.,
> > ```
> >   %0 = load i8, ptr %b, align 1
> >   %loadedv = trunc i8 %0 to i1
> >   br i1 %loadedv, label %if.then, label %if.end
> > 
> > if.then:
> >   # the target-specific instructions live here
> >   br label %if.end
> > 
> > if.end:
> >   ret void
> > ```
> >   
> > So we'd be generating instructions for the target which may be invalid if 
> > the test lies. If something did change that value so it no longer 
> > represents the predicate, I think that's UB (and we could help users catch 
> > that UB via a sanitizer check if we wanted to, rather than try to make the 
> > backend have to try to figure it out at compile time).
> 
> This cannot work reliably (e.g. there are instructions that would simply fail 
> at ISEL, and a run time jump doesn't mean that you do not lower to ISA the 
> jumped around block), and introducing dependence on sanitizers seems not 
> ideal. Furthermore, a run time jump isn't free, which is a concern for us, 
> and we also already have a mechanism for that case 
> (`__attribute__((target))`). Note that these can also control e.g. resource 
> allocation, so actually generating both might lead to arbitrary exhaustion of 
> a limited resource, and spurious compilation failures, consider e.g. (I'll 
> use CUDA/HIP syntax):
> 
> ```c++
> // This is a bit odd, and technically a race because multiple lanes write to shared_buf
> void foo() {
>   __shared__ int* shared_buf;
>   if (__builtin_amdgcn_processor_is("gfx950")) {
> __shared__ int buf[70 * 1024];
> shared_buf = buf;
>   } else {
> __shared__ int buf[60 * 1024];
> shared_buf = buf;
>   }
> 
>   __syncthreads();
>   // use shared_buf
> }
> ```
> 
> If we tried to lower that we'd exhaust LDS, and spuriously fail to compile. 
> This would have originated from perfectly valid uses of `#if 
> defined(__gfx950__) #else`. We'd like these to work, so we must unambiguously 
> do the fold ourselves.

Okay, so the situation is different than what I expected. I was unaware this 
would cause ISEL failures.

> > > if for a chain from point of cast to final point of use folding fails 
> > > (because you passed your value to an
> > > opaque function, modified it based on a run time value etc.), you get an 
> > > error and a diagnostic.
> > 
> > 
> > I was thinking you would not get a diagnostic; you'd get the behavior you 
> > asked for, which may be utter nonsense.
> 
> One of the difficulties here (ignoring that the utter nonsense behaviour at 
> run time might be nasal demons - GPUs aren't always as polite as to issue a 
> `SIGILL` and graciously die:)) is that not all constructs / IR sequences / 
> ASM uses lower into ISA, so what the user is more likely to get is an ICE 
> with an error that makes no sense unless they work on LLVM. That's fairly 
> grim user experience, IMHO, and one that we have the ability to prevent.

Yeah, we obviously don't want the user experience to be compiler crashes. :-) 

> > Am I missing something still? If so, maybe it would be quicker for us to 
> > hop in a telecon call? I'm going to be out of the office until Monday, but 
> > I'm happy to meet with you if that's more productive.
> 
> I would be absolutely happy to if you think it'd help. I regret not coming to 
> the Sofia meeting, we could've probably sorted this out directly with a 
> laptop:)

FWIW, I'm still pretty uncomfortable about this design. I keep coming back to 
this feeling really novel and seeming like it's designe

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-18 Thread Alex Voicu via cfe-commits


@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
 QualType FromType = OnlyArg->getType();
+// __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+// type, although this is almost always an error and we advise against it

AlexVlx wrote:

> > The __has_builtin counter-example actually does not work and cannot work, 
> > please see: https://gcc.godbolt.org/z/7G5Y1d85b.
> 
> I cannot imagine a situation in which this isn't indicative of a bug, but 
> perhaps this situation is the same one that necessitated [this 
> PR](https://github.com/llvm/llvm-project/pull/126324#issuecomment-2706655366) 
> which eventually concluded that we should change the behavior of 
> `__has_builtin` rather than introduce a new builtin.
> 

This is not actually a bug, it's intended behaviour. To obtain what you expect 
the `b` would have to be `constexpr`, and then the `if` itself would have to be 
`if constexpr`. Otherwise there's no binding commitment to evaluate this at 
compile time (and, in effect, if this gets trivially evaluated / removed in the 
FE, it induces dependence on optimisation level).

> > furthermore, the ...later... bit is pretty important: what happens on that 
> > path?
> 
> Anything in the world besides changing the value of `has_builtin` to 
> something other than what `__has_builtin` returned.
> 
> > if you do any of those, your distant has_builtin variable no longer 
> > reflects the predicate, which is the issue;
> 
> Why is that an issue? If the variable no longer reflects the predicate, 
> that's not on the compiler to figure out how to deal with, that's "play silly 
> games, win silly prizes".
>

It is a difficult conversation to have and not exactly what users want to hear, 
so making it as hard as possible to end up in an exchange where you have to say 
"welp, that was a silly game" cannot hurt. If anything, it's compassionate 
behaviour!

> Backing up a step.. my expectation is that this eventually lowers down to a 
> test and jump which jumps past the target code if the test fails. e.g.,
> 
> ```
>   %0 = load i8, ptr %b, align 1
>   %loadedv = trunc i8 %0 to i1
>   br i1 %loadedv, label %if.then, label %if.end
> 
> if.then:
>   # the target-specific instructions live here
>   br label %if.end
> 
> if.end:
>   ret void
> ```
> 
> So we'd be generating instructions for the target which may be invalid if the 
> test lies. If something did change that value so it no longer represents the 
> predicate, I think that's UB (and we could help users catch that UB via a 
> sanitizer check if we wanted to, rather than try to make the backend have to 
> try to figure it out at compile time).
> 

This cannot work reliably (e.g. there are instructions that would simply fail 
at ISEL, and a run time jump doesn't mean that you do not lower to ISA the 
jumped around block), and introducing dependence on sanitizers seems not ideal. 
Furthermore, a run time jump isn't free, which is a concern for us, and we also 
already have a mechanism for that case (`__attribute__((target))`).  Note that 
these can also control e.g. resource allocation, so actually generating both 
might lead to arbitrary exhaustion of a limited resource, and spurious 
compilation failures, consider e.g. (I'll use CUDA/HIP syntax):

```cpp
// This is a bit odd, and technically a race because multiple lanes write to shared_buf
void foo() {
  __shared__ int* shared_buf;
  if (__builtin_amdgcn_processor_is("gfx950")) {
__shared__ int buf[70 * 1024];
shared_buf = buf;
  } else {
__shared__ int buf[60 * 1024];
shared_buf = buf;
  }

  __syncthreads();
  // use shared_buf
}
```

If we tried to lower that we'd exhaust LDS, and spuriously fail to compile. 
This would have originated from perfectly valid uses of `#if 
defined(__gfx950__) #else`. We'd like these to work, so we must unambiguously 
do the fold ourselves.

> > if for a chain from point of cast to final point of use folding fails 
> > (because you passed your value to an
> > opaque function, modified it based on a run time value etc.), you get an 
> > error and a diagnostic.
> 
> I was thinking you would not get a diagnostic; you'd get the behavior you 
> asked for, which may be utter nonsense.
> 

One of the difficulties here (ignoring that the utter nonsense behaviour at run 
time might be nasal demons - GPUs aren't always as polite as to issue a 
`SIGILL` and graciously die:)) is that not all constructs / IR sequences / ASM 
uses lower into ISA, so what the user is more likely to get is an ICE with an 
error that makes no sense unless they work on LLVM. That's fairly grim user 
experience, IMHO, and one that we have the ability to prevent.

> Am I missing something still? If so, maybe it would be quicker for us to hop 
> in a telecon call? I'm going to be out of the office until Monday, but I'm 
> happy to meet with you if that's more productive.

I would be absolutely happy to if you think it'd help. I regret not coming to 
the Sofia meeting, we could've probably sorted this out directly with a 
laptop:)

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-18 Thread Aaron Ballman via cfe-commits


@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
 QualType FromType = OnlyArg->getType();
+// __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+// type, although this is almost always an error and we advise against it

AaronBallman wrote:

> I suspect that the sense that something is tied to optimiser behaviour is due 
> to my reply above, which perhaps was insufficiently clear - apologies.

No worries, this is complex stuff! I appreciate your willingness to talk me 
through it. :-)

> The __has_builtin counter-example actually does not work and cannot work, 
> please see: https://gcc.godbolt.org/z/7G5Y1d85b.

I cannot imagine a situation in which this isn't indicative of a bug, but 
perhaps this situation is the same one that necessitated [this 
PR](https://github.com/llvm/llvm-project/pull/126324#issuecomment-2706655366) 
which eventually concluded that we should change the behavior of 
`__has_builtin` rather than introduce a new builtin.

> furthermore, the ...later... bit is pretty important: what happens on that 
> path?

Anything in the world besides changing the value of `has_builtin` to something 
other than what `__has_builtin` returned.

>  if you do any of those, your distant has_builtin variable no longer reflects 
> the predicate, which is the issue;

Why is that an issue? If the variable no longer reflects the predicate, that's 
not on the compiler to figure out how to deal with, that's "play silly games, 
win silly prizes".

Backing up a step.. my expectation is that this eventually lowers down to a 
test and jump which jumps past the target code if the test fails. e.g.,
```
  %0 = load i8, ptr %b, align 1
  %loadedv = trunc i8 %0 to i1
  br i1 %loadedv, label %if.then, label %if.end

if.then:
  # the target-specific instructions live here
  br label %if.end

if.end:
  ret void
```
So we'd be generating instructions for the target which may be invalid if the 
test lies. If something did change that value so it no longer represents the 
predicate, I think that's UB (and we could help users catch that UB via a 
sanitizer check if we wanted to, rather than try to make the backend have to 
try to figure it out at compile time).

> if for a chain from point of cast to final point of use folding fails 
> (because you passed your value to an
opaque function, modified it based on a run time value etc.), you get an error 
and a diagnostic.

I was thinking you would not get a diagnostic; you'd get the behavior you asked 
for, which may be utter nonsense.

Am I missing something still? If so, maybe it would be quicker for us to hop in 
a telecon call? I'm going to be out of the office until Monday, but I'm happy 
to meet with you if that's more productive.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-18 Thread Alex Voicu via cfe-commits


@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
 QualType FromType = OnlyArg->getType();
+// __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+// type, although this is almost always an error and we advise against it

AlexVlx wrote:

Let's take these piecewise. Your first example actually works / those are 
equivalent. I think the danger here is assuming that the sort of easy to type 
examples we are playing with are representative of where issues show up - they 
are not. The cases where things break down are somewhat more intricate - I 
chose pushing into a container via a function with side-effects on purpose. 

I suspect that the sense that something is tied to optimiser behaviour is due 
to my reply above, which perhaps was insufficiently clear - apologies. I was 
trying to explain why making it trivial to store these as `bool`eans somewhere 
leads to having to run large parts of the optimisation pipeline. There is no 
dependence on the optimiser: `O0` and `Ox` behave in the same way as regards 
the predicates, because we have a dedicated pass that unconditionally 
runs early in the pipeline, irrespective of optimisation level, and either 
succeeds at the needed folding or fails and diagnoses.

The __has_builtin counter-example actually does not work and cannot work, 
please see: https://gcc.godbolt.org/z/7G5Y1d85b. In fact, it's the essence of 
why we need these, the fact that that pattern does not work and cannot work, 
and yet it is extremely useful. Those situations are materially different 
because:

- this is not about calling some generic omni-available code, it's about 
calling target specific code - this has to be statically decided on in the 
compiler, we MUST know if the target can run it or not, which is why this is a 
target specific BI;
- furthermore, the `...later...` bit is pretty important: what happens on that 
path? do you pass the boolean by reference into an `extern` function which gets 
linked in at run time (i.e. no idea what it does)? do you mutate the value 
based on a run time value? if you do any of those, your distant `has_builtin` 
variable no longer reflects the predicate, which is the issue;
- the answer to the above bit might be "make it `constexpr`" - sure, but then 
it rolls back into not working for abstract targets / resolving these late, 
which is the gap in functionality with things like the `__has_builtin` macro 
that these try to fill.

`I think the default expectation is that you should be able to query the 
processor information at any point you want, store the results anywhere you 
want, and use them later with the expected semantics` - I don't think this is 
actually the case, unless what you are thinking about is `__builtin_cpu_is`, 
which is a different mechanism that operates at execution time.

Overall, this might be less profound / convoluted than we've made it seem:

1. use the predicates as intended, things work;
2. explicitly cast to `bool` and then stash:
   a) if the chain formed from the point of cast to the final point of use can 
      be folded in a terminator, for all uses of the cast, happy days;
   b) if for a chain from point of cast to final point of use folding fails 
      (because you passed your value to an opaque function, modified it based 
      on a run time value etc.), you get an error and a diagnostic.

This is independent from optimisation level, and essentially matches what you 
would have to do with `__has_builtin` as well (except you'd have to make the 
stashed variable `constexpr` and then make the control structure be something 
like `if constexpr`).

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-18 Thread Aaron Ballman via cfe-commits


@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
 QualType FromType = OnlyArg->getType();
+// __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+// type, although this is almost always an error and we advise against it

AaronBallman wrote:

> The feature builtins allow a linear, inline transform from ... to ... which 
> is easy to teach / adopt and matches what folks are already doing, and just 
> works as is, both on concrete, compile-time finalised targets, and on 
> abstract run-time finalised ones. 

They work as-is so long as you hold them right. What I think is not easy to 
teach or adopt is the fact that it's tied so tightly to optimizer behavior in 
surprising ways. The fact that these two are not necessarily equivalent (esp as 
the logic becomes more complex) is a pretty reasonable concern to have:
```
if (__builtin_amdgcn_processor_is("gfx900"))
  do_stuff();

bool b = (bool)__builtin_amdgcn_processor_is("gfx900");
if (b)
  do_stuff();
```
because that's not intuitive. What's worse, debugging this when your intuition 
is wrong will be *incredibly* difficult because the behavior in a -O0 build is 
very likely going to do the right thing, so you end up debugging optimized code 
instead.

> Furthermore, is the current formulation particularly novel?

The shape of it isn't novel (a feature test which takes an argument and returns 
something truthy), which is actually why I'm pushing back so hard. This looks 
and feels like `__has_builtin` or other feature testing macros. AFAIK, those 
aren't tied to optimizer behavior so tightly. I think the default expectation 
is that you should be able to query the processor information at any point you 
want, store the results anywhere you want, and use them later with the expected 
semantics; any other behavior starts to feel like a miscompile.

As a counter-example, consider `__has_builtin`:
```
bool has_builtin = __has_builtin(__builtin_whatever);
// ...later...
if (has_builtin)
  __builtin_whatever();
```
if the call to `__has_builtin` returned false and we still ended up calling 
`__builtin_whatever`, users would be baffled as to why. And if it returned 
`true` and we didn't call the builtin, users would be just as baffled. So what 
I'm struggling to understand is why these situations are materially different 
from each other.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-18 Thread Alex Voicu via cfe-commits


@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
 QualType FromType = OnlyArg->getType();
+// __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+// type, although this is almost always an error and we advise against it

AlexVlx wrote:

Not really, although cleanliness, just like beauty, is in the eye of the 
beholder:) The feature builtins work locally, and naturally control what might 
be a single ASM block, or builtin invocation, or mmio access etc. Going with 
target_clones has a number of implementation problems (it'd require a bunch of 
other things that are neither there nor impending for AMDGPU / SPIRV to 
materialise first), it forces outlining / adopting a different software 
development paradigm, and one that is fairly spammy (do I duplicate the full 
function (10s / 100s of lines, possibly) just because I might have some target 
specific behaviour? Do I start sharding out things that are target specific, 
essentially creating a parallel world of `target_clones`-based builtins, even 
though this code might be mature and gnarly, etc.?). The feature builtins allow a 
linear, inline transform from:

```cpp
#if defined(__gfx900__)
__builtin_amdgcn_only_on_gfx900();
#endif
```

to

```cpp
if (__builtin_amdgcn_processor_is("gfx900"))
__builtin_amdgcn_only_on_gfx900();
```
which is easy to teach / adopt and matches what folks are already doing, and 
just works as is, both on concrete, compile-time finalised targets, and on 
abstract run-time finalised ones. Also please consider that we are focusing on 
the `processor_is` case, however the preferable, more adaptable solution would 
be to check for the availability of a builtin, rather than an architecture, and 
that'd be even more spammy (and would make even more sense inline). 

Furthermore, is the current formulation particularly novel? It's simply a 
builtin that returns an irregular type with an explicit conversion operator for 
`bool`. Holding it right in this case matches what people already do; none of 
the exciting indirections we are discussing here would be at all possible 
today. Finally, we already have e.g. 
[__builtin_available](https://clang.llvm.org/docs/LanguageExtensions.html#objective-c-available)
 or `__nvvm_reflect`, which tackle similar problems via similar means, and can 
be misused in similar ways.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-18 Thread Aaron Ballman via cfe-commits


@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
 QualType FromType = OnlyArg->getType();
+// __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+// type, although this is almost always an error and we advise against it

AaronBallman wrote:

Thank you!

Wouldn't a cleaner design be: use the `__attribute__((target_clones))` 
attribute on a function declaration and call that function? e.g.,
```
// Original code:
void foo() {
  if (__builtin_amdgcn_processor_is("gfx900")) {
do_gfx900_stuff();
  }
}

// New code:
__attribute__((target_clones("gfx900"))) inline void func() {
  do_gfx900_stuff();
}

__attribute__((target_clones("default"))) inline void func() {
  do_fallback_stuff();
}

void foo() {
  func();
}
```
(If you can't tell, I'm still trying to find some way to accomplish what you 
need but without introducing a novel behavior for a builtin; I'm worried about 
the usability of the current design because the feature really only works if 
you hold it just right.)

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-18 Thread Aaron Ballman via cfe-commits


@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
 QualType FromType = OnlyArg->getType();
+// __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+// type, although this is almost always an error and we advise against it

AaronBallman wrote:

Let me see if I'm on the same page now. The backend will generate code for 
gfx900 and the programmer will guard that block of code with `if 
(__builtin_amdgcn_processor_is("gfx900"))`. So if the predicate and the block 
it controls become disjointed somehow, the user can get incorrect behavior at 
runtime. Am I on the right track?

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-17 Thread Alex Voicu via cfe-commits


@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
 QualType FromType = OnlyArg->getType();
+// __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+// type, although this is almost always an error and we advise against it

AlexVlx wrote:

Very good questions. Optimisation matters because of e.g. inlining - it would 
not really do much in the case I outlined, except if you had `foo` inlined 
into `bar`, then statically resolved the predicate value and picked just that. 
This is doable, but not guaranteed, and wouldn't happen at O0, which induces a 
dependence we do not want. There is a dedicated pass we are adding which does 
basic const folding of a predicate, so if that doesn't work (because of wading 
into one of these intricacies), the user gets a fairly obnoxious error - we do 
try to tell them where there's a problem and what happened, but it's still a 
bit opaque.

But let's step back a bit to the first case i.e. why would stashing these in a 
vector / container be dangerous (but let's generalise to squirrelling in 
general - I picked a fairly banal example, but substitute, say, an 
`atomic<bool>` in there). Let's recall what these are meant for: controlling 
the lowering of target specific sequences. Now, irrespective if the target is 
concrete (`gfxSMTH`), i.e. we expand the predicate value in the FE, or abstract 
(`amdgcnspirv`), i.e. we only know the concrete target at run time, we 
absolutely MUST remove sequences that are not viable on a particular target, if 
it's not present. I.e. if we're on `gfx900`, we have to be able to statically 
and unambiguously remove sequences that are absolutely not viable there. 
Otherwise, what obtains is at best a very opaque BE error or, more dangerously, 
an ISA sequence being executed to unknowable consequences. If we just allow 
users to go for arbitrarily complex squirrelling / accessing schemes, we lose that 
ability:

- even if we are on a concrete target, the FE cannot drop expressions with 
potential side-effects even if by some stroke of luck an arbitrarily complex 
stash + access chain were otherwise viable for full evaluation;
- on an abstract target the dedicated ME pass isn't going to look through 
calls, and if we tried to look through calls then we run into having to 
consider things like alloca removal, CSE etc. -> essentially we have to run 
pretty intrusive optimisation to even have hope, which messes up O0.

Overall, the intention is to make things difficult to misuse - these are 
definitely not just cute booleans, they have teeth and can bite! So mandating 
an explicit cast / making the type irregular gives an easy tell / code 
smell when some gnarly bug unavoidably shows up - "oh, you seem to have cast 
this here, what was the intention?".

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-17 Thread Aaron Ballman via cfe-commits


@@ -9102,6 +9102,15 @@ bool InitializationSequence::Diagnose(Sema &S,
 
   case FK_ConversionFailed: {
 QualType FromType = OnlyArg->getType();
+// __amdgpu_feature_predicate_t can be explicitly cast to the logical op
+// type, although this is almost always an error and we advise against it

AaronBallman wrote:

I think I'm still missing something. If
```
if (__builtin_amdgcn_processor_is("gfx900"))
  ps.push_back(true);
```
works why would
```
ps.push_back(__builtin_amdgcn_processor_is("gfx900"));
```
be dangerous?

> but is not something we can support in the limit (once it's buried under 5 
> additional layers of indirection, at -O0 etc.)

Why would optimization modes matter?

Apologies if these are dumb questions on my part. :-)


https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-16 Thread Yaxun Liu via cfe-commits

https://github.com/yxsamliu approved this pull request.

LGTM. Thanks

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-16 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

Gentle ping.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-10 Thread Alex Voicu via cfe-commits


@@ -0,0 +1,21 @@
+// REQUIRES: amdgpu-registered-target
+// REQUIRES: spirv-registered-target
+// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wno-unused-value %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple spirv64-amd-amdhsa -Wno-unused-value %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64 -aux-triple amdgcn -Wno-unused-value %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64 -aux-triple spirv64-amd-amdhsa -Wno-unused-value %s
+
+// expected-no-diagnostics
+
+#define __device__ __attribute__((device))
+#define __global__ __attribute__((global))
+
+__device__ void foo() {
+  if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_permlanex16))
+    return __builtin_trap();
+}
+
+__global__ void bar() {
+  if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_permlanex16))

AlexVlx wrote:

Excellent question - what happens is we diagnose 
'err_amdgcn_is_invocable_arg_invalid_value'; this is covered by the tests in 
`clang/test/CodeGen/amdgpu-feature-builtins-invalid-use.cpp`, see e.g. around 
line 40.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-10 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-10 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,21 @@
+// REQUIRES: amdgpu-registered-target
+// REQUIRES: spirv-registered-target
+// RUN: %clang_cc1 -fsyntax-only -verify -triple amdgcn -Wno-unused-value %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple spirv64-amd-amdhsa -Wno-unused-value %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64 -aux-triple amdgcn -Wno-unused-value %s
+// RUN: %clang_cc1 -fsyntax-only -verify -triple x86_64 -aux-triple spirv64-amd-amdhsa -Wno-unused-value %s
+
+// expected-no-diagnostics
+
+#define __device__ __attribute__((device))
+#define __global__ __attribute__((global))
+
+__device__ void foo() {
+  if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_permlanex16))
+    return __builtin_trap();
+}
+
+__global__ void bar() {
+  if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_permlanex16))

jhuber6 wrote:

What happens when you call this with a normal function or invalid name?

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-10 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 commented:

This looks generally good to me, but I'll let the clang code owners make the 
final decision.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-10 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

Gentle ping.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-02 Thread Alex Voicu via cfe-commits

https://github.com/AlexVlx edited 
https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-06-02 Thread Alex Voicu via cfe-commits

https://github.com/AlexVlx commented:

> First, please take a look at the LLVM coding standard re the use of 'auto'.
> 
> Second: The use of a special type for these builtins is a little novel 
> (though I see the predicate type already exists?), but I guess I'm ok with 
> it.  I have some concerns with how the conversions for it work, particularly 
> being represented always as an `i1`, but the tests you have look about right.
> 
> I would like to see a test that is effectively:
> 
> ```
> bool f() {
> return __builtin_amdgcn_processor_is(...);
> }
> ```
> 
> (and maybe one returning 'auto' to make sure it is deduced properly). 

Apologies, I missed this earlier, and only got around to adding them now. 
Please do note that for cases where the function return type is `bool` one 
cannot directly return a predicate value, as it does not get implicitly casted 
to `bool`. This matches the behaviour of a type with an `explicit` conversion 
operator, which is what we're modelling.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-22 Thread Alex Voicu via cfe-commits


@@ -4966,6 +4966,89 @@ If no address spaces names are provided, all address spaces are fenced.
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
 
+__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide
+a functional mechanism for programmatically querying:
+
+* the identity of the current target processor;
+* the capability of the current target processor to invoke a particular builtin.
+
+**Syntax**:
+
+.. code-block:: c
+
+  __amdgpu_feature_predicate_t __builtin_amdgcn_processor_is(const char*);
+  __amdgpu_feature_predicate_t __builtin_amdgcn_is_invocable(builtin_name);
+
+**Example of use**:
+
+.. code-block:: c++
+
+  if (__builtin_amdgcn_processor_is("gfx1201") ||
+      __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var))
+    __builtin_amdgcn_s_sleep_var(x);
+
+  if (!__builtin_amdgcn_processor_is("gfx906"))
+    __builtin_amdgcn_s_wait_event_export_ready();
+  else if (__builtin_amdgcn_processor_is("gfx1010") ||
+           __builtin_amdgcn_processor_is("gfx1101"))
+    __builtin_amdgcn_s_ttracedata_imm(1);
+
+  while (__builtin_amdgcn_processor_is("gfx1101")) *p += x;
+
+  do {
+    break;
+  } while (__builtin_amdgcn_processor_is("gfx1010"));
+
+  for (; __builtin_amdgcn_processor_is("gfx1201"); ++*p) break;
+
+  if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready))
+    __builtin_amdgcn_s_wait_event_export_ready();
+  else if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_ttracedata_imm))
+    __builtin_amdgcn_s_ttracedata_imm(1);
+
+  do {
+    break;
+  } while (
+      __builtin_amdgcn_is_invocable(__builtin_amdgcn_global_load_tr_b64_i32));
+
+  for (; __builtin_amdgcn_is_invocable(__builtin_amdgcn_permlane64); ++*p)
+    break;
+
+**Description**:
+
+The builtins return a value of type ``__amdgpu_feature_predicate_t``, which is a
+target specific type that behaves as if its C++ definition was the following:

AlexVlx wrote:

Yes, it does work in C, although I'm not convinced it will see significant use. 
That being said, there's no good reason to make it C++ only. I have tried to 
add an explanation for that context / provide a fleshed out example, please do 
let me know if it's more or less aligned with what you had in mind. Thanks!

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-22 Thread Alex Voicu via cfe-commits


@@ -29,6 +29,8 @@ MODULE_PASS("amdgpu-printf-runtime-binding", AMDGPUPrintfRuntimeBindingPass())
 MODULE_PASS("amdgpu-remove-incompatible-functions", AMDGPURemoveIncompatibleFunctionsPass(*this))
 MODULE_PASS("amdgpu-sw-lower-lds", AMDGPUSwLowerLDSPass(*this))
 MODULE_PASS("amdgpu-unify-metadata", AMDGPUUnifyMetadataPass())
+MODULE_PASS("amdgpu-expand-feature-predicates",
+AMDGPUExpandFeaturePredicatesPass(*this))

AlexVlx wrote:

I'm not sure either, but better safe than sorry:) Done.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-22 Thread Alex Voicu via cfe-commits


@@ -13338,4 +13338,23 @@ def err_acc_device_type_multiple_archs
 // AMDGCN builtins diagnostics
 def err_amdgcn_load_lds_size_invalid_value : Error<"invalid size value">;
def note_amdgcn_load_lds_size_valid_value : Note<"size must be %select{1, 2, or 4|1, 2, 4, 12 or 16}0">;
+def err_amdgcn_processor_is_arg_not_literal
+: Error<"the argument to __builtin_amdgcn_processor_is must be a string "
+"literal">;
+def err_amdgcn_processor_is_arg_invalid_value
+: Error<"the argument to __builtin_amdgcn_processor_is must be a valid "
+"AMDGCN processor identifier; '%0' is not valid">;

AlexVlx wrote:

Yes, this is an excellent idea, thank you for it. Done.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Erich Keane via cfe-commits

https://github.com/erichkeane commented:

First, please take a look at the LLVM coding standard re the use of 'auto'.

Second: The use of a special type for these builtins is a little novel (though 
I see the predicate type already exists?), but I guess I'm ok with it.  I have 
some concerns with how the conversions for it work, particularly being 
represented always as an `i1`, but the tests you have look about right.

I would like to see a test that is effectively:

```
bool f() {
return __builtin_amdgcn_processor_is(...);
}
```

(and maybe one returning 'auto' to make sure it is deduced properly).

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Erich Keane via cfe-commits


@@ -13338,4 +13338,23 @@ def err_acc_device_type_multiple_archs
 // AMDGCN builtins diagnostics
 def err_amdgcn_load_lds_size_invalid_value : Error<"invalid size value">;
def note_amdgcn_load_lds_size_valid_value : Note<"size must be %select{1, 2, or 4|1, 2, 4, 12 or 16}0">;
+def err_amdgcn_processor_is_arg_not_literal
+: Error<"the argument to __builtin_amdgcn_processor_is must be a string "
+"literal">;
+def err_amdgcn_processor_is_arg_invalid_value
+: Error<"the argument to __builtin_amdgcn_processor_is must be a valid "
+"AMDGCN processor identifier; '%0' is not valid">;

erichkeane wrote:

Is there value/etc to printing the list like we do with -`march`?

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Erich Keane via cfe-commits


@@ -581,6 +581,9 @@ llvm::Type *CodeGenTypes::ConvertType(QualType T) {
   case BuiltinType::Id: \
     return llvm::TargetExtType::get(getLLVMContext(), "amdgcn.named.barrier", \
                                     {}, {Scope});
+#define AMDGPU_FEATURE_PREDICATE_TYPE(Name, Id, SingletonId, Width, Align) \
+  case BuiltinType::Id: \
+    return llvm::IntegerType::getInt1Ty(getLLVMContext());

erichkeane wrote:

Why an int-1 type instead of 'bool' type?  Won't this cause problems if it is 
returned? Are we making sure we force casts correctly, else this is going to be 
a bug factory when emitting it.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Erich Keane via cfe-commits


@@ -366,4 +367,72 @@ void SemaAMDGPU::handleAMDGPUMaxNumWorkGroupsAttr(Decl *D,
   addAMDGPUMaxNumWorkGroupsAttr(D, AL, AL.getArgAsExpr(0), YExpr, ZExpr);
 }
 
+Expr *SemaAMDGPU::ExpandAMDGPUPredicateBI(CallExpr *CE) {
+  auto &Ctx = getASTContext();

erichkeane wrote:

None of these are supposed to be 'auto' unless the type itself is on the RHS.  
Goes for most of this function.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Erich Keane via cfe-commits


@@ -64,6 +68,11 @@ class SemaAMDGPU : public SemaBase {
   void handleAMDGPUNumVGPRAttr(Decl *D, const ParsedAttr &AL);
   void handleAMDGPUMaxNumWorkGroupsAttr(Decl *D, const ParsedAttr &AL);
   void handleAMDGPUFlatWorkGroupSizeAttr(Decl *D, const ParsedAttr &AL);
+
+  /// Expand a valid use of the feature identification builtins into its
+  /// corresponding sequence of instructions.
+  Expr *ExpandAMDGPUPredicateBI(CallExpr *CE);

erichkeane wrote:

```suggestion
  Expr *ExpandAMDGPUPredicateBuiltIn(CallExpr *CE);
```

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Erich Keane via cfe-commits


@@ -758,6 +758,10 @@ AMDGPU Support
 ^^
 
 - Bump the default code object version to 6. ROCm 6.3 is required to run any program compiled with COV6.
+- Introduced a new target specific builtin ``__builtin_amdgcn_processor_is``,
+  a late / deferred query for the current target processor

erichkeane wrote:

```suggestion
  a late / deferred query for the current target processor.
```

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Erich Keane via cfe-commits


@@ -6653,6 +6654,22 @@ ExprResult Sema::BuildCallExpr(Scope *Scope, Expr *Fn, SourceLocation LParenLoc,
   if (Result.isInvalid()) return ExprError();
   Fn = Result.get();
 
+  // The __builtin_amdgcn_is_invocable builtin is special, and will be resolved
+  // later, when we check boolean conditions, for now we merely forward it
+  // without any additional checking.
+  if (Fn->getType() == Context.BuiltinFnTy && ArgExprs.size() == 1 &&
+  ArgExprs[0]->getType() == Context.BuiltinFnTy) {
+auto *FD = cast<FunctionDecl>(Fn->getReferencedDeclOfCallee());
+
+if (FD->getName() == "__builtin_amdgcn_is_invocable") {
+  auto FnPtrTy = Context.getPointerType(FD->getType());

erichkeane wrote:

More 'auto' use.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Erich Keane via cfe-commits


@@ -758,6 +758,10 @@ AMDGPU Support
 ^^
 
 - Bump the default code object version to 6. ROCm 6.3 is required to run any program compiled with COV6.
+- Introduced a new target specific builtin ``__builtin_amdgcn_processor_is``,
+  a late / deferred query for the current target processor
+- Introduced a new target specific builtin ``__builtin_amdgcn_is_invocable``,
+  which enables fine-grained, per-builtin, feature availability

erichkeane wrote:

```suggestion
  which enables fine-grained, per-builtin, feature availability.
```

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Erich Keane via cfe-commits

https://github.com/erichkeane edited 
https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Aaron Ballman via cfe-commits

https://github.com/AaronBallman commented:

I think the proposed approach is a reasonable direction. WDYT @erichkeane ?

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Aaron Ballman via cfe-commits


@@ -4966,6 +4966,89 @@ If no address spaces names are provided, all address spaces are fenced.
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
 
+__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide
+a functional mechanism for programmatically querying:
+
+* the identity of the current target processor;
+* the capability of the current target processor to invoke a particular builtin.
+
+**Syntax**:
+
+.. code-block:: c
+
+  __amdgpu_feature_predicate_t __builtin_amdgcn_processor_is(const char*);
+  __amdgpu_feature_predicate_t __builtin_amdgcn_is_invocable(builtin_name);
+
+**Example of use**:
+
+.. code-block:: c++
+
+  if (__builtin_amdgcn_processor_is("gfx1201") ||
+      __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var))
+    __builtin_amdgcn_s_sleep_var(x);
+
+  if (!__builtin_amdgcn_processor_is("gfx906"))
+    __builtin_amdgcn_s_wait_event_export_ready();
+  else if (__builtin_amdgcn_processor_is("gfx1010") ||
+           __builtin_amdgcn_processor_is("gfx1101"))
+    __builtin_amdgcn_s_ttracedata_imm(1);
+
+  while (__builtin_amdgcn_processor_is("gfx1101")) *p += x;
+
+  do {
+    break;
+  } while (__builtin_amdgcn_processor_is("gfx1010"));
+
+  for (; __builtin_amdgcn_processor_is("gfx1201"); ++*p) break;
+
+  if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready))
+    __builtin_amdgcn_s_wait_event_export_ready();
+  else if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_ttracedata_imm))
+    __builtin_amdgcn_s_ttracedata_imm(1);
+
+  do {
+    break;
+  } while (
+      __builtin_amdgcn_is_invocable(__builtin_amdgcn_global_load_tr_b64_i32));
+
+  for (; __builtin_amdgcn_is_invocable(__builtin_amdgcn_permlane64); ++*p)
+    break;
+
+**Description**:
+
+The builtins return a value of type ``__amdgpu_feature_predicate_t``, which is a
+target specific type that behaves as if its C++ definition was the following:

AaronBallman wrote:

Does this builtin work in C? If so, the docs should be updated to make it clear 
that this behavior applies to C as well as C++ and explain what it means in a 
bit more detail (presume that C users have no idea how C++ idioms work).

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Aaron Ballman via cfe-commits

https://github.com/AaronBallman edited 
https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Yaxun Liu via cfe-commits


@@ -585,6 +597,23 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 llvm::Value *Env = EmitScalarExpr(E->getArg(0));
 return Builder.CreateCall(F, {Env});
   }
+  case AMDGPU::BI__builtin_amdgcn_processor_is: {
+assert(CGM.getTriple().isSPIRV() &&

yxsamliu wrote:

I am OK with leaving this for future

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-16 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

Gentle ping.

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-12 Thread Alex Voicu via cfe-commits

https://github.com/AlexVlx updated 
https://github.com/llvm/llvm-project/pull/134016



  





[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-08 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

@erichkeane @AaronBallman if, when you have time, you could please indicate if 
the new direction is at least generally aligned with what you had in mind, it'd 
be appreciated!

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-07 Thread Alex Voicu via cfe-commits

https://github.com/AlexVlx updated 
https://github.com/llvm/llvm-project/pull/134016



  





[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-07 Thread Alex Voicu via cfe-commits

https://github.com/AlexVlx edited 
https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-07 Thread Alex Voicu via cfe-commits

https://github.com/AlexVlx commented:

> A side question, is it legal to use the builtin in unstructured control flow, 
> like here: https://godbolt.org/z/no7Kzv19r ? Note, if the answer is "no", 
> then enforcing the builtin to initialize something would (probably) 
> automatically prevent this case, as clang would error out with:
> ```
> error: cannot jump from this goto statement to its label
> note: jump bypasses variable initialization
> ``` 

Many thanks for asking this, I had completely ignored this scenario! I've added 
handling for this which is symmetric with what we already do for `if constexpr` 
or `if available`; indeed, the answer to your question was "no":)


https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-05 Thread Alex Voicu via cfe-commits

https://github.com/AlexVlx updated 
https://github.com/llvm/llvm-project/pull/134016



  





[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-05-02 Thread Alex Voicu via cfe-commits

https://github.com/AlexVlx updated 
https://github.com/llvm/llvm-project/pull/134016



  





[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-29 Thread Joseph Huber via cfe-commits


@@ -29,6 +29,8 @@ MODULE_PASS("amdgpu-printf-runtime-binding", AMDGPUPrintfRuntimeBindingPass())
 MODULE_PASS("amdgpu-remove-incompatible-functions", AMDGPURemoveIncompatibleFunctionsPass(*this))
 MODULE_PASS("amdgpu-sw-lower-lds", AMDGPUSwLowerLDSPass(*this))
 MODULE_PASS("amdgpu-unify-metadata", AMDGPUUnifyMetadataPass())
+MODULE_PASS("amdgpu-expand-feature-predicates",
+AMDGPUExpandFeaturePredicatesPass(*this))

jhuber6 wrote:

I forget if these passes run in order, but shouldn't this run as soon as 
possible?

https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-16 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> 
> You could default the warning to an error to make it more visible to the 
> user. Not certain if that's a bad idea or not though.
> 

Possibly. I had to drop this for a bit so circling around again, so apologies 
for the late reply. In the meanwhile, @epilk pointed out something I spaced out 
on, which is that we have some related / somewhat similar feature in Clang 
already: 
https://clang.llvm.org/docs/LanguageExtensions.html#objective-c-available, 
which is permissive along the lines of what you and @erichkeane are suggesting 
(a bit too permissive for my colleagues, I suspect, but that can be addressed 
by erroring). The Sema checking around that is rather extensive and I'd like to 
avoid duplicating / adding that amount of code.

> > If I may be so bold as to inquire: would you and @AaronBallman be slightly 
> > less horrified if the return type variance would be replaced with returning 
> > an odd type that only knows how to `bool`ify itself in conditions? More 
> > explicitly, if instead of `void __builtin_amdgcn_processor_is(const char*)` 
> > what we see is `__amdgpu_predicate_t __builtin_amdgcn_processor_is(const 
> > char*)`, would that be somewhat less bad? There is precedent for 
> > special-ish builtins returning special-ish target types (please consider 
> > `__amdgpu_buffer_rsrc_t` for `__builtin_amdgcn_make_buffer_rsrc` or 
> > `svcount_t`)
> 
> That would be just as weird, IMO. Having something that's contextually 
> converted to bool but only in some contexts, is going to be confusing in 
> practice.

Well, there's an interim solution here where it'd actually look essentially 
like so:

```cpp
struct __amdgpu_predicate_t { explicit operator bool() const noexcept; };
```

Which forces the user to consciously cast, unless in a valid (e.g. `if` 
context), and allows for the Sema checking to be funneled into a single point 
(the cast). Obligatory handwavium godbolt for illustration: 



https://github.com/llvm/llvm-project/pull/134016


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-16 Thread Aaron Ballman via cfe-commits

AaronBallman wrote:

> > IMO, we are better served by a warning diagnostic if we detect these are 
> > misused. `ParseCXXCondition` (or the C equivalent, but since you are 
> > returning bool it seems you're not concerned about C?) might be a good 
> > place to set a variable to enable the warning.
> 
> This is a very good suggestion, thank you very much for it - it might well be 
> where we end up. My worry is that ignoring warning and diagnostics is rather 
> common. 

You could default the warning to an error to make it more visible to the user. 
Not certain if that's a bad idea or not though.

> If I may be so bold as to inquire: would you and @AaronBallman be slightly 
> less horrified if the return type variance would be replaced with returning 
> an odd type that only knows how to `bool`ify itself in conditions? More 
> explicitly, if instead of `void __builtin_amdgcn_processor_is(const char*)` 
> what we see is `__amdgpu_predicate_t __builtin_amdgcn_processor_is(const 
> char*)`, would that be somewhat less bad? There is precedent for special-ish 
> builtins returning special-ish target types (please consider 
> `__amdgpu_buffer_rsrc_t` for `__builtin_amdgcn_make_buffer_rsrc` or 
> `svcount_t`)

That would be just as weird, IMO. Having something that's contextually 
converted to bool, but only in some contexts, is going to be confusing in 
practice.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-14 Thread Alex Voicu via cfe-commits

https://github.com/AlexVlx updated 
https://github.com/llvm/llvm-project/pull/134016

From 91eeaf02336e539f14dcb0a79ff15dbe8befe6f1 Mon Sep 17 00:00:00 2001
From: Alex Voicu 
Date: Wed, 2 Apr 2025 02:47:42 +0100
Subject: [PATCH 01/11] Add the functional identity and feature queries.

---
 clang/docs/LanguageExtensions.rst | 110 ++
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   5 +
 .../clang/Basic/DiagnosticSemaKinds.td|  10 +
 clang/lib/Basic/Targets/SPIR.cpp  |   4 +
 clang/lib/Basic/Targets/SPIR.h|   4 +
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp   |  29 ++
 clang/lib/Sema/SemaExpr.cpp   | 157 
 clang/test/CodeGen/amdgpu-builtin-cpu-is.c|  65 
 .../CodeGen/amdgpu-builtin-is-invocable.c |  64 
 .../amdgpu-feature-builtins-invalid-use.cpp   |  43 +++
 llvm/lib/Target/AMDGPU/AMDGPU.h   |   9 +
 .../AMDGPU/AMDGPUExpandPseudoIntrinsics.cpp   | 207 ++
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def |   2 +
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |   3 +-
 llvm/lib/Target/AMDGPU/CMakeLists.txt |   1 +
 ...pu-expand-feature-predicates-unfoldable.ll |  28 ++
 .../amdgpu-expand-feature-predicates.ll   | 359 ++
 17 files changed, 1099 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/CodeGen/amdgpu-builtin-cpu-is.c
 create mode 100644 clang/test/CodeGen/amdgpu-builtin-is-invocable.c
 create mode 100644 clang/test/CodeGen/amdgpu-feature-builtins-invalid-use.cpp
 create mode 100644 llvm/lib/Target/AMDGPU/AMDGPUExpandPseudoIntrinsics.cpp
 create mode 100644 
llvm/test/CodeGen/AMDGPU/amdgpu-expand-feature-predicates-unfoldable.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/amdgpu-expand-feature-predicates.ll

diff --git a/clang/docs/LanguageExtensions.rst 
b/clang/docs/LanguageExtensions.rst
index 3b8a9cac6587a..8a7cb75af13e5 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -4920,6 +4920,116 @@ If no address spaces names are provided, all address 
spaces are fenced.
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
 
+__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide
+a functional mechanism for programmatically querying:
+
+* the identity of the current target processor;
+* the capability of the current target processor to invoke a particular 
builtin.
+
+**Syntax**:
+
+.. code-block:: c
+
+  // When used as the predicate for a control structure
+  bool __builtin_amdgcn_processor_is(const char*);
+  bool __builtin_amdgcn_is_invocable(builtin_name);
+  // Otherwise
+  void __builtin_amdgcn_processor_is(const char*);
+  void __builtin_amdgcn_is_invocable(void);
+
+**Example of use**:
+
+.. code-block:: c++
+
+  if (__builtin_amdgcn_processor_is("gfx1201") ||
+  __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var))
+__builtin_amdgcn_s_sleep_var(x);
+
+  if (!__builtin_amdgcn_processor_is("gfx906"))
+__builtin_amdgcn_s_wait_event_export_ready();
+  else if (__builtin_amdgcn_processor_is("gfx1010") ||
+   __builtin_amdgcn_processor_is("gfx1101"))
+__builtin_amdgcn_s_ttracedata_imm(1);
+
+  while (__builtin_amdgcn_processor_is("gfx1101")) *p += x;
+
+  do { *p -= x; } while (__builtin_amdgcn_processor_is("gfx1010"));
+
+  for (; __builtin_amdgcn_processor_is("gfx1201"); ++*p) break;
+
+  if 
(__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready))
+__builtin_amdgcn_s_wait_event_export_ready();
+  else if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_ttracedata_imm))
+__builtin_amdgcn_s_ttracedata_imm(1);
+
+  do {
+*p -= x;
+  } while 
(__builtin_amdgcn_is_invocable(__builtin_amdgcn_global_load_tr_b64_i32));
+
+  for (; __builtin_amdgcn_is_invocable(__builtin_amdgcn_permlane64); ++*p) 
break;
+
+**Description**:
+
+When used as the predicate value of the following control structures:
+
+.. code-block:: c++
+
+  if (...)
+  while (...)
+  do { } while (...)
+  for (...)
+
+be it directly, or as arguments to logical operators such as ``!, ||, &&``, the
+builtins return a boolean value that:
+
+* indicates whether the current target matches the argument; the argument MUST
+  be a string literal and a valid AMDGPU target
+* indicates whether the builtin function passed as the argument can be invoked
+  by the current target; the argument MUST be either a generic or AMDGPU
+  specific builtin name
+
+Outside of these contexts, the builtins have a ``void`` returning signature
+which prevents their misuse.
+
+**Example of invalid use**:
+
+.. code-block:: c++
+
+  void kernel(int* p, int x, bool (*pfn)(bool), const char* str) {
+if (__builtin_amdgcn_processor_is("not_an_amdgcn_gfx_id")) return;
+else if (__b

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-05 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> > This change adds two semi-magical builtins for AMDGPU:
> > ```
> > * `__builtin_amdgcn_processor_is`, which is similar in observable behaviour 
> > with `__builtin_cpu_is`, except that it is never "evaluated" at run time;
> > 
> > * `__builtin_amdgcn_is_invocable`, which is behaviourally similar with 
> > `__has_builtin`, except that it is not a macro (i.e. not evaluated at 
> > preprocessing time).
> > ```
> > 
> > 
> > Neither of these are `constexpr`, even though when compiling for concrete 
> > (i.e. `gfxXXX` / `gfxXXX-generic`) targets they get evaluated in Clang, so 
> > they shouldn't tear the AST too badly / at all for multi-pass compilation 
> > cases like HIP. They can only be used in specific contexts (as args to 
> > control structures).
> > The motivation for adding these is two-fold:
> > ```
> > * as a nice to have, it provides an AST-visible way to incorporate 
> > architecture specific code, rather than having to rely on macros and the 
> > preprocessor, which burn in the choice quite early;
> > 
> > * as a must have, it allows featureful AMDGCN flavoured SPIR-V to be 
> > produced, where target specific capability is guarded and chosen or 
> > discarded when finalising compilation for a concrete target.
> > ```
> > 
> > I've tried to keep the overall footprint of the change small. The changes 
> > to Sema are a bit unpleasant, but there was a strong desire to have Clang 
> > validate these, and to constrain their uses, and this was the most compact 
> > solution I could come up with (suggestions welcome).
> > In the end, I will note there is nothing that is actually AMDGPU specific 
> > here, so it is possible that in the future, assuming interests from other 
> > targets / users, we'd just promote them to generic intrinsics.
> 
> First read through this, I find myself wondering WHY these aren't constexpr. 
> They seem exactly the sort of thing that folks would like to use `if 
> constexpr` for.

There are a few reasons, primarily though:

1. at least for builtin checking `__has_builtin` already exists, and that can 
get lumped into `if constexpr` and pretty much quacks like the same duck, so it 
felt superfluous (this is not a particularly strong reason though);
2. for an abstract target (in this case AMDGCNSPIRV) it cannot be 
constexpr/consteval, because when doing the initial compilation to SPIR-V you 
don't actually know which target you'll eventually get finalised for (this is 
one of the primary motivations for these existing: allowing one to generate 
"adaptable" AMDGCN SPIR-V that can tightly clamp to target features at 
finalisation time);
3. if they were sometimes constexpr, i.e. when compiling for a concrete 
target, and sometimes not, i.e. when compiling for an abstract one, the user 
would eventually have to wrap their if clauses in an ifdef guard (to figure 
out whether or not it is if constexpr), or start playing games defining 
`MAYBE_CONSTEXPR` macros etc., which would be quite unfortunate.

Concerns around early AST tearing resulting from (3) in multi-pass compilation 
cases like HIP were also considerable. 

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-05 Thread Alex Voicu via cfe-commits


@@ -4920,6 +4920,116 @@ If no address spaces names are provided, all address 
spaces are fenced.
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
 
+__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide
+a functional mechanism for programmatically querying:
+
+* the identity of the current target processor;
+* the capability of the current target processor to invoke a particular 
builtin.
+
+**Syntax**:
+
+.. code-block:: c
+
+  // When used as the predicate for a control structure
+  bool __builtin_amdgcn_processor_is(const char*);
+  bool __builtin_amdgcn_is_invocable(builtin_name);
+  // Otherwise

AlexVlx wrote:

Type, when "observable", is always `void`. So e.g. 
`decltype(__builtin_amdgcn_processor_is(...)), 
sizeof(__builtin_amdgcn_processor_is(...)), auto x = 
__builtin_amdgcn_processor_is(...); decltype(x)` would always be `void` / 
errors. I will pick up the other two Qs in a more thorough reply.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-05 Thread Erich Keane via cfe-commits


@@ -4920,6 +4920,116 @@ If no address spaces names are provided, all address 
spaces are fenced.
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
 
+__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide
+a functional mechanism for programmatically querying:
+
+* the identity of the current target processor;
+* the capability of the current target processor to invoke a particular 
builtin.
+
+**Syntax**:
+
+.. code-block:: c
+
+  // When used as the predicate for a control structure
+  bool __builtin_amdgcn_processor_is(const char*);
+  bool __builtin_amdgcn_is_invocable(builtin_name);
+  // Otherwise

erichkeane wrote:

What about when that context is inside of an `if`?  

Either way, I'm pretty against the `void` return type change part of this 
design. It seems like a poor design at that point.  @AaronBallman can comment 
if he'd like, but I suspect he agrees with me.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-05 Thread Erich Keane via cfe-commits


@@ -284,6 +284,18 @@ void 
CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
   Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
 }
 
+static Value *GetOrInsertAMDGPUPredicate(CodeGenFunction &CGF, Twine Name) {
+  auto PTy = IntegerType::getInt1Ty(CGF.getLLVMContext());
+
+  auto P = cast(

erichkeane wrote:

```suggestion
  auto *P = cast(
```
?

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-05 Thread Erich Keane via cfe-commits

https://github.com/erichkeane edited 
https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-05 Thread Alex Voicu via cfe-commits


@@ -15576,6 +15609,38 @@ static bool isOverflowingIntegerType(ASTContext &Ctx, 
QualType T) {
   return Ctx.getIntWidth(T) >= Ctx.getIntWidth(Ctx.IntTy);
 }
 
+static Expr *ExpandAMDGPUPredicateBI(ASTContext &Ctx, CallExpr *CE) {
+  if (!CE->getBuiltinCallee())
+return CXXBoolLiteralExpr::Create(Ctx, false, Ctx.BoolTy, 
CE->getExprLoc());
+
+  if (Ctx.getTargetInfo().getTriple().isSPIRV()) {
+CE->setType(Ctx.getLogicalOperationType());
+return CE;
+  }
+
+  bool P = false;
+  auto &TI = Ctx.getTargetInfo();
+
+  if (CE->getDirectCallee()->getName() == "__builtin_amdgcn_processor_is") {

AlexVlx wrote:

These are both true, and I thank you for the feedback. As a relatively weak 
retort, I will note that:
- I went for the names because it felt a bit icky to add the AMDGPU specific 
builtin header, considering we're trying to limit the scope of these; also I 
did not feel confident enough to make these generic Clang BIs (for good reason, 
as the review shows:));
- The call to this function comes after having already checked that the Callee 
is one of the predicates, `IsAMDGPUPredicateBI` and `ValidateAMDGPUPredicateBI` 
get called before, so the precondition that we are indeed dealing with the 
magical BIs is established; furthermore, we're already checking upon entry that 
the Callee is indeed a builtin, and I *believe* that builtins always have 
non-elaborated names which can always be obtained via getName - I could be 
wrong here.

Having said that, using the Builtin IDs would indeed be nicer, so I can switch 
to that; thank you for the suggestion.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-04 Thread Alex Voicu via cfe-commits


@@ -0,0 +1,64 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --check-globals all --version 5
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx900 -emit-llvm %s 
-o - | FileCheck --check-prefix=AMDGCN-GFX900 %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx1010 -emit-llvm %s 
-o - | FileCheck --check-prefix=AMDGCN-GFX1010 %s
+// RUN: %clang_cc1 -triple spirv64-amd-amdhsa -emit-llvm %s -o - | FileCheck 
--check-prefix=AMDGCNSPIRV %s
+
+// Test that, depending on triple and, if applicable, target-cpu, one of three
+// things happens:
+//1) for gfx900 we emit an empty kernel (concrete target, lacks feature)
+//2) for gfx1010 we emit a call to trap (concrete target, has feature)
+//3) for AMDGCNSPIRV we emit llvm.amdgcn.has.gfx10-insts as a constant
+//   externally initialised bool global, and load from it to provide the
+//   condition to a br (abstract target)
+
+//.
+// AMDGCNSPIRV: @llvm.amdgcn.has.gfx10-insts = external addrspace(1) 
externally_initialized constant i1
+//.
+// AMDGCN-GFX900-LABEL: define dso_local void @foo(
+// AMDGCN-GFX900-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGCN-GFX900-NEXT:  [[ENTRY:.*:]]
+// AMDGCN-GFX900-NEXT:ret void
+//
+// AMDGCN-GFX1010-LABEL: define dso_local void @foo(
+// AMDGCN-GFX1010-SAME: ) #[[ATTR0:[0-9]+]] {
+// AMDGCN-GFX1010-NEXT:  [[ENTRY:.*:]]
+// AMDGCN-GFX1010-NEXT:call void @llvm.trap()
+// AMDGCN-GFX1010-NEXT:ret void
+//
+// AMDGCNSPIRV-LABEL: define spir_func void @foo(
+// AMDGCNSPIRV-SAME: ) addrspace(4) #[[ATTR0:[0-9]+]] {
+// AMDGCNSPIRV-NEXT:  [[ENTRY:.*:]]
+// AMDGCNSPIRV-NEXT:[[TMP0:%.*]] = load i1, ptr addrspace(1) 
@llvm.amdgcn.has.gfx10-insts, align 1
+// AMDGCNSPIRV-NEXT:[[TOBOOL:%.*]] = icmp ne i1 [[TMP0]], false
+// AMDGCNSPIRV-NEXT:br i1 [[TOBOOL]], label %[[IF_THEN:.*]], label 
%[[IF_END:.*]]
+// AMDGCNSPIRV:   [[IF_THEN]]:
+// AMDGCNSPIRV-NEXT:call addrspace(4) void @llvm.trap()
+// AMDGCNSPIRV-NEXT:br label %[[IF_END]]
+// AMDGCNSPIRV:   [[IF_END]]:
+// AMDGCNSPIRV-NEXT:ret void
+//
+void foo() {
+if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_permlanex16))

AlexVlx wrote:

> Could we get a test? Something simple like `+dpp`?

Sure, but if possible, could you clarify what you would like to be tested / 
what you expect to see, so that we avoid churning.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-04 Thread Shilei Tian via cfe-commits

https://github.com/shiltian commented:

This is worth a release note item.

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-04 Thread Alex Voicu via cfe-commits

https://github.com/AlexVlx updated 
https://github.com/llvm/llvm-project/pull/134016

From 91eeaf02336e539f14dcb0a79ff15dbe8befe6f1 Mon Sep 17 00:00:00 2001
From: Alex Voicu 
Date: Wed, 2 Apr 2025 02:47:42 +0100
Subject: [PATCH 1/7] Add the functional identity and feature queries.

---
 clang/docs/LanguageExtensions.rst | 110 ++
 clang/include/clang/Basic/BuiltinsAMDGPU.def  |   5 +
 .../clang/Basic/DiagnosticSemaKinds.td|  10 +
 clang/lib/Basic/Targets/SPIR.cpp  |   4 +
 clang/lib/Basic/Targets/SPIR.h|   4 +
 clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp   |  29 ++
 clang/lib/Sema/SemaExpr.cpp   | 157 
 clang/test/CodeGen/amdgpu-builtin-cpu-is.c|  65 
 .../CodeGen/amdgpu-builtin-is-invocable.c |  64 
 .../amdgpu-feature-builtins-invalid-use.cpp   |  43 +++
 llvm/lib/Target/AMDGPU/AMDGPU.h   |   9 +
 .../AMDGPU/AMDGPUExpandPseudoIntrinsics.cpp   | 207 ++
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def |   2 +
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |   3 +-
 llvm/lib/Target/AMDGPU/CMakeLists.txt |   1 +
 ...pu-expand-feature-predicates-unfoldable.ll |  28 ++
 .../amdgpu-expand-feature-predicates.ll   | 359 ++
 17 files changed, 1099 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/CodeGen/amdgpu-builtin-cpu-is.c
 create mode 100644 clang/test/CodeGen/amdgpu-builtin-is-invocable.c
 create mode 100644 clang/test/CodeGen/amdgpu-feature-builtins-invalid-use.cpp
 create mode 100644 llvm/lib/Target/AMDGPU/AMDGPUExpandPseudoIntrinsics.cpp
 create mode 100644 
llvm/test/CodeGen/AMDGPU/amdgpu-expand-feature-predicates-unfoldable.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/amdgpu-expand-feature-predicates.ll

diff --git a/clang/docs/LanguageExtensions.rst 
b/clang/docs/LanguageExtensions.rst
index 3b8a9cac6587a..8a7cb75af13e5 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -4920,6 +4920,116 @@ If no address spaces names are provided, all address 
spaces are fenced.
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local")
   __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup", "local", "global")
 
+__builtin_amdgcn_processor_is and __builtin_amdgcn_is_invocable
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``__builtin_amdgcn_processor_is`` and ``__builtin_amdgcn_is_invocable`` provide
+a functional mechanism for programmatically querying:
+
+* the identity of the current target processor;
+* the capability of the current target processor to invoke a particular 
builtin.
+
+**Syntax**:
+
+.. code-block:: c
+
+  // When used as the predicate for a control structure
+  bool __builtin_amdgcn_processor_is(const char*);
+  bool __builtin_amdgcn_is_invocable(builtin_name);
+  // Otherwise
+  void __builtin_amdgcn_processor_is(const char*);
+  void __builtin_amdgcn_is_invocable(void);
+
+**Example of use**:
+
+.. code-block:: c++
+
+  if (__builtin_amdgcn_processor_is("gfx1201") ||
+  __builtin_amdgcn_is_invocable(__builtin_amdgcn_s_sleep_var))
+__builtin_amdgcn_s_sleep_var(x);
+
+  if (!__builtin_amdgcn_processor_is("gfx906"))
+__builtin_amdgcn_s_wait_event_export_ready();
+  else if (__builtin_amdgcn_processor_is("gfx1010") ||
+   __builtin_amdgcn_processor_is("gfx1101"))
+__builtin_amdgcn_s_ttracedata_imm(1);
+
+  while (__builtin_amdgcn_processor_is("gfx1101")) *p += x;
+
+  do { *p -= x; } while (__builtin_amdgcn_processor_is("gfx1010"));
+
+  for (; __builtin_amdgcn_processor_is("gfx1201"); ++*p) break;
+
+  if 
(__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_wait_event_export_ready))
+__builtin_amdgcn_s_wait_event_export_ready();
+  else if (__builtin_amdgcn_is_invocable(__builtin_amdgcn_s_ttracedata_imm))
+__builtin_amdgcn_s_ttracedata_imm(1);
+
+  do {
+*p -= x;
+  } while 
(__builtin_amdgcn_is_invocable(__builtin_amdgcn_global_load_tr_b64_i32));
+
+  for (; __builtin_amdgcn_is_invocable(__builtin_amdgcn_permlane64); ++*p) 
break;
+
+**Description**:
+
+When used as the predicate value of the following control structures:
+
+.. code-block:: c++
+
+  if (...)
+  while (...)
+  do { } while (...)
+  for (...)
+
+be it directly, or as arguments to logical operators such as ``!, ||, &&``, the
+builtins return a boolean value that:
+
+* indicates whether the current target matches the argument; the argument MUST
+  be a string literal and a valid AMDGPU target
+* indicates whether the builtin function passed as the argument can be invoked
+  by the current target; the argument MUST be either a generic or AMDGPU
+  specific builtin name
+
+Outside of these contexts, the builtins have a ``void`` returning signature
+which prevents their misuse.
+
+**Example of invalid use**:
+
+.. code-block:: c++
+
+  void kernel(int* p, int x, bool (*pfn)(bool), const char* str) {
+if (__builtin_amdgcn_processor_is("not_an_amdgcn_gfx_id")) return;
+else if (__bui

[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-04 Thread Dmitry Sidorov via cfe-commits

MrSidims wrote:

> Thank you for the feedback! I might not be getting the question right (case 
> in which I apologise in advance!), but I think that for "vanilla" SPIR-V i.e. 
> not vendor flavoured one, where one strictly has to deal with Extensions / 
> non-core capabilities, we probably would have the following situation:

I was imagining cases like this:
```
if (__builtin_amdgcn_processor_is("some_hw_with_fp16_support")) {
/*code using fp16*/
} else {
/*code using fp32*/
}
```
Note that, when translated to SPIR-V, the SPIR-V generator must insert the 
**Float16** capability (at the beginning of the module), so such a tool would 
need to remove that capability as well.

A side question, is it legal to use the builtin in unstructured control flow, 
like here: https://godbolt.org/z/qnhKdhfdW ?

https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-03 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> So in short: what you're trying to prevent is "this was stored in a variable, 
> then checked later when we are no longer on the device, thus the answer is 
> different". 

Not quite, although that is definitely an interesting consideration. What I am 
trying to address here is the not invalid concern that if you allow these to 
compose in arbitrary ways, with arbitrary values, stash and retrieve them to 
the point of being unable to trace them back, people will do it. And then they 
will build a rather complex contraption that makes perfect sense for them, is 
extremely valid and useful, but also ends up either failing to fold (not very 
helpful error message, they'll probably be unhappy with the compiler) or, 
worse, folds into something that completely subverted the value of the 
predicate (what was meant to be false is now true), and an ASM sequence that 
melts their GPU or launches nuclear missiles gets through (definitely unhappy). 
I am slightly more ambivalent than some of my colleagues towards this, but I 
cannot outright discard the concern - hence the awkward design.
 
> Your solution doesn't actually DO that, and acts in a way that is 
> inconsistent with the language. Your attempts here are defeated by common 
> patterns, including ones where variables are declared or altered inside of a 
> condition statement. So any attempts here are pretty fraught with errors. 
> Consider:
> 
> ```
> if (auto x = ) {
> // value is here
> }
> 

Indeed, with the minor nit that with the current PR that'd actually not work, 
but rather fail, since the innermost context for the BI's use would be the 
initialisation of X, and for initialisations it is just a void returning 
function; we only expand/promote in a boolean condition.

> // Or even:
> bool b = false;
> if () b = true;
> ```
> 

Completely agreed, with the slight objection that this is subversive. The goal 
is not to outsmart really clever users, that's not tractable, but merely to 
prevent enthusiasm driven misuse that could lead to extremely sub-optimal 
outcomes.

> So any attempt to do so is, honestly, partial at best, and confusing for no 
> good reason. Values change in a program, and these are no different, so that 
> sounds like a common programming mistake. Also, for some reason, this 
> disallows the conditional-operator as well? Which is another confusing thing 
> for users, as they consider that to be a shortcut for if/else assignments
>

The conditional-operator was considered "too confusing / why would you ever do 
that / what use could it be" material when I suggested it should work, hence it 
ended up as disallowed. Adding it back wouldn't be a problem.

> IMO, we are better served by a warning diagnostic if we detect these are 
> misused. `ParseCXXCondition` (or the C equivalent, but since you are 
> returning bool it seems you're not concerned about C?) might be a good place 
> to set a variable to enable the warning.

This is a very good suggestion, thank you very much for it - it might well be 
where we end up. My worry is that ignoring warnings and diagnostics is rather 
common. If I may be so bold as to inquire: would you and @AaronBallman be 
slightly less horrified if the return type variance would be replaced with 
returning an odd type that only knows how to `bool`ify itself in conditions? 
More explicitly, if instead of `void __builtin_amdgcn_processor_is(const 
char*)` what we see is `__amdgpu_predicate_t 
__builtin_amdgcn_processor_is(const char*)`, would that be somewhat less bad? 
There is precedent for special-ish builtins returning special-ish target types 
(please consider `__amdgpu_buffer_rsrc_t` for 
`__builtin_amdgcn_make_buffer_rsrc` or `svcount_t`)



https://github.com/llvm/llvm-project/pull/134016
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] [AMDGPU][clang][CodeGen][opt] Add late-resolved feature identifying predicates (PR #134016)

2025-04-03 Thread Alex Voicu via cfe-commits

AlexVlx wrote:

> > as a must have, it allows featureful AMDGCN flavoured SPIR-V to be 
> > produced, where target specific capability is guarded and chosen or 
> > discarded when finalising compilation for a concrete target.
> 
> I understand the reasoning behind providing such mechanisms to guard hardware 
> specific code for targets, that are being JITed, but I'm not sure about, how 
> would it work in SPIR-V case. Like it's is described now you are able to 
> easily remove the instructions within the guarded block. But SPIR-V also 
> provides `OpExtension` and `OpCapability` instructions that specify, which 
> features are used in the module and are placed on the top of it, so the 
> runtime is free to discard any module with unknown/unsupported capability. 
> Will you also provide a tool to cut those capabilities along the instructions?
> 

Thank you for the feedback! I might not be getting the question right (case in 
which I apologise in advance!), but I think that for "vanilla" SPIR-V, i.e. not 
a vendor-flavoured one, where one strictly has to deal with Extensions / 
non-core capabilities, we would probably have the following situation:

- The `processor_is` query is not particularly useful, unless we'd start 
defining pseudo-processors i.e. collections of features, which IMHO would be 
bad / subvert one of the main benefits of SPIR-V;
- The `is_invocable` query, however, is probably quite useful even there, 
assuming we'd start somewhat more aggressively introducing target specific 
intrinsics which map to e.g. `Op*`s and do the reasonable thing of bubbling 
them into clang via builtins;
- please note that an underlying assumption for our use here is that SPIR-V 
gets reverse translated into LLVM IR, I assume you are interested in the case 
where that assumption does not hold / the transforms would only apply to the 
SPIR-V module - with the caveat that this is somewhat handwavium powered, I 
will suggest that the benefit in that case is that if a module containing 
guarded functionality is encountered by a RT that does not support said 
functionality, instead of fully discarding the module it becomes possible to 
just remove the offending bits, in a way similar to what is done here, and 
still successfully load it.

> My review is not gating here, but just for my curiosity (and to have an 
> opinion about: _"In the end, I will note there is nothing that is actually 
> AMDGPU specific here, so it is possible that in the future, assuming 
> interests from other targets / users, we'd just promote them to generic 
> intrinsics."_ ):
> 
> 1. Is it allowed to use the builtin along a runtime-known boolean flag? 
> (probably it's also somewhat related to Erich's comments)

No it is not, these have to be constant foldable (part of the motivation for 
the awkward design that is giving people pause is to unambiguously constrain 
them to cases where they will be constant foldable, strictly based on their 
value, without trying to explain that wrinkle to users). Adding in arbitrary 
run time values / performing arbitrary ops on these means that we might 
inadvertently allow through invalid code. If the question is more about 
something along the lines of "at run time a value is passed as an input to the 
finalisation process i.e. it is constant" in theory this could be made to work, 
but the interface isn't quite there and it remains dangerous. For completeness, 
I will note that in some sense a run time value is being passed for the SPIR-V 
case, as the target is only known at run time, although this is not user 
controlled.

> 2. Does this builtin act like a barrier for optimization passes? if so, is a 
> load from llvm.amdgcn.is GV considered to be a barrier, or something else?

In theory, only by virtue of it being a load from an externally initialised 
global variable, with all the implications of that. However, the intention is 
to run the expansion pass as early as possible, immediately after Clang 
CodeGen, so that subsequent passes don't have to deal with these being present 
at all (they are meant to be rather ephemeral, and would only ever come into 
being when targeting `amdgcnspirv`). 

> 3. Is it envisioned, that optnone (LLVM/SPIR-V) has no effect on the 
> pass/tool, that removes target-dependent code?

Yes, this is always enabled for AMDGPU and has to work with -O0 / `optnone` 
which is another reason for going for tight constraints. For the currently 
allowed uses, the transform is fairly straightforward and is not too disruptive 
to the module. Furthermore, it does not have to rely on e.g. DCE, inlining or 
constprop being run, which at O0 wouldn't happen. Conversely, were these to be 
allowed e.g. as arguments to a function call, then we'd have to start looking 
through / inlining calls where predicates are passed. Which is doable, and we 
have implemented, but is intrusive, non-trivial and possibly quite expensive 
(essentially we'd need to clone the callee to avoid messing up user provide
