[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-05 Thread David Spickett via cfe-commits

DavidSpickett wrote:

I've pushed a fix for one of the tests on 32-bit Arm, as it failed on our bot:
https://lab.llvm.org/buildbot/#/builders/154/builds/11413/steps/5/logs/FAIL__Clang__offload-Xarch_c

Maybe your intent was specifically to have a line that uses the host
architecture. If so, we can figure out a `requires:` to make that work;
I have access to a machine to reproduce it.

The other way to fix it was to use `-Xarch_nvptx32` instead, but of course
that failed on a 64-bit host.

Let me know if you want to follow up on this.

https://github.com/llvm/llvm-project/pull/125421
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-05 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> I've pushed a fix for one of the tests on 32 bit Arm, as it failed on our 
> bot: 
> https://lab.llvm.org/buildbot/#/builders/154/builds/11413/steps/5/logs/FAIL__Clang__offload-Xarch_c
> 
> Maybe your intent was specifically to have a line that uses the host 
> architecture, if that was the case, we can figure out a `requires:` to 
> somehow make that work. I have access to a machine to reproduce it.
> 
> The other way to fix it was to use `-Xarch_nvptx32` instead but of course 
> that failed on a 64-bit host.
> 
> Let me know if you want to follow up on this.

Thanks! I totally forgot that `nvptx` existed. It's totally deprecated by
NVIDIA, so we should probably just reject it outright, but I'll leave that up
to @Artem-B.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-05 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 closed 
https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits

https://github.com/Artem-B approved this pull request.


https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits


@@ -0,0 +1,34 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \
+// RUN:   -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \
+// RUN:   -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// O3ONCE: "-O3"
+// O3ONCE-NOT: "-O3"
+
+// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \
+// RUN:   --target=x86_64-unknown-linux-gnu -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_52,sm_60 -nogpuinc \
+// RUN:   -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=OPENMP %s
+//
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
+
+// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \
+// RUN:   --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \
+// RUN: | FileCheck -check-prefix=CUDA %s
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3"
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0"
+// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3"

Artem-B wrote:

Nit: Using `-D` as an argument to pass may let you 
unambiguously mark each subcompilation, and avoid triggering that -O1/-O2 
problem.


https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits


@@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group<W_Group>,
 def Xanalyzer : Separate<["-"], "Xanalyzer">,
   HelpText<"Pass <arg> to the static analyzer">, MetaVarName<"<arg>">,
   Group<StaticAnalyzer_Group>;
-def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>;
+def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>,
+  HelpText<"Pass <arg> to the compiliation if the target matches <arch>">,

Artem-B wrote:

We do need better documentation. It's not obvious that `<arch>` could be a GPU
name or an arch name from the target triple.
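
To make the point concrete, here is a minimal sketch of the matching rule this
comment describes, assuming a selector may name the arch component of the
target triple, a bound GPU arch, or the special `host`/`device` names. The
`Compilation` struct and `selectorMatches` function are hypothetical names
invented for illustration; they are not the driver's API.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified model of one subcompilation in the pipeline.
struct Compilation {
  std::string TripleArch; // arch component of the triple, e.g. "nvptx64"
  std::string BoundArch;  // GPU name, e.g. "sm_52"; empty if there is none
  bool IsDevice;          // true for an offload device compilation
};

// Returns true if `-Xarch_<Selector> <arg>` would apply to compilation C
// under the assumptions stated above.
bool selectorMatches(const std::string &Selector, const Compilation &C) {
  assert(!Selector.empty() && "selector must not be empty");
  if (Selector == "host")
    return !C.IsDevice;
  if (Selector == "device")
    return C.IsDevice;
  return Selector == C.TripleArch ||
         (!C.BoundArch.empty() && Selector == C.BoundArch);
}
```

Under this model both `-Xarch_nvptx64` and `-Xarch_sm_52` would select an
`nvptx64`/`sm_52` device compilation, which is the ambiguity the help text
would need to spell out.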

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits


@@ -0,0 +1,34 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \
+// RUN:   -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \
+// RUN:   -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// O3ONCE: "-O3"
+// O3ONCE-NOT: "-O3"

Artem-B wrote:

Nit: I'd also add a check that `-O3` is present in the compilation with
`-fcuda-is-device`. Otherwise the test could succeed by matching `-O3`
somewhere else.
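
As a rough model of the difference between the two checks, the sketch below
treats the `-###` output as a list of cc1 command lines. It is only an
illustration: the helper names are hypothetical, and counting matching lines
is a simplification of what FileCheck's `CHECK` followed by `CHECK-NOT` does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Counts the cc1 command lines that contain the quoted flag, e.g. "\"-O3\"".
int countLinesWithFlag(const std::vector<std::string> &Cc1Lines,
                       const std::string &QuotedFlag) {
  assert(!QuotedFlag.empty() && "flag must not be empty");
  int N = 0;
  for (const std::string &Line : Cc1Lines)
    if (Line.find(QuotedFlag) != std::string::npos)
      ++N;
  return N;
}

// Stricter check: the flag appears on exactly one line, and that line is a
// device compilation (marked here by "-fcuda-is-device").
bool flagOnlyOnDevice(const std::vector<std::string> &Cc1Lines,
                      const std::string &QuotedFlag) {
  if (countLinesWithFlag(Cc1Lines, QuotedFlag) != 1)
    return false;
  for (const std::string &Line : Cc1Lines)
    if (Line.find(QuotedFlag) != std::string::npos)
      return Line.find("\"-fcuda-is-device\"") != std::string::npos;
  return false;
}
```

A pipeline where `-O3` landed only on the host line would still pass the
"exactly once" count, but it would fail the stricter device check.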

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,44 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \
+// RUN:   -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \
+// RUN:   -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// O3ONCE: "-O3"
+// O3ONCE-NOT: "-O3"
+
+// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \
+// RUN:   --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \
+// RUN:   -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=OPENMP %s
+//
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
+
+// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \
+// RUN:   --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \
+// RUN: | FileCheck -check-prefix=CUDA %s
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3"
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0"
+// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3"
+
+// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \
+// RUN:   -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \
+// RUN: | FileCheck -check-prefix=SPECIFIC %s
+// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52"

jhuber6 wrote:

My complaint with `--offload-arch=` is that it's pulling double duty for the
`-mcpu` and `--target` options. Because of that, it's wholly insufficient for
OpenMP when targeting anything that isn't a GPU. I'd prefer to have the
tools to individually enable and control the toolchains. My perspective is that
offloading is a product of all `--target=` values and `-mcpu=` values. That's
what it boils down to internally, and I'd like to be able to compile things
that way if necessary, e.g. `clang --offload-target=spirv64 foo.hip` makes more
sense to me.
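
One way to picture the "product of targets and CPUs" view is the sketch below,
which enumerates one device job per (triple, CPU) pair. This is an illustrative
model only: `Job` and `enumerateDeviceJobs` are hypothetical names, and the
map-of-sets input is an assumption, not how the driver stores its state.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Job = std::pair<std::string, std::string>; // (target triple, CPU)

// Builds one device compile job for every CPU requested under each triple.
std::vector<Job> enumerateDeviceJobs(
    const std::map<std::string, std::set<std::string>> &CpusByTriple) {
  std::vector<Job> Jobs;
  for (const auto &Entry : CpusByTriple) {
    assert(!Entry.first.empty() && "triple must not be empty");
    for (const std::string &Cpu : Entry.second)
      Jobs.emplace_back(Entry.first, Cpu);
  }
  return Jobs;
}
```

With `amdgcn-amd-amdhsa` mapped to `gfx90a` and `gfx1030`, and
`nvptx64-nvidia-cuda` mapped to `sm_52` and `sm_60`, this yields four jobs,
one per device `clang` binding in the OPENMP check lines of the test.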

Either way, does this patch LG now that that part has been delayed until later?

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits



Artem-B wrote:

You're probably the one holding most of the puzzle pieces here. For CUDA/HIP,
things (mostly) work well enough. Even though there are somewhat useful
attempts to target SPIR-V, they are half-baked at best. For OpenMP the target
set is wide open, so I can see how it would need a more flexible way to
customize the compilation pipeline construction.

In the abstract, part of the problem boils down to naming. We need to be able
to tell the driver what we want to target (currently we use --offload-arch= and
-offload options for that) and how to select subsets of the constructed
pipeline (-X).

`--offload-arch=amdgcnspirv,gfx1030` looks like a reasonable approach to me,
and the same naming scheme could be used as the selector part of `-Xarch`. What
we have now may not be perfect, but I think it's reasonably functional. Is
there a particular reason `--offload-arch=amdgcnspirv,gfx1030` is not
sufficient to drive pipeline construction? I'm not sure why we want -Xarch to
do that job.

> Compared to those I really don't think -Xarch_ is a big deal, since it does 
> what you want, passes those arguments only to the Triple toolchain.

My issue is that the patch was making -Xarch do a job it's not intended for
*and* hiding real errors in the process. The "passing the arguments to the
triple{-selected} toolchain" part (as a wider-scope selector, compared to the
per-GPU or host/device ones we have now) is fine. Let's handle the pipeline
creation challenge separately.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits



jhuber6 wrote:

Yes, the question is how we separate architectures when we don't assume a
single offloading toolchain. Right now we just wing it and guess based on
hard-coded string values.
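
For illustration, a guess "based on hard-coded string values" could look like
the sketch below. This is a hypothetical example of the pattern being
criticized, not the driver's actual logic; `guessTripleFromArch` and the
prefixes it tests are assumptions made for this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical: map an --offload-arch= value to a target triple by looking
// only at how the name is spelled. Returns an empty string when the spelling
// is not recognized.
std::string guessTripleFromArch(const std::string &Arch) {
  assert(!Arch.empty() && "arch name must not be empty");
  if (Arch.rfind("sm_", 0) == 0) // name starts with "sm_"
    return "nvptx64-nvidia-cuda";
  if (Arch.rfind("gfx", 0) == 0) // name starts with "gfx"
    return "amdgcn-amd-amdhsa";
  return "";
}
```

A scheme like this has no answer for a name such as `amdgcnspirv`, or for a
CPU name that is valid under more than one toolchain, which is the separation
problem raised here.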

I removed the contentious part, so can we at least land the easy fix?

https://github.com/llvm/llvm-project/pull/125421
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits

https://github.com/Artem-B edited 
https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits



Artem-B wrote:

> what is your recommended solution for mixing targets

Can you be more specific? We seem to view things from very different angles, so
I want to make sure we're talking about the same things here.

> we can already target HIP and CUDA from SPIRV-64

That's one example where we're looking at it differently. The way I see it,
HIP/CUDA (as a front end) can use SPIR-V as the target toolchain.
Is your question how we tell the driver to construct yet another cc1
subcompilation variant?
I do not have a ready answer for that, but if I had one, `-Xarch` probably
would not be it.



https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits



jhuber6 wrote:

That being said, what is your recommended solution for mixing targets? This is
not unique to OpenMP: we can already target HIP and CUDA from SPIRV-64, which
is a different toolchain that can't accept `--offload-arch` arguments.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/125421

From 3ed4040c18b4a980218a86a37ed8aabd0e395b20 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Sun, 2 Feb 2025 10:39:01 -0600
Subject: [PATCH] [Clang] Make `-Xarch_` handling generic for all toolchains

Summary:
Currently, `-Xarch_` is used to forward arguments specifically to certain
toolchains, but this is only supported by the Darwin toolchain.
We want to be able to use this generically, and for offloading too. This
patch moves the handling out of the Darwin toolchain and places it in
the `getArgsForToolChain` helper, which is run before the arguments get
passed to the tools.

The main benefit here is that we now have a more generic version of
`-Xopenmp-target=`, which should probably just be deprecated.
Additionally, it allows us to pass arguments specifically to different
architectures for offloading.

This patch is done in preparation for making the selection of offloading
toolchains more generic. This will be helpful while people are moving
toward compile jobs that include multiple toolchains (SPIR-V, AMDGCN,
NVPTX).
---
 clang/include/clang/Driver/Options.td  |  8 +++--
 clang/lib/Driver/Driver.cpp            |  5 +--
 clang/lib/Driver/ToolChain.cpp         | 42 --
 clang/lib/Driver/ToolChains/Darwin.cpp | 24 ---
 clang/test/Driver/Xarch.c              |  8 +
 clang/test/Driver/offload-Xarch.c      | 34 +
 6 files changed, 82 insertions(+), 39 deletions(-)
 create mode 100644 clang/test/Driver/offload-Xarch.c

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 0ab923fcdd5838..55d10ed8e974af 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group<W_Group>,
 def Xanalyzer : Separate<["-"], "Xanalyzer">,
   HelpText<"Pass <arg> to the static analyzer">, MetaVarName<"<arg>">,
   Group<StaticAnalyzer_Group>;
-def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>;
+def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>,
+  HelpText<"Pass <arg> to the compiliation if the target matches <arch>">,
+  MetaVarName<"<arch> <arg>">;
 def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>,
   HelpText<"Pass <arg> to the CUDA/HIP host compilation">, MetaVarName<"<arg>">;
 def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>,
@@ -1115,8 +1117,8 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,
-  Visibility<[ClangOption, FlangOption]>,
+def offload_arch_EQ : Joined<["--"], "offload-arch=">,
+  Visibility<[ClangOption, FlangOption]>, Flags<[NoXarchOption]>,
   HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). "
            "If 'native' is used the compiler will detect locally installed architectures. "
            "For HIP offloading, the device architecture can be followed by target ID features "
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 912777a9808b4b..5a4737fb381e6a 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -3409,7 +3409,9 @@ class OffloadingActionBuilder final {
   // Collect all offload arch parameters, removing duplicates.
   std::set GpuArchs;
   bool Error = false;
-  for (Arg *A : Args) {
+  const ToolChain &TC = *ToolChains.front();
+  for (Arg *A : C.getArgsForToolChain(&TC, /*BoundArch=*/"",
+  AssociatedOffloadKind)) {
 if (!(A->getOption().matches(options::OPT_offload_arch_EQ) ||
   A->getOption().matches(options::OPT_no_offload_arch_EQ)))
   continue;
@@ -3420,7 +3422,6 @@ class OffloadingActionBuilder final {
   ArchStr == "all") {
 GpuArchs.clear();
   } else if (ArchStr == "native") {
-const ToolChain &TC = *ToolChains.front();
 auto GPUsOrErr = ToolChains.front()->getSystemGPUArchs(Args);
 if (!GPUsOrErr) {
   TC.getDriver().Diag(diag::err_drv_undetermined_gpu_arch)
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index ebc982096595e6..c25d1b6be14b50 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs(
   A->getOption().matches(options::OPT_Xarch_host))
 ValuePos = 0;
 
-  unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(ValuePos));
+  const InputArgList &BaseArgs = Args.getBaseArgs();
+  unsigned Index = BaseArgs.MakeIndex(A->getValue(ValuePos));
   unsigned Prev = Index;
   std::unique_ptr XarchArg(Opts.ParseOneArg(Args, Index));
 
@@ -1672,8 +1673,31 @@ void ToolChain::TranslateXarchArgs(
 Diags.Report(DiagID) << A->getAsStrin

[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,44 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \
+// RUN:   -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \
+// RUN:   -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// O3ONCE: "-O3"
+// O3ONCE-NOT: "-O3"
+
+// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \
+// RUN:   --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \
+// RUN:   -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=OPENMP %s
+//
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
+
+// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \
+// RUN:   --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \
+// RUN: | FileCheck -check-prefix=CUDA %s
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3"
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0"
+// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3"
+
+// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \
+// RUN:   -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \
+// RUN: | FileCheck -check-prefix=SPECIFIC %s
+// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52"

jhuber6 wrote:

CUDA / HIP doesn't reject this currently, but in an effort to move this debate 
somewhere else, I've omitted this change from the PR.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits



Artem-B wrote:

It all seems to boil down to "what does Xarch mean". The conventional 
`-X` model is "pass the arg to  in the compilation 
pipeline". You seem to propose that for OpenMP it will be "use it to construct 
the compiler pipeline". I'm not sure I like this dichotomy, where we are 
commingling pipeline construction with tweaking of an already constructed 
pipeline.

Perhaps a cleaner way would be to figure out a better way for OpenMP to set up 
the build pipeline structure (that's what --offload-arch is for), and keep 
`Xarch` exclusively as a way to tweak the options for the already constructed 
pipeline.
Presumably OpenMP already has ways to specify the desired offloading 
targets.  `Xarch` does not look like the right tool for that job. It can do it, 
but it opens a can of corner cases.

If OpenMP does not have a viable better way to configure the compilation 
pipeline, we could make `-Xarch=backend. --offload-arch=target` a documented 
special case for OpenMP only. I would strongly prefer that CUDA/HIP 
continue diagnosing `--offload-arch` being passed to `cc1`.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits



jhuber6 wrote:

Currently we have `-Xopenmp-target=` which does the same thing. OpenMP 
can do something like the following:
```console
clang foo.c -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda 
-Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx1030,gfx90a 
-Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_52,sm_90
```
These are passed to the *driver* stage because we call `getArgsForToolChain` 
while processing the offloading architecture inputs. This call has no bound 
architecture, which is the problem with parsing `-Xarch_sm_52 
--offload-arch=sm_52`. There is no `-cc1` call here; it passes `--offload-arch` 
to the offloading toolchain, which then results in us building one compilation 
job for each of those `N` architectures. `--offload-arch` is not accepted by 
`-cc1` at all, so it's not relevant here.
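
To make the driver-stage behavior described above concrete, here is a minimal, self-contained sketch. It is not the Clang implementation: `DriverArg`, `argsForToolChain`, and `offloadArchs` are hypothetical names invented for illustration, and the matching rule (an argument is visible if it is unwrapped, or if its `-Xarch_` selector equals the toolchain's triple architecture or a non-empty bound architecture) is an assumption drawn from this discussion.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one command-line argument: `xarch` is the text after
// `-Xarch_` (empty when the argument is not wrapped), `value` is the argument.
struct DriverArg {
  std::string xarch;
  std::string value;
};

// Return the arguments visible to a toolchain whose triple architecture is
// `tcArch`. `boundArch` is empty at the driver stage, so selectors that name a
// specific GPU (e.g. "sm_52") cannot match there.
std::vector<std::string> argsForToolChain(const std::vector<DriverArg> &args,
                                          const std::string &tcArch,
                                          const std::string &boundArch) {
  std::vector<std::string> out;
  for (const DriverArg &a : args) {
    bool visible = a.xarch.empty() || a.xarch == tcArch ||
                   (!boundArch.empty() && a.xarch == boundArch);
    if (visible)
      out.push_back(a.value);
  }
  return out;
}

// Collect the de-duplicated `--offload-arch=` values one toolchain sees with no
// bound architecture. In this model each entry becomes one compilation job.
std::set<std::string> offloadArchs(const std::vector<DriverArg> &args,
                                   const std::string &tcArch) {
  const std::string prefix = "--offload-arch=";
  std::set<std::string> archs;
  for (const std::string &v : argsForToolChain(args, tcArch, "")) {
    if (v.compare(0, prefix.size(), prefix) != 0)
      continue;
    // Split the comma-separated list that follows the prefix.
    std::size_t start = prefix.size();
    while (start <= v.size()) {
      std::size_t end = v.find(',', start);
      if (end == std::string::npos)
        end = v.size();
      if (end > start)
        archs.insert(v.substr(start, end - start));
      start = end + 1;
    }
  }
  return archs;
}
```

In this model, `-Xarch_nvptx64 --offload-arch=sm_52,sm_60` yields two architectures for the `nvptx64` toolchain and none for `amdgcn`, while `-Xarch_sm_52 --offload-arch=sm_52` yields none at all, because no bound architecture exists yet when the list is built.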

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits



Artem-B wrote:

IMO this special case makes no sense.

If I were to look at this command line in real life, my assumption would be 
that a user made a mistake and intended to write `--offload-arch=sm_52 
-Xarch_sm_52 -some-option`. I.e. they targeted sm_52, and then wanted to tweak 
that compilation. In this case we are effectively ignoring `-Xarch_sm_52`, which 
was very likely *not* the user's intent.

`cc1` reporting an error when it got `--offload-arch` would be a better 
approach IMO, giving the feedback that the user is doing something wrong.

On the other hand you've mentioned that:
> using -Xarch_amdgcn --offload-arch=gfx1030 is very meaningful for OpenMP 
> where the user can enable multiple toolchains at the same time.

So, it looks like handling of these options is also language-dependent. For 
CUDA, blindly passing Xarch* options down to the compilation selected by Xarch 
kind (back-end, or specific GPU) and letting cc1 deal with those options would 
probably be acceptable.

The same approach may also work for OpenMP, where cc1 can do something sensible 
with --offload-arch passed to it and does not have to error out.



https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits



Artem-B wrote:

I'm not sure I understand what exactly it's intended to do and how that is 
supposed to work. I'm missing something here. Can you elaborate on the intended 
use case here and walk me through it?

So, the top-level driver sees `-Xarch_amdgcn`. I would assume that we want it 
to pass the following `--offload-arch=gfx90a` to all cc1 subcompilations using 
amdgcn. What is `--offload-arch=gfx90a` expected to do in this case, once it's 
passed to cc1? I can see how it might be used if we have a single cc1 
subcompilation to tell that cc1 invocation to target gfx90a, but that looks 
like an odd fix for an odd problem. IMO that's something that should be done by 
the top-level driver.
If we have multiple cc1 subcompilations using amdgcn, then we'll end up with 
multiple cc1 invocations with potentially identical targets... Not sure if 
that's going to cause trouble further down the compilation pipeline. E.g. can 
we incorporate N binaries for the same target? How will the runtime figure out 
which one to load?




https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,44 @@
+
+// RUN: %clang -x cuda %s -nogpulib -nogpuinc \
+// RUN:   -Xarch_sm_51 --offload-arch=sm_52 -S -### 2>&1 \
+// RUN: | FileCheck -check-prefix=SPECIFIC-WARN %s
+// SPECIFIC-WARN: warning: argument unused during compilation: '-Xarch_sm_51 --offload-arch=sm_52'

jhuber6 wrote:

In this case, `--offload-arch=sm_52` would enable `sm_52` as a bound job, then 
it would make `-Xarch_sm_52` enable `--offload-arch=sm_80`, but by the time it 
sees that, the list has already been generated, so it wouldn't do anything.
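
The per-job selection being discussed can be sketched as follows. This is an illustrative model only, not the code in `ToolChain::TranslateXarchArgs`: the `Job` and `XarchArg` types and the `selects`, `optionsFor`, and `unusedSelectors` helpers are hypothetical, and the selection rule (`host`, `device`, triple architecture, or bound architecture) is an assumption based on the test cases quoted in this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one compilation job created by the driver.
struct Job {
  bool isDevice;
  std::string tripleArch; // e.g. "nvptx64" or "x86_64"
  std::string boundArch;  // e.g. "sm_52"; empty for host jobs
};

// Hypothetical wrapped argument of the form `-Xarch_<selector> <option>`.
struct XarchArg {
  std::string selector;
  std::string option;
};

// Decide whether a wrapped argument applies to a job.
bool selects(const XarchArg &a, const Job &j) {
  if (a.selector == "host")
    return !j.isDevice;
  if (a.selector == "device")
    return j.isDevice;
  return a.selector == j.tripleArch ||
         (!j.boundArch.empty() && a.selector == j.boundArch);
}

// Options forwarded to one job, in command-line order.
std::vector<std::string> optionsFor(const std::vector<XarchArg> &args,
                                    const Job &j) {
  std::vector<std::string> out;
  for (const XarchArg &a : args)
    if (selects(a, j))
      out.push_back(a.option);
  return out;
}

// Selectors that matched no job. In this model these correspond to arguments
// a driver would report as unused.
std::vector<std::string> unusedSelectors(const std::vector<XarchArg> &args,
                                         const std::vector<Job> &jobs) {
  std::vector<std::string> out;
  for (const XarchArg &a : args) {
    bool used = false;
    for (const Job &j : jobs)
      used = used || selects(a, j);
    if (!used)
      out.push_back(a.selector);
  }
  return out;
}
```

Because the job list is fixed before any of these selectors are evaluated, a wrapped `--offload-arch` can at most be forwarded to an existing job in this model; it cannot add a new one.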

https://github.com/llvm/llvm-project/pull/125421
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits

https://github.com/Artem-B commented:

> > > --offload-arch= isn't an accepted -cc1 argument so it won't be forwarded 
> > > at all.
> > 
> > 
> > Silently? That would be wrong, imo. It should be diagnosed somewhere.
> 
> It's already an error if you pass it directly via `-Xclang` because it's not 
> an accepted `-cc1` argument. A lot of driver arguments are both driver and 
> `cc1` arguments so those get marshalled or forwarded.

Yes, I'm aware of that. My question is: with this patch, and my example 
command above which wants to forward --offload-arch, does `clang -cc1` report 
an error, or does it stay silent because the argument "won't be forwarded at all"?

Perhaps we should take a step back, document desired behavior/interactions 
between -Xarch, --offload-arch, and OpenMP/CUDA/HIP offloading modes, so we 
have somewhat consistent (or at least documented) behavior. Right now we seem 
to chase corner cases.

https://github.com/llvm/llvm-project/pull/125421
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits



jhuber6 wrote:

But we *want* `-Xarch_amdgcn --offload-arch=gfx90a` to work for cases like 
OpenMP, which can combine multiple targets into a single compile and needs the 
lists to be separate. I guess we could go with @yxsamliu's suggestion and just 
error if the value isn't a valid triple architecture.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits


@@ -0,0 +1,44 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \
+// RUN:   -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \
+// RUN:   -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=O3ONCE %s
+// O3ONCE: "-O3"
+// O3ONCE-NOT: "-O3"
+
+// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \
+// RUN:   --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \
+// RUN:   -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \
+// RUN: | FileCheck -check-prefix=OPENMP %s
+//
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]"
+// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]"
+// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]"
+// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out"
+
+// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \
+// RUN:   --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \
+// RUN: | FileCheck -check-prefix=CUDA %s
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3"
+// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0"
+// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3"
+
+// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \
+// RUN:   -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \
+// RUN: | FileCheck -check-prefix=SPECIFIC %s
+// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52"
+
+// RUN: %clang -x cuda %s -nogpulib -nogpuinc \
+// RUN:   -Xarch_sm_51 --offload-arch=sm_52 -S -### 2>&1 \
+// RUN: | FileCheck -check-prefix=SPECIFIC-WARN %s
+// SPECIFIC-WARN: warning: argument unused during compilation: '-Xarch_sm_51 --offload-arch=sm_52'

Artem-B wrote:

What will happen when we *do* pass `--offload-arch=sm_80` to cc1?
```
--offload-arch=sm_52 -Xarch_sm52 --offload-arch=sm_80 --some-option-for_sm52
```


https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits

https://github.com/Artem-B edited 
https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/125421

>From 7773accdd4c3a6fc178c76dd974948dbe091e549 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Sun, 2 Feb 2025 10:39:01 -0600
Subject: [PATCH 1/2] [Clang] Make `-Xarch_` handling generic for all
 toolchains

Summary:
Currently, `-Xarch_` is used to forward arguments specifically to certain
toolchains, and it is only supported by the Darwin toolchain. We want to
be able to use this generically, and for offloading too. This patch moves
the handling out of the Darwin toolchain and places it in the
`getArgsForToolChain` helper, which is run before the arguments get
passed to the tools.

The main benefit here is that we now have a more generic version of
`-Xopenmp-target=`, which should probably just be deprecated.
Additionally, it allows us to pass arguments specifically to different
architectures for offloading.

This patch is done in preparation for making the selection of offloading
toolchains more generic. This will be helpful while people are moving
toward compile jobs that include multiple toolchains (SPIR-V, AMDGCN,
NVPTX).
---
 clang/include/clang/Driver/Options.td  |  7 ++--
 clang/lib/Driver/Driver.cpp            |  5 +--
 clang/lib/Driver/ToolChain.cpp         | 45 --
 clang/lib/Driver/ToolChains/Darwin.cpp | 24 --
 clang/test/Driver/Xarch.c              |  8 +
 clang/test/Driver/offload-Xarch.c      | 39 ++
 6 files changed, 89 insertions(+), 39 deletions(-)
 create mode 100644 clang/test/Driver/offload-Xarch.c

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 0ab923fcdd5838c..765b7f882e99a85 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group<W_Group>,
 def Xanalyzer : Separate<["-"], "Xanalyzer">,
   HelpText<"Pass <arg> to the static analyzer">, MetaVarName<"<arg>">,
   Group<StaticAnalyzer_Group>;
-def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>;
+def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>,
+  HelpText<"Pass <arg> to the compilation if the target matches <arch>">,
+  MetaVarName<"<arch> <arg>">;
 def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>,
   HelpText<"Pass <arg> to the CUDA/HIP host compilation">, MetaVarName<"<arg>">;
 def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>,
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,
+def offload_arch_EQ : Joined<["--"], "offload-arch=">,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Specify an offloading device architecture for CUDA, HIP, or 
OpenMP. (e.g. sm_35). "
"If 'native' is used the compiler will detect locally installed 
architectures. "
"For HIP offloading, the device architecture can be followed by 
target ID features "
"delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be 
specified more than once.">;
 def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">,
-  Flags<[NoXarchOption]>,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, 
gfx906) from the list of devices to compile for. "
"'all' resets the list to its default value.">;
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 912777a9808b4be..5a4737fb381e6a0 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -3409,7 +3409,9 @@ class OffloadingActionBuilder final {
   // Collect all offload arch parameters, removing duplicates.
   std::set<StringRef> GpuArchs;
   bool Error = false;
-  for (Arg *A : Args) {
+  const ToolChain &TC = *ToolChains.front();
+  for (Arg *A : C.getArgsForToolChain(&TC, /*BoundArch=*/"",
+  AssociatedOffloadKind)) {
 if (!(A->getOption().matches(options::OPT_offload_arch_EQ) ||
   A->getOption().matches(options::OPT_no_offload_arch_EQ)))
   continue;
@@ -3420,7 +3422,6 @@ class OffloadingActionBuilder final {
   ArchStr == "all") {
 GpuArchs.clear();
   } else if (ArchStr == "native") {
-const ToolChain &TC = *ToolChains.front();
 auto GPUsOrErr = ToolChains.front()->getSystemGPUArchs(Args);
 if (!GPUsOrErr) {
   TC.getDriver().Diag(diag::err_drv_undetermined_gpu_arch)
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index ebc982096595e61..fc2a07ff26fc025 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs(
   A->getOption().matches(options::OPT_Xarch_host))
 ValuePos = 0;
 
-  unsigned Ind

[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > --offload-arch= isn't an accepted -cc1 argument so it won't be forwarded at 
> > all.
> 
> Silently? That would be wrong, imo. It should be diagnosed somewhere.

It's already an error if you pass it directly via `-Xclang` because it's not an 
accepted `-cc1` argument. A lot of driver arguments are both driver and `cc1` 
arguments so those get marshalled or forwarded.
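
A rough model of that distinction, as a Python sketch. The two sets below are illustrative stand-ins rather than the real option table, and `forward_to_cc1` is a hypothetical helper: driver-only options are consumed by the driver, shared options are forwarded, and anything handed over raw through `-Xclang` is rejected if `cc1` does not know it.

```python
# Illustrative subsets only; the real tables live in Options.td.
DRIVER_ONLY = {"--offload-arch="}
DRIVER_AND_CC1 = {"-O", "-g"}

def option_name(arg):
    """Return the known option prefix that `arg` starts with, or None."""
    for name in sorted(DRIVER_ONLY | DRIVER_AND_CC1, key=len, reverse=True):
        if arg.startswith(name):
            return name
    return None

def forward_to_cc1(argv):
    """Return the cc1 argument list for a driver command line (toy model)."""
    cc1_args = []
    it = iter(argv)
    for arg in it:
        if arg == "-Xclang":
            raw = next(it)
            # -Xclang bypasses the driver, so cc1 itself must know the option.
            if option_name(raw) not in DRIVER_AND_CC1:
                raise ValueError(f"unknown argument: '{raw}'")
            cc1_args.append(raw)
        elif option_name(arg) in DRIVER_AND_CC1:
            cc1_args.append(arg)
        # Driver-only (and, in this toy model, unrecognized) options are
        # dropped here and never reach cc1.
    return cc1_args
```

In this model `--offload-arch=sm_52` never reaches the `cc1` list on its own, while `-Xclang --offload-arch=sm_52` raises the "unknown argument" error.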

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > Right now if someone passes -Xarch_foo --offload-arch=gfx1030 and foo 
> > doesn't match it's not passed and it will print something like this. I 
> > figured that's good enough.
> 
> This part SGTM, too.
> 
> However, I don't think I've seen the answer what happens when we do pass 
> --offload arch to cc1.
> 
> E.g. a user may accidentally paste an argument in the wrong place and instead 
> of intended `--offload-arch=sm_52 --offload-arch=sm_80 -Xarch_sm52 
> --some-option-for_sm52` passes `--offload-arch=sm_52 -Xarch_sm52 
> --offload-arch=sm_80 --some-option-for_sm52` ?

`--offload-arch=` isn't an accepted `-cc1` argument so it won't be forwarded at 
all.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits

Artem-B wrote:

> --offload-arch= isn't an accepted -cc1 argument so it won't be forwarded at 
> all.

Silently? That would be wrong, imo. It should be diagnosed somewhere.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits

Artem-B wrote:

> Right now if someone passes -Xarch_foo --offload-arch=gfx1030 and foo doesn't 
> match it's not passed and it will print something like this. I figured that's 
> good enough.

This part SGTM, too.

However, I don't think I've seen an answer to what happens when we do pass 
`--offload-arch` to cc1.

E.g. a user may accidentally paste an argument in the wrong place and, instead 
of the intended `--offload-arch=sm_52 --offload-arch=sm_80 -Xarch_sm52 
--some-option-for_sm52`, pass `--offload-arch=sm_52 -Xarch_sm52 
--offload-arch=sm_80 --some-option-for_sm52`.





https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > I don't think there's any use of --offload-arch outside of the driver.
> 
> I agree. Yet we do need to deal with such nonsensical input in a consistent 
> manner. We do not control what the users give us, but we control how we 
> respond.

Right now if someone passes `-Xarch_foo --offload-arch=gfx1030` and `foo` 
doesn't match it's not passed and it will print something like this. I figured 
that's good enough.
```
clang++: warning: argument unused during compilation: '-Xarch_sm_51 --offload-arch=sm_52' [-Wunused-command-line-argument]
```
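
The behaviour can be modelled with a short Python sketch. This is a simplification under stated assumptions, not the code in `ToolChain::TranslateXarchArgs`: `translate_xarch` is a hypothetical name, a selector counts as matching when it is one of the names the current compilation answers to, and the warning is produced per compilation here, whereas the real driver reports arguments that no job claimed.

```python
def translate_xarch(argv, active_names):
    """Unwrap matching -Xarch_<arch> <arg> pairs; report the rest as unused.

    `active_names` is the set of selectors the current compilation answers to
    (an assumption of this sketch, e.g. {"nvptx64", "sm_52"}).
    """
    forwarded, warnings = [], []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("-Xarch_") and i + 1 < len(argv):
            selector, wrapped = arg[len("-Xarch_"):], argv[i + 1]
            if selector in active_names:
                forwarded.append(wrapped)
            else:
                warnings.append(
                    f"argument unused during compilation: '{arg} {wrapped}'")
            i += 2  # skip the selector and the argument it wraps
        else:
            forwarded.append(arg)
            i += 1
    return forwarded, warnings

print(translate_xarch(["-Xarch_sm_51", "--offload-arch=sm_52", "-S"],
                      {"nvptx64", "sm_52"}))
```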

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/125421

[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits

Artem-B wrote:

> I don't think there's any use of --offload-arch outside of the driver. 

I agree. Yet we do need to deal with such nonsensical input in a consistent 
manner. We do not control what the users give us, but we control how we respond.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Good 

> > Right now it works as I'd expect, it passes --offload-arch=sm_52 to the 
> > sm_52 compilation, but no other architecture.
> 
> What happens with that `--offload-arch=sm_52` when cc1 sees it? Ideally there 
> should be either an unused argument warning, or an error is the option is not 
> accepted by cc1. Currently cc1 errors out with `error: unknown argument: 
> '--offload-arch=sm_52`. If it continues to do so with your change, then we're 
> fine.

I don't think there's any use of `--offload-arch` outside of the driver. If 
we're passing an arch list it'd be better to query it from the toolchain 
somehow.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Artem Belevich via cfe-commits

Artem-B wrote:

> Right now it works as I'd expect, it passes --offload-arch=sm_52 to the sm_52 
> compilation, but no other architecture.

What happens with that `--offload-arch=sm_52` when cc1 sees it?
Ideally there should be either an unused argument warning, or an error if the 
option is not accepted by cc1. Currently cc1 errors out with `error: unknown 
argument: '--offload-arch=sm_52'`. If it continues to do so with your change, 
then we're fine.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Joseph Huber via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

jhuber6 wrote:

Right now `-Xarch_gfx1030 --offload-arch=gfx1030` works, but `-Xarch_gfx90a 
--offload-arch=gfx1030` doesn't. It's a little weird but I think it's the best 
interpretation of it we can do, otherwise it becomes difficult to figure out 
what to do without the bound architecture.
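
One way to picture that rule, as a hedged Python sketch: while the list of offload architectures is being collected there is no bound architecture to compare against, so a GPU-specific selector can only be checked against the value it wraps. `collect_offload_archs` and its parameters are hypothetical, and the real driver logic differs in detail.

```python
def collect_offload_archs(argv, triple_arch):
    """Collect --offload-arch= values for one toolchain (toy model).

    No bound architecture exists at this stage, so a wrapped value is kept
    only if its selector is the toolchain's triple arch or the value itself.
    """
    archs = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        selector = None
        if arg.startswith("-Xarch_") and i + 1 < len(argv):
            selector = arg[len("-Xarch_"):]
            i += 1
            arg = argv[i]  # the single argument wrapped by the selector
        if arg.startswith("--offload-arch="):
            value = arg[len("--offload-arch="):]
            if selector in (None, triple_arch, value):
                archs.append(value)
        i += 1
    return archs
```

Under this model `-Xarch_gfx1030 --offload-arch=gfx1030` contributes `gfx1030`, while `-Xarch_gfx90a --offload-arch=gfx1030` contributes nothing.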

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-04 Thread Yaxun Liu via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

yxsamliu wrote:

For HIP, maybe check whether the `<arch>` in `-Xarch_<arch>` is a triple. 
`--offload-arch=` is only allowed with `-Xarch_<arch>` if `<arch>` is a triple.
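
A sketch of the stricter check being suggested, in Python. The set of architecture names is a small illustrative sample rather than LLVM's full triple list, and `check_xarch_offload_arch` is a hypothetical helper.

```python
# Illustrative sample of triple architecture names, not an exhaustive list.
TRIPLE_ARCHS = {"amdgcn", "nvptx64", "spirv64", "x86_64"}

def check_xarch_offload_arch(selector, wrapped):
    """Raise if --offload-arch= is wrapped by a non-triple -Xarch_ selector."""
    if wrapped.startswith("--offload-arch=") and selector not in TRIPLE_ARCHS:
        raise ValueError(
            f"'-Xarch_{selector} {wrapped}': "
            f"'{selector}' is not a triple architecture")

# Accepted: the selector names a triple architecture.
check_xarch_offload_arch("amdgcn", "--offload-arch=gfx90a")
# Also accepted: other wrapped options are not restricted by this check.
check_xarch_offload_arch("gfx90a", "-O3")
```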

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/125421

>From 79000d0a1ecd1312fb9bc06af0369b66a133e5d4 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Sun, 2 Feb 2025 10:39:01 -0600
Subject: [PATCH] [Clang] Make `-Xarch_` handling generic for all toolchains

Summary:
Currently, `-Xarch_` is used to forward arguments specifically to certain
toolchains, and it is only supported by the Darwin toolchain. We want to
be able to use this generically, and for offloading too. This patch moves
the handling out of the Darwin toolchain and places it in the
`getArgsForToolChain` helper, which is run before the arguments get
passed to the tools.

The main benefit here is that we now have a more generic version of
`-Xopenmp-target=`, which should probably just be deprecated.
Additionally, it allows us to pass arguments specifically to different
architectures for offloading.

This patch is done in preparation for making the selection of offloading
toolchains more generic. This will be helpful while people are moving
toward compile jobs that include multiple toolchains (SPIR-V, AMDGCN,
NVPTX).
---
 clang/include/clang/Driver/Options.td  |  7 ++--
 clang/lib/Driver/Driver.cpp            |  5 +--
 clang/lib/Driver/ToolChain.cpp         | 45 --
 clang/lib/Driver/ToolChains/Darwin.cpp | 24 --
 clang/test/Driver/Xarch.c              |  8 +
 clang/test/Driver/offload-Xarch.c      | 39 ++
 6 files changed, 89 insertions(+), 39 deletions(-)
 create mode 100644 clang/test/Driver/offload-Xarch.c

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 0ab923fcdd5838..765b7f882e99a8 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group<W_Group>,
 def Xanalyzer : Separate<["-"], "Xanalyzer">,
   HelpText<"Pass <arg> to the static analyzer">, MetaVarName<"<arg>">,
   Group<StaticAnalyzer_Group>;
-def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>;
+def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>,
+  HelpText<"Pass <arg> to the compilation if the target matches <arch>">,
+  MetaVarName<"<arch> <arg>">;
 def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>,
   HelpText<"Pass <arg> to the CUDA/HIP host compilation">, MetaVarName<"<arg>">;
 def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>,
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,
+def offload_arch_EQ : Joined<["--"], "offload-arch=">,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Specify an offloading device architecture for CUDA, HIP, or 
OpenMP. (e.g. sm_35). "
"If 'native' is used the compiler will detect locally installed 
architectures. "
"For HIP offloading, the device architecture can be followed by 
target ID features "
"delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be 
specified more than once.">;
 def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">,
-  Flags<[NoXarchOption]>,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, 
gfx906) from the list of devices to compile for. "
"'all' resets the list to its default value.">;
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 912777a9808b4b..5a4737fb381e6a 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -3409,7 +3409,9 @@ class OffloadingActionBuilder final {
   // Collect all offload arch parameters, removing duplicates.
   std::set<StringRef> GpuArchs;
   bool Error = false;
-  for (Arg *A : Args) {
+  const ToolChain &TC = *ToolChains.front();
+  for (Arg *A : C.getArgsForToolChain(&TC, /*BoundArch=*/"",
+  AssociatedOffloadKind)) {
 if (!(A->getOption().matches(options::OPT_offload_arch_EQ) ||
   A->getOption().matches(options::OPT_no_offload_arch_EQ)))
   continue;
@@ -3420,7 +3422,6 @@ class OffloadingActionBuilder final {
   ArchStr == "all") {
 GpuArchs.clear();
   } else if (ArchStr == "native") {
-const ToolChain &TC = *ToolChains.front();
 auto GPUsOrErr = ToolChains.front()->getSystemGPUArchs(Args);
 if (!GPUsOrErr) {
   TC.getDriver().Diag(diag::err_drv_undetermined_gpu_arch)
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index ebc982096595e6..fc2a07ff26fc02 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs(
   A->getOption().matches(options::OPT_Xarch_host))
 ValuePos = 0;
 
-  unsigned Index = Args.g

[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

That's fun, well for now I've updated it to not hit that bug and also accept 
`-Xarch_sm_52 --offload-arch=sm_52` even though it's stupid.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Artem Belevich via cfe-commits

Artem-B wrote:

Also see: https://github.com/llvm/llvm-project/issues/110325

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

Here's something fun, `-O0` and `-O3` are accepted by `-Xarch` but `-O1` and 
`-O2` are rejected.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

jhuber6 wrote:

I definitely agree now that it works on all the targets. I was actually shocked 
when I saw that `-Xarch_` didn't even have help text. Would that be a follow-up?

Also, if we *really* wanted the above to work, we could do a special-case 
handling that only accepts the pair of `-Xarch_<arch> --offload-arch=<arch>`, 
but I think that would be awful.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Artem Belevich via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

Artem-B wrote:

Also, `-Xarch` has grown non-obvious usage nuances that should probably be 
mentioned in clang docs. We do mention them in openmp docs at the moment, but 
this is something that should be exposed somewhat more prominently, as this is 
pretty much the only mechanism we have for granular control of the offload 
build parameters. 

It's not uncommon to need different optimization options for different GPU 
variants. E.g. we want to compile with host-side debug info only, or specify 
different inlining/unrolling thresholds for different GPUs. All of those are 
effectively end-user facing options and should be documented.
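
For instance, per-GPU flags could be assembled like this. It is a Python sketch only: `xarch_flags` is a hypothetical helper and `app.cu` a placeholder file name. It relies on `-Xarch_<arch>` wrapping exactly one argument, which is why the tests in this PR repeat the prefix for every flag.

```python
def xarch_flags(per_arch):
    """Expand {selector: [flags]} into a flat -Xarch_<selector> <flag> list."""
    argv = []
    for selector, flags in per_arch.items():
        for flag in flags:
            # Each wrapped flag needs its own -Xarch_ prefix.
            argv += [f"-Xarch_{selector}", flag]
    return argv

cmd = ["clang", "-x", "cuda", "app.cu", "--offload-arch=sm_52,sm_60"]
cmd += xarch_flags({"sm_52": ["-O3"], "sm_60": ["-O0"], "host": ["-O3"]})
print(" ".join(cmd))
```

Whether a particular flag is accepted behind `-Xarch_` is a separate question that this helper does not try to answer.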

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

jhuber6 wrote:

I don't think there's actually a way to do that, unfortunately. When we query 
the list of active `--offload-arch` kinds we don't have a bound architecture 
yet. There's no way to know if the string *is* a CPU argument. So the only 
option would be to reject usage of this altogether, which is clearly not useful 
because we have `-Xopenmp-target=`, which is just a dumber version of this 
handling.

So, there's no way to detect the usage here and rejecting it flatly isn't 
desirable. The current behavior is that `-Xarch_gfx90a --offload-arch=gfx90a` 
will be unused.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Shilei Tian via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

shiltian wrote:

Then @yxsamliu made a very good point about avoiding use cases like that. That 
should be checked and errored out.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

jhuber6 wrote:

That's how it behaves in Darwin I believe, and I think it's fine to accept both 
`-Xarch_amdgcn` for all AMD targets and `-Xarch_gfx90a` for just that one.
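
That matching rule can be sketched in Python as follows. The function name and parameters are hypothetical; the sketch only captures the idea that a selector may name either the triple architecture or one specific bound architecture.

```python
def selector_matches(selector, triple_arch, bound_arch):
    """True if -Xarch_<selector> applies to this (triple arch, bound arch) job."""
    return selector == triple_arch or selector == bound_arch

jobs = [("amdgcn", "gfx90a"), ("amdgcn", "gfx1030"), ("nvptx64", "sm_52")]
for sel in ("amdgcn", "gfx90a"):
    hits = [bound for triple, bound in jobs
            if selector_matches(sel, triple, bound)]
    print(sel, "->", hits)
```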

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Shilei Tian via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

shiltian wrote:

If that's the case, what does the `arch` mean in `-Xarch_`? It looks like it 
means both, which I'm not sure is a good idea.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,31 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 
| FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | 
FileCheck -check-prefix=O3ONCE %s

jhuber6 wrote:

I added tests to make sure that `-Xarch_device` is equivalent to 
`-Xarch_nvptx64` and that `-Xarch_host` doesn't interfere with `-Xarch_nvptx64`.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

jhuber6 wrote:

It's an overloaded term that means different things in different places. The 
`arch` in the LLVM triple refers to `amdgcn` in `amdgcn-amd-amdhsa` since 
that's the actual CPU microarchitecture. The `arch` in `-march` refers to the 
specific machine name for the processor that's being targeted. `-Xarch_gfx908` 
is valid because the `-Xarch_` option in non-offloading use also binds to 
`-march=` arguments AFAICT.
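
As a small illustration of the first meaning, the sketch below pulls the `arch` 
component out of a triple string. `tripleArch` is a hypothetical helper written 
for this example; it is a plain string split, not how `llvm::Triple` parses 
triples.

```cpp
#include <string_view>

// Hypothetical helper: returns the text before the first '-' in a triple
// string, or the whole string when it contains no '-'.
std::string_view tripleArch(std::string_view Triple) {
  return Triple.substr(0, Triple.find('-'));
}
```

For `amdgcn-amd-amdhsa` this yields `amdgcn`, which is the kind of name 
`-Xarch_amdgcn` refers to, while a name like `gfx908` is the kind of value 
passed via `-march=`.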

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Artem Belevich via cfe-commits


@@ -0,0 +1,31 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 
| FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | 
FileCheck -check-prefix=O3ONCE %s

Artem-B wrote:

> We have existing tests for those

We have a few scattered across hip-options.hip and openmp-offload-gpu.c and it 
would make sense to consolidate them here, now that we have the tests dedicated 
specifically to Xarch processing. 

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,31 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 
| FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | 
FileCheck -check-prefix=O3ONCE %s

jhuber6 wrote:

Added some lines to make sure it works.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Shilei Tian via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

shiltian wrote:

I thought `arch` in LLVM means things like `amdgcn`. In the ROCm runtime, 
people tend to call `gfx1030` the "arch". That being said, I'm not sure if 
`-Xarch_gfx906` is a proper use in the first place.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 updated 
https://github.com/llvm/llvm-project/pull/125421

>From 03852104d5945e0e92c97b68e993bd699b275ab5 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Sun, 2 Feb 2025 10:39:01 -0600
Subject: [PATCH] [Clang] Make `-Xarch_` handling generic for all toolchains

Summary:
Currently, `-Xarch_` is used to forward argument specially to certain
toolchains. Currently, this is only supported by the Darwin toolchain.
We want to be able to use this generically, and for offloading too. This
patch moves the handling out of the Darwin Toolchain and places it in
the `getArgsForToolchain` helper which is run before the arguments get
passed to the tools.

The main benefit here is that we now have a more generic version of
`-Xopenmp-target=`, which should probably just be deprecated.
Additionally, it allows us to specially pass arguments to different
architectures for offloading.

This patch is done in preparation for making selecting offloading
toolchains more generic, this will be helpful while people are moving
toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN,
NVPTX).
---
 clang/include/clang/Driver/Options.td  |  7 +++--
 clang/lib/Driver/ToolChain.cpp | 42 --
 clang/lib/Driver/ToolChains/Darwin.cpp | 24 ---
 clang/test/Driver/Xarch.c  |  8 +
 clang/test/Driver/offload-Xarch.c  | 32 
 5 files changed, 76 insertions(+), 37 deletions(-)
 create mode 100644 clang/test/Driver/offload-Xarch.c

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 0ab923fcdd5838..765b7f882e99a8 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group,
 def Xanalyzer : Separate<["-"], "Xanalyzer">,
   HelpText<"Pass  to the static analyzer">, MetaVarName<"">,
   Group;
-def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>;
+def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>,
+  HelpText<"Pass  to the compiliation if the target matches ">,
+  MetaVarName<" ">;
 def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>,
   HelpText<"Pass  to the CUDA/HIP host compilation">, 
MetaVarName<"">;
 def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>,
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,
+def offload_arch_EQ : Joined<["--"], "offload-arch=">,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Specify an offloading device architecture for CUDA, HIP, or 
OpenMP. (e.g. sm_35). "
"If 'native' is used the compiler will detect locally installed 
architectures. "
"For HIP offloading, the device architecture can be followed by 
target ID features "
"delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be 
specified more than once.">;
 def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">,
-  Flags<[NoXarchOption]>,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, 
gfx906) from the list of devices to compile for. "
"'all' resets the list to its default value.">;
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index ebc982096595e6..c25d1b6be14b50 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs(
   A->getOption().matches(options::OPT_Xarch_host))
 ValuePos = 0;
 
-  unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(ValuePos));
+  const InputArgList &BaseArgs = Args.getBaseArgs();
+  unsigned Index = BaseArgs.MakeIndex(A->getValue(ValuePos));
   unsigned Prev = Index;
   std::unique_ptr XarchArg(Opts.ParseOneArg(Args, Index));
 
@@ -1672,8 +1673,31 @@ void ToolChain::TranslateXarchArgs(
 Diags.Report(DiagID) << A->getAsString(Args);
 return;
   }
+
   XarchArg->setBaseArg(A);
   A = XarchArg.release();
+
+  // Linker input arguments require custom handling. The problem is that we
+  // have already constructed the phase actions, so we can not treat them as
+  // "input arguments".
+  if (A->getOption().hasFlag(options::LinkerInput)) {
+// Convert the argument into individual Zlinker_input_args. Need to do this
+// manually to avoid memory leaks with the allocated arguments.
+for (const char *Value : A->getValues()) {
+  auto Opt = Opts.getOption(options::OPT_Zlinker_input);
+  unsigned Index = BaseArgs.MakeIndex(Opt.getName(), Value);
+  auto NewArg =
+  new Arg(Opt, BaseArgs.MakeArgString(Opt.getPrefix() + Opt.getName()),
+  Index, BaseArgs.getArgString(Index + 1), A);
+
+  DAL->append(NewArg);
+

[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,31 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 
| FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | 
FileCheck -check-prefix=O3ONCE %s

jhuber6 wrote:

We have tests for that in `openmp-offload-gpu.c` and `hip-options.hip` so I 
figured it wasn't necessary, but can add them if it will get the PR moving.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Artem Belevich via cfe-commits


@@ -0,0 +1,31 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 
| FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | 
FileCheck -check-prefix=O3ONCE %s

Artem-B wrote:

Checks for host/device would still be useful, regardless of my confusion about 
argument processing above. 

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits


@@ -0,0 +1,31 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 
| FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | 
FileCheck -check-prefix=O3ONCE %s

jhuber6 wrote:

We have existing tests for those, should I add more?

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Artem Belevich via cfe-commits


@@ -1697,19 +1721,17 @@ llvm::opt::DerivedArgList 
*ToolChain::TranslateXarchArgs(
 } else if (A->getOption().matches(options::OPT_Xarch_host)) {
   NeedTrans = !IsDevice;
   Skip = IsDevice;
-} else if (A->getOption().matches(options::OPT_Xarch__) && IsDevice) {
-  // Do not translate -Xarch_ options for non CUDA/HIP toolchain since
-  // they may need special translation.
-  // Skip this argument unless the architecture matches BoundArch
-  if (BoundArch.empty() || A->getValue(0) != BoundArch)
-Skip = true;
-  else
-NeedTrans = true;
+} else if (A->getOption().matches(options::OPT_Xarch__)) {

Artem-B wrote:

Ugh... Never mind, my brain was on vacation, apparently. Not sure if I need 
more coffee, or less coffee this morning. 

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits


@@ -1697,19 +1721,17 @@ llvm::opt::DerivedArgList 
*ToolChain::TranslateXarchArgs(
 } else if (A->getOption().matches(options::OPT_Xarch_host)) {
   NeedTrans = !IsDevice;
   Skip = IsDevice;
-} else if (A->getOption().matches(options::OPT_Xarch__) && IsDevice) {
-  // Do not translate -Xarch_ options for non CUDA/HIP toolchain since
-  // they may need special translation.
-  // Skip this argument unless the architecture matches BoundArch
-  if (BoundArch.empty() || A->getValue(0) != BoundArch)
-Skip = true;
-  else
-NeedTrans = true;
+} else if (A->getOption().matches(options::OPT_Xarch__)) {

jhuber6 wrote:

These are different options right? How could someone pass `-Xarch_device 
-Xarch_x86_64` or similar? It'd just be parsed as `-Xarch_device -option1 
-Xarch_x86_64 -option2` so they'd just be handled separately.
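
The parsing behavior described above can be sketched as follows, under the 
simplifying assumption that every `-Xarch_*` flag consumes exactly the one 
argument that follows it. `pairXarchArgs` is a hypothetical model, not the 
driver's option parser.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: pairs every `-Xarch_*` flag with the single argument
// that follows it, so adjacent `-Xarch_*` flags never nest.
std::vector<std::pair<std::string, std::string>>
pairXarchArgs(const std::vector<std::string> &Argv) {
  const std::string Prefix = "-Xarch_";
  std::vector<std::pair<std::string, std::string>> Pairs;
  for (std::size_t I = 0; I + 1 < Argv.size(); ++I) {
    if (Argv[I].compare(0, Prefix.size(), Prefix) != 0)
      continue; // Not an -Xarch_ flag.
    // The selector is the text after the prefix; the option is the next word.
    Pairs.emplace_back(Argv[I].substr(Prefix.size()), Argv[I + 1]);
    ++I; // Skip the option that was just consumed.
  }
  return Pairs;
}
```

Feeding it `-Xarch_device -option1 -Xarch_x86_64 -option2` produces two 
independent pairs, `(device, -option1)` and `(x86_64, -option2)`.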

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Artem Belevich via cfe-commits

https://github.com/Artem-B edited 
https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Artem Belevich via cfe-commits


@@ -1697,19 +1721,17 @@ llvm::opt::DerivedArgList 
*ToolChain::TranslateXarchArgs(
 } else if (A->getOption().matches(options::OPT_Xarch_host)) {
   NeedTrans = !IsDevice;
   Skip = IsDevice;
-} else if (A->getOption().matches(options::OPT_Xarch__) && IsDevice) {
-  // Do not translate -Xarch_ options for non CUDA/HIP toolchain since
-  // they may need special translation.
-  // Skip this argument unless the architecture matches BoundArch
-  if (BoundArch.empty() || A->getValue(0) != BoundArch)
-Skip = true;
-  else
-NeedTrans = true;
+} else if (A->getOption().matches(options::OPT_Xarch__)) {

Artem-B wrote:

Use of `-Xarch_host/device` should not block `-Xarch_`.


https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Artem Belevich via cfe-commits


@@ -0,0 +1,31 @@
+// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 
| FileCheck -check-prefix=O3ONCE %s
+// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | 
FileCheck -check-prefix=O3ONCE %s

Artem-B wrote:

More test cases are needed to handle combinations of `-Xarch_host/device` and 
`-Xarch_`. 



https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Artem Belevich via cfe-commits

https://github.com/Artem-B commented:

LGTM overall, with a few nits. 

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 edited 
https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits

jhuber6 wrote:

> > Summary: Currently, `-Xarch_` is used to forward argument specially to 
> > certain toolchains. Currently, this is only supported by the Darwin 
> > toolchain. We want to be able to use this generically, and for offloading 
> > too. This patch moves the handling out of the Darwin Toolchain and places 
> > it in the `getArgsForToolchain` helper which is run before the arguments 
> > get passed to the tools.
> 
> I think this could use some editing. `-Xarch` is intended to set flags per 
> _target_. Same toolchain may handle more than one target. Perhaps rephrase 
> along the lines of "forward argument to the toolchain used for the given 
> target architecture"?

I don't think a single `ToolChain` can have multiple targets in the driver, but 
you can make separate `ToolChain` objects with different triples; I think 
that's where the confusion lies. Right now this is just for the triple.

> > this is only supported by the Darwin toolchain.
> 
> This is the confusing part. I'm pretty sure `-Xarch_host` `-Xarch_device` and 
> variety of `-Xarch_{gfx,sm}..` variants are also supported by HIP/Cuda 
> toolchains right now.
> 
> IMO, a better way to describe the situation is that MachO is the last 
> remaining special case implementation of Xarch and the patch folds it into a 
> common `Xarch` handling that's already used by offloading toolchains.

Yeah, I wasn't counting the host / device parts since they're separate flags. 
We do support the architecture part already but it's a special case that 
doesn't need to be there. Realistically this is just removing the MachO 
handling to simplify it and make it work everywhere.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Artem Belevich via cfe-commits

Artem-B wrote:

> Summary: Currently, `-Xarch_` is used to forward argument specially to 
> certain toolchains. Currently, this is only supported by the Darwin 
> toolchain. We want to be able to use this generically, and for offloading 
> too. This patch moves the handling out of the Darwin Toolchain and places it 
> in the `getArgsForToolchain` helper which is run before the arguments get 
> passed to the tools.

I think this could use some editing. `-Xarch` is intended to set flags per 
*target*. Same toolchain may handle more than one target. Perhaps rephrase 
along the lines of "forward argument to the toolchain used for the given target 
architecture"?

> this is only supported by the Darwin toolchain.

This is the confusing part. I'm pretty sure `-Xarch_host` `-Xarch_device` and 
variety of `-Xarch_{gfx,sm}..` variants are also supported by HIP/Cuda 
toolchains right now. 

IMO, a better way to describe the situation is that MachO is the last remaining 
special case implementation of Xarch and the patch folds it into a common 
`Xarch` handling that's already used by offloading toolchains.


https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Joseph Huber via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

jhuber6 wrote:

Using `-Xarch_amdgcn --offload-arch=gfx1030` is very meaningful for OpenMP, 
where the user can enable multiple toolchains at the same time. I agree that 
the `-Xarch_gfx906 --offload-arch=gfx1100` use case is not meaningful, since 
we have no bound architecture while querying the device list, only the triple. 
I don't know how we'd diagnose that though.

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-03 Thread Yaxun Liu via cfe-commits


@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,

yxsamliu wrote:

For HIP, `--offload-arch=` is used by the driver to determine the GPU arch and 
`-Xarch_gfx906` is allowed to pass GPU-arch-specific options. Option 
sequences like `-Xarch_gfx906 --offload-arch=gfx1100` are not meaningful 
combinations, therefore `Flags<[NoXarchOption]>` is used here to diagnose such 
invalid combinations. Removing this flag will lose diagnostics for such invalid 
usage.

Can we diagnose such invalid usage manually?

https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-02 Thread Matt Arsenault via cfe-commits

https://github.com/arsenm edited 
https://github.com/llvm/llvm-project/pull/125421


[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-02 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)


Changes

Summary:
Currently, `-Xarch_` is used to forward argument specially to certain
toolchains. Currently, this is only supported by the Darwin toolchain.
We want to be able to use this generically, and for offloading too. This
patch moves the handling out of the Darwin Toolchain and places it in
the `getArgsForToolchain` helper which is run before the arguments get
passed to the tools.

The main benefit here is that we now have a more generic version of
`-Xopenmp-target=`, which should probably just be deprecated.
Additionally, it allows us to specially pass arguments to different
architectures for offloading.

This patch is done in preparation for making selecting offloading
toolchains more generic, this will be helpful while people are moving
toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN,
NVPTX).


---
Full diff: https://github.com/llvm/llvm-project/pull/125421.diff


5 Files Affected:

- (modified) clang/include/clang/Driver/Options.td (+4-3) 
- (modified) clang/lib/Driver/ToolChain.cpp (+32-10) 
- (modified) clang/lib/Driver/ToolChains/Darwin.cpp (-24) 
- (modified) clang/test/Driver/Xarch.c (+8) 
- (added) clang/test/Driver/offload-Xarch.c (+31) 


``diff
diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index d8123cc39fdc951..6dd9f2e8a9b1fc4 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group,
 def Xanalyzer : Separate<["-"], "Xanalyzer">,
   HelpText<"Pass  to the static analyzer">, MetaVarName<"">,
   Group;
-def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>;
+def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>,
+  HelpText<"Pass  to the compiliation if the target matches ">,
+  MetaVarName<" ">;
 def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>,
   HelpText<"Pass  to the CUDA/HIP host compilation">, 
MetaVarName<"">;
 def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>,
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,
+def offload_arch_EQ : Joined<["--"], "offload-arch=">,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Specify an offloading device architecture for CUDA, HIP, or 
OpenMP. (e.g. sm_35). "
"If 'native' is used the compiler will detect locally installed 
architectures. "
"For HIP offloading, the device architecture can be followed by 
target ID features "
"delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be 
specified more than once.">;
 def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">,
-  Flags<[NoXarchOption]>,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, 
gfx906) from the list of devices to compile for. "
"'all' resets the list to its default value.">;
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index ebc982096595e61..c25d1b6be14b50d 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs(
   A->getOption().matches(options::OPT_Xarch_host))
 ValuePos = 0;
 
-  unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(ValuePos));
+  const InputArgList &BaseArgs = Args.getBaseArgs();
+  unsigned Index = BaseArgs.MakeIndex(A->getValue(ValuePos));
   unsigned Prev = Index;
   std::unique_ptr XarchArg(Opts.ParseOneArg(Args, Index));
 
@@ -1672,8 +1673,31 @@ void ToolChain::TranslateXarchArgs(
 Diags.Report(DiagID) << A->getAsString(Args);
 return;
   }
+
   XarchArg->setBaseArg(A);
   A = XarchArg.release();
+
+  // Linker input arguments require custom handling. The problem is that we
+  // have already constructed the phase actions, so we can not treat them as
+  // "input arguments".
+  if (A->getOption().hasFlag(options::LinkerInput)) {
+// Convert the argument into individual Zlinker_input_args. Need to do this
+// manually to avoid memory leaks with the allocated arguments.
+for (const char *Value : A->getValues()) {
+  auto Opt = Opts.getOption(options::OPT_Zlinker_input);
+  unsigned Index = BaseArgs.MakeIndex(Opt.getName(), Value);
+  auto NewArg =
+  new Arg(Opt, BaseArgs.MakeArgString(Opt.getPrefix() + Opt.getName()),
+  Index, BaseArgs.getArgString(Index + 1), A);
+
+  DAL->append(NewArg);
+  if (!AllocatedArgs)
+DAL->AddSynthesizedArg(NewArg);
+  else
+AllocatedArgs->push_back(NewArg);
+}
+  }
+
   if (!AllocatedArgs)
 DAL->AddSynthesizedArg(A);
   else
@@ -1697,19 +1721,17 @@ llvm::opt::

[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)

2025-02-02 Thread Joseph Huber via cfe-commits

https://github.com/jhuber6 created 
https://github.com/llvm/llvm-project/pull/125421

Summary:
Currently, `-Xarch_` is used to forward argument specially to certain
toolchains. Currently, this is only supported by the Darwin toolchain.
We want to be able to use this generically, and for offloading too. This
patch moves the handling out of the Darwin Toolchain and places it in
the `getArgsForToolchain` helper which is run before the arguments get
passed to the tools.

The main benefit here is that we now have a more generic version of
`-Xopenmp-target=`, which should probably just be deprecated.
Additionally, it allows us to specially pass arguments to different
architectures for offloading.

This patch is done in preparation for making selecting offloading
toolchains more generic, this will be helpful while people are moving
toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN,
NVPTX).


>From d8f565a162841e20b26e141022e6884918ed5bfc Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Sun, 2 Feb 2025 10:39:01 -0600
Subject: [PATCH] [Clang] Make `-Xarch_` handling generic for all toolchains

Summary:
Currently, `-Xarch_` is used to forward argument specially to certain
toolchains. Currently, this is only supported by the Darwin toolchain.
We want to be able to use this generically, and for offloading too. This
patch moves the handling out of the Darwin Toolchain and places it in
the `getArgsForToolchain` helper which is run before the arguments get
passed to the tools.

The main benefit here is that we now have a more generic version of
`-Xopenmp-target=`, which should probably just be deprecated.
Additionally, it allows us to specially pass arguments to different
architectures for offloading.

This patch is done in preparation for making selecting offloading
toolchains more generic, this will be helpful while people are moving
toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN,
NVPTX).
---
 clang/include/clang/Driver/Options.td  |  7 +++--
 clang/lib/Driver/ToolChain.cpp | 42 --
 clang/lib/Driver/ToolChains/Darwin.cpp | 24 ---
 clang/test/Driver/Xarch.c  |  8 +
 clang/test/Driver/offload-Xarch.c  | 31 +++
 5 files changed, 75 insertions(+), 37 deletions(-)
 create mode 100644 clang/test/Driver/offload-Xarch.c

diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index d8123cc39fdc95..6dd9f2e8a9b1fc 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group,
 def Xanalyzer : Separate<["-"], "Xanalyzer">,
   HelpText<"Pass  to the static analyzer">, MetaVarName<"">,
   Group;
-def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>;
+def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>,
+  HelpText<"Pass  to the compiliation if the target matches ">,
+  MetaVarName<" ">;
 def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>,
   HelpText<"Pass  to the CUDA/HIP host compilation">, 
MetaVarName<"">;
 def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>,
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], 
"fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,
+def offload_arch_EQ : Joined<["--"], "offload-arch=">,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Specify an offloading device architecture for CUDA, HIP, or 
OpenMP. (e.g. sm_35). "
"If 'native' is used the compiler will detect locally installed 
architectures. "
"For HIP offloading, the device architecture can be followed by 
target ID features "
"delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be 
specified more than once.">;
 def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">,
-  Flags<[NoXarchOption]>,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, 
gfx906) from the list of devices to compile for. "
"'all' resets the list to its default value.">;
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index ebc982096595e6..c25d1b6be14b50 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs(
   A->getOption().matches(options::OPT_Xarch_host))
 ValuePos = 0;
 
-  unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(ValuePos));
+  const InputArgList &BaseArgs = Args.getBaseArgs();
+  unsigned Index = BaseArgs.MakeIndex(A->getValue(ValuePos));
   unsigned Prev = Index;
   std::unique_ptr XarchArg(Opts.ParseOneArg(Args, Index));
 
@@ -1672,8 +1673,31 @@ void ToolChain::TranslateXarchArgs(
 Diags.Report(DiagID) << A->getAsString(Args);
 r