[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
DavidSpickett wrote: I've pushed a fix for one of the tests on 32 bit Arm, as it failed on our bot: https://lab.llvm.org/buildbot/#/builders/154/builds/11413/steps/5/logs/FAIL__Clang__offload-Xarch_c Maybe your intent was specifically to have a line that uses the host architecture, if that was the case, we can figure out a `requires:` to somehow make that work. I have access to a machine to reproduce it. The other way to fix it was to use `-Xarch_nvptx32` instead but of course that failed on a 64-bit host. Let me know if you want to follow up on this. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
jhuber6 wrote: > I've pushed a fix for one of the tests on 32 bit Arm, as it failed on our > bot: > https://lab.llvm.org/buildbot/#/builders/154/builds/11413/steps/5/logs/FAIL__Clang__offload-Xarch_c > > Maybe your intent was specifically to have a line that uses the host > architecture, if that was the case, we can figure out a `requires:` to > somehow make that work. I have access to a machine to reproduce it. > > The other way to fix it was to use `-Xarch_nvptx32` instead but of course > that failed on a 64-bit host. > > Let me know if you want to follow up on this. Thanks! I totally forgot that `nvptx` existed, it's totally deprecated by NVIDIA so we should probably just reject it outright, but I'll leave that up to @Artem-B. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/jhuber6 closed https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,34 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" Artem-B wrote: Nit: Using `-D` as an argument to pass may let you unambiguously mark each subcompilation, and avoid triggering that -O1/-O2 problem. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group, def Xanalyzer : Separate<["-"], "Xanalyzer">, HelpText<"Pass to the static analyzer">, MetaVarName<"">, Group; -def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>; +def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>, + HelpText<"Pass to the compiliation if the target matches ">, Artem-B wrote: We do need a better documentation. It's not obvious that could be a GPU name or an arch name from the target triple. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,34 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" Artem-B wrote: Nit: I'd also add the check that -O3 is present in a compilation with fcuda-is-device. Otherwise the tests would succeed somewhere else. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" jhuber6 wrote: My complaints with `--offload-arch=` is that it's pulling double duty on `-mcpu` and `--target` options. It's wholly insufficient for OpenMP when targeting anything that isn't a GPU because of that. I'd prefer to have the tools to individually enable and control the toolchains. My perspective is that offloading is a product of all `--target=` values and `-mcpu=` values. That's what it boils down to internally, and I'd like to be able to compile things that way if necessary, i.e. `clang --offload-target=spirv64 foo.hip` makes more sense to me. Either way, does this patch LG now that part has been delayed until later? https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" Artem-B wrote: You're probably the one holding most of the puzzle pieces here. For CUDA/HIP things (mostly) work well enough. Even though there are somewhat useful attempts to target spir-V, they are half-baked at best. For OpenMP the target set is widely open, so I can see how it would need a more flexible way to customize the compilation pipeline construction. In abstract, part of the problem boils down to naming. You need to be able to tell the driver, what we want to target (currently we use --offload-arch= and -offload options for that) and how to select subsets of the constructed pipeline (-X). `--offload-arch=amdgcnspirv,gfx1030` looks like a reasonable approach to me, and the same naming scheme could be used as a selector part of `-Xarch`. What we have now may not be perfect, but I think it's reasonably functional. Is there a particular reason `--offload-arch=amdgcnspirv,gfx1030` is not sufficient to drive pipeline construction? I'm not sure why we want -Xarch to do that job. > Compared to those I really don't think -Xarch_ is a big deal, since it does > what you want, passes those arguments only to the Triple toolchain. My issue is that the patch was making -Xarch do the job it's not intended for *and* hiding real errors in the process. The "passing the arguments to the triple{-selected} toolchain" (as an wider-scope selector, compared to per-GPU or host/device ones we have now) part is fine. Let's handle pipeline creation challenge separately. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" jhuber6 wrote: Yes, the question is how do we separate architectures when we don't assume a single offloading toolchain. Right now we just wing it and guess based off of hard coded string values. I removed the contentious part, so can we at least land the easy fix. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" Artem-B wrote: > what is your recommended solution for mixing targets Can you be more specific? We seem to view things from very different angles, so I want to make sure we're talking about the same things here. > we can already target HIP and CUDA from SPIRV-64 That's one example where we're looking at it differently. The way I see it it's HIP/CUDA (as in front-end) can use pirv as the target toolchain. Is your question -- how do we tell the driver to construct yet another cc1 subcompilation variant? I do not have a ready answer for that, but if I had one, `-Xarch` probably would not be it. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" jhuber6 wrote: That being said, what is your recommended solution for mixing targets? This is not unique to OpenMP, we can already target HIP and CUDA from SPIRV-64 which is a different toolchain that can't accept `--offload-arch` arguments. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/125421 >From 3ed4040c18b4a980218a86a37ed8aabd0e395b20 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sun, 2 Feb 2025 10:39:01 -0600 Subject: [PATCH] [Clang] Make `-Xarch_` handling generic for all toolchains Summary: Currently, `-Xarch_` is used to forward argument specially to certain toolchains. Currently, this is only supported by the Darwin toolchain. We want to be able to use this generically, and for offloading too. This patch moves the handling out of the Darwin Toolchain and places it in the `getArgsForToolchain` helper which is run before the arguments get passed to the tools. The main benefit here is that we now have a more generic version of `-Xopenmp-target=`, which should probably just be deprecated. Additionally, it allows us to specially pass arguments to different architectures for offloading. This patch is done in preparation for making selecting offloading toolchains more generic, this will be helpful while people are moving toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN, NVPTX). --- clang/include/clang/Driver/Options.td | 8 +++-- clang/lib/Driver/Driver.cpp| 5 +-- clang/lib/Driver/ToolChain.cpp | 42 -- clang/lib/Driver/ToolChains/Darwin.cpp | 24 --- clang/test/Driver/Xarch.c | 8 + clang/test/Driver/offload-Xarch.c | 34 + 6 files changed, 82 insertions(+), 39 deletions(-) create mode 100644 clang/test/Driver/offload-Xarch.c diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 0ab923fcdd5838..55d10ed8e974af 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group, def Xanalyzer : Separate<["-"], "Xanalyzer">, HelpText<"Pass to the static analyzer">, MetaVarName<"">, Group; -def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>; +def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>, + HelpText<"Pass to the compiliation if the target matches ">, + MetaVarName<" ">; def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>, HelpText<"Pass to the CUDA/HIP host compilation">, MetaVarName<"">; def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>, @@ -1115,8 +1117,8 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, - Visibility<[ClangOption, FlangOption]>, +def offload_arch_EQ : Joined<["--"], "offload-arch=">, + Visibility<[ClangOption, FlangOption]>, Flags<[NoXarchOption]>, HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). " "If 'native' is used the compiler will detect locally installed architectures. " "For HIP offloading, the device architecture can be followed by target ID features " diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index 912777a9808b4b..5a4737fb381e6a 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -3409,7 +3409,9 @@ class OffloadingActionBuilder final { // Collect all offload arch parameters, removing duplicates. std::set GpuArchs; bool Error = false; - for (Arg *A : Args) { + const ToolChain &TC = *ToolChains.front(); + for (Arg *A : C.getArgsForToolChain(&TC, /*BoundArch=*/"", + AssociatedOffloadKind)) { if (!(A->getOption().matches(options::OPT_offload_arch_EQ) || A->getOption().matches(options::OPT_no_offload_arch_EQ))) continue; @@ -3420,7 +3422,6 @@ class OffloadingActionBuilder final { ArchStr == "all") { GpuArchs.clear(); } else if (ArchStr == "native") { -const ToolChain &TC = *ToolChains.front(); auto GPUsOrErr = ToolChains.front()->getSystemGPUArchs(Args); if (!GPUsOrErr) { TC.getDriver().Diag(diag::err_drv_undetermined_gpu_arch) diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index ebc982096595e6..c25d1b6be14b50 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs( A->getOption().matches(options::OPT_Xarch_host)) ValuePos = 0; - unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(ValuePos)); + const InputArgList &BaseArgs = Args.getBaseArgs(); + unsigned Index = BaseArgs.MakeIndex(A->getValue(ValuePos)); unsigned Prev = Index; std::unique_ptr XarchArg(Opts.ParseOneArg(Args, Index)); @@ -1672,8 +1673,31 @@ void ToolChain::TranslateXarchArgs( Diags.Report(DiagID) << A->getAsStrin
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" jhuber6 wrote: CUDA / HIP doesn't reject this currently, but in an effort to move this debate somewhere else, I've omitted this change from the PR. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" Artem-B wrote: It all seems to boil down to "what does Xarch mean". The conventional `-X` model is "pass the arg to in the compilation pipeline". You seem to propose that for openMP it will be "use it to construct compiler pipeline". Not sure I like this dichotomy where we commingling pipeline construction with tweaking of already constructed pipeline. Perhaps a cleaner way would be to figure out a better way tof OpenMP to set up the build pipeline structure (that's what --offload-arch is for), and keep `Xarch` exclusively as a way to tweak the options for the already constructed pipeline. Presumably OpenMP already has the ways to specify the desired offloading targets. `Xarch` does not look like the right tool for that job. It can do it, but it opens a can of corner cases. If openMP does not have a viable better way to configure the compilation pipeline, we could make `-Xarch=backend. --offload-arch=target` a documented special case for OpenMP only. I would strongly prefer to keep CUDA/HIP to continue diagnosing `--offload-arch` being passed to `cc1`. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" jhuber6 wrote: Currently we have `-Xopenmp-target=` which does the same thing. OpenMP can do something like the following: ```console clang foo.c -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx1030,gfx90a -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_52,sm_90 ``` These are passed to the *driver* stage because we call `getArgsForToolchain` while processing the offloading architecture inputs. This call has no bound architecture which is the problem with parsing `-Xarch_sm_52 --offload-arch=sm_52`. There is no `-cc1` call here, it passes `-offload-arch` to the offloading toolchain, which then results in us building `N` compilation jobs for each of those architectures. `--offload-arch` is not accepted by `-cc1` at all, so it's not relevant here. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" Artem-B wrote: IMO this special case makes no sense. If I were to look at this command line in real life, my assumption would be that a user made a mistake and intended to write `--offload-arch=sm_52 -Xarch_sm_52 -some-option`. I.e. they targeted sm_52, and then wanted to tweak that compilation. In this case we effectively ignoring `-Xarch_sm_52` which was very likely *not* the user's intent. `cc1` reporting an error when it got `--offload-arch` would be a better approach IMO, giving the feedback that the user is doing something wrong. On the other hand you've mentioned that: > using -Xarch_amdgcn --offload-arch=gfx1030 is very meaningful for OpenMP > where the user can enable multiple toolchains at the same time. So, it looks like handling of these options is also language-dependent. For CUDA, blindly passing Xarch* options down to the compilation selected by Xarch kind (back-end, or specific GPU) and letting cc1 deal with those options would probably be acceptable. The same approach may also work for OpenMP, where cc1 can do something sensible with --offload-arch passed to it and does not have to error out. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" Artem-B wrote: I'm not sure I understand what exactly it's intended to do and how is that supposed to work. I'm missing something here. Can you elaborate on the intended use case here and walk me through it? So, the top-level driver sees `-Xarch_amdgcn`. I would assume that we want it to pass the following `--offloard-arch=gfx90a` to all cc1 subcompilations using amdgcn. What is `--offload-arch=gfx90a` expected to do in this case, once it's passed to cc1? I can see how it might be used if we have a single cc1 subcompilation to tell that cc1 invocation to target gfx90a, but that looks like an odd fix for an odd problem. IMO that's something that should be done by the top-level driver. If we have multiple cc1 subcompilation using amdgcn, then we'll end up with multiple cc1 invocations with potentially identical target... Not sure if that's going to cause troubles further down the compilation pipeline. E.g. can we incorporate N binaries for the same target? How will runtime figure out which one to load? https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" + +// RUN: %clang -x cuda %s -nogpulib -nogpuinc \ +// RUN: -Xarch_sm_51 --offload-arch=sm_52 -S -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC-WARN %s +// SPECIFIC-WARN: warning: argument unused during compilation: '-Xarch_sm_51 --offload-arch=sm_52' jhuber6 wrote: In this case, `--offload-arch=sm_52` would enable `sm_52` as a bound job, then it would make `-Xarch-sm_52` enable `--offload-arch=sm_80` but by the time it sees that the list was already generated so it wouldn't do anything. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/Artem-B commented: > > > --offload-arch= isn't an accepted -cc1 argument so it won't be forwarded > > > at all. > > > > > > Silently? That would be wrong, imo. It should be diagnosed somewhere. > > It's already an error if you pass it directly via -`Xclang` because it's not > an accepted `-cc1` argument. A lot of driver arguments are both driver and > `cc1` arguments so those get marshalled or forwarded. Yes, I'm aware of that. My question is -- with this patch, and my example command above which wants to forward --offload-arch, does `clang -cc1` report an error, or stays silent because the argument "won't be forwarded at all." ? Perhaps we should take a step back, document desired behavior/interactions between -Xarch, --offload-arch, and OpenMP/CUDA/HIP offloading modes, so we have somewhat consistent (or at least documented) behavior. Right now we seem to chase corner cases. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" jhuber6 wrote: But we *want* `-Xarch_amdgcn --offloard-arch=gfx90a` to work for cases like OpenMP which can combine multiple targets into a single compile and needs the lists to be separate. I guess we could go with @yxsamliu's suggestion and just error if the value isn't a valid triple architecture. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,44 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x cuda %s -Xarch_device -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -nogpulib -nogpuinc \ +// RUN: -Xarch_amdgcn -march=gfx90a -Xarch_amdgcn -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -nogpulib -nogpuinc \ +// RUN: -Xarch_nvptx64 -march=sm_52 -Xarch_nvptx64 -O3 -S -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=O3ONCE %s +// O3ONCE: "-O3" +// O3ONCE-NOT: "-O3" + +// RUN: %clang -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -nogpulib \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_nvptx64 --offload-arch=sm_52,sm_60 -nogpuinc \ +// RUN: -Xarch_amdgcn --offload-arch=gfx90a,gfx1030 -ccc-print-bindings -### %s 2>&1 \ +// RUN: | FileCheck -check-prefix=OPENMP %s +// +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[INPUT:.+]]"], output: "[[HOST_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX1030_BC:.+]]" +// OPENMP: # "amdgcn-amd-amdhsa" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[GFX90A_BC:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM52_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM52_PTX]]"], output: "[[SM52_CUBIN:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "clang", inputs: ["[[INPUT]]", "[[HOST_BC]]"], output: "[[SM60_PTX:.+]]" +// OPENMP: # "nvptx64-nvidia-cuda" - "NVPTX::Assembler", inputs: ["[[SM60_PTX]]"], output: "[[SM60_CUBIN:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["[[GFX1030_BC]]", "[[GFX90A_BC]]", "[[SM52_CUBIN]]", "[[SM60_CUBIN]]"], output: "[[BINARY:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "clang", inputs: ["[[HOST_BC]]", "[[BINARY]]"], output: "[[HOST_OBJ:.+]]" +// OPENMP: # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["[[HOST_OBJ]]"], output: "a.out" + +// RUN: %clang -x cuda %s --offload-arch=sm_52,sm_60 -Xarch_sm_52 -O3 -Xarch_sm_60 -O0 \ +// RUN: --target=x86_64-unknown-linux-gnu -Xarch_host -O3 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=CUDA %s +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" {{.*}}"-O3" +// CUDA: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_60" {{.*}}"-O0" +// CUDA: "-cc1" "-triple" "x86_64-unknown-linux-gnu" {{.*}}"-O3" + +// RUN: %clang -x cuda %s -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda \ +// RUN: -Xarch_sm_52 --offload-arch=sm_52 -S -nogpulib -nogpuinc -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC %s +// SPECIFIC: "-cc1" "-triple" "nvptx64-nvidia-cuda" {{.*}}"-target-cpu" "sm_52" + +// RUN: %clang -x cuda %s -nogpulib -nogpuinc \ +// RUN: -Xarch_sm_51 --offload-arch=sm_52 -S -### 2>&1 \ +// RUN: | FileCheck -check-prefix=SPECIFIC-WARN %s +// SPECIFIC-WARN: warning: argument unused during compilation: '-Xarch_sm_51 --offload-arch=sm_52' Artem-B wrote: What will happen when we *do* pass `-offload-arch=sm_80` to cc1? ``` --offload-arch=sm_52 -Xarch_sm52 --offload-arch=sm_80 --some-option-for_sm52 ``` https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/125421 >From 7773accdd4c3a6fc178c76dd974948dbe091e549 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sun, 2 Feb 2025 10:39:01 -0600 Subject: [PATCH 1/2] [Clang] Make `-Xarch_` handling generic for all toolchains Summary: Currently, `-Xarch_` is used to forward argument specially to certain toolchains. Currently, this is only supported by the Darwin toolchain. We want to be able to use this generically, and for offloading too. This patch moves the handling out of the Darwin Toolchain and places it in the `getArgsForToolchain` helper which is run before the arguments get passed to the tools. The main benefit here is that we now have a more generic version of `-Xopenmp-target=`, which should probably just be deprecated. Additionally, it allows us to specially pass arguments to different architectures for offloading. This patch is done in preparation for making selecting offloading toolchains more generic, this will be helpful while people are moving toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN, NVPTX). --- clang/include/clang/Driver/Options.td | 7 ++-- clang/lib/Driver/Driver.cpp| 5 +-- clang/lib/Driver/ToolChain.cpp | 45 -- clang/lib/Driver/ToolChains/Darwin.cpp | 24 -- clang/test/Driver/Xarch.c | 8 + clang/test/Driver/offload-Xarch.c | 39 ++ 6 files changed, 89 insertions(+), 39 deletions(-) create mode 100644 clang/test/Driver/offload-Xarch.c diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 0ab923fcdd5838c..765b7f882e99a85 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group, def Xanalyzer : Separate<["-"], "Xanalyzer">, HelpText<"Pass to the static analyzer">, MetaVarName<"">, Group; -def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>; +def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>, + HelpText<"Pass to the compiliation if the target matches ">, + MetaVarName<" ">; def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>, HelpText<"Pass to the CUDA/HIP host compilation">, MetaVarName<"">; def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>, @@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, +def offload_arch_EQ : Joined<["--"], "offload-arch=">, Visibility<[ClangOption, FlangOption]>, HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). " "If 'native' is used the compiler will detect locally installed architectures. " "For HIP offloading, the device architecture can be followed by target ID features " "delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.">; def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">, - Flags<[NoXarchOption]>, Visibility<[ClangOption, FlangOption]>, HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. " "'all' resets the list to its default value.">; diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index 912777a9808b4be..5a4737fb381e6a0 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -3409,7 +3409,9 @@ class OffloadingActionBuilder final { // Collect all offload arch parameters, removing duplicates. std::set GpuArchs; bool Error = false; - for (Arg *A : Args) { + const ToolChain &TC = *ToolChains.front(); + for (Arg *A : C.getArgsForToolChain(&TC, /*BoundArch=*/"", + AssociatedOffloadKind)) { if (!(A->getOption().matches(options::OPT_offload_arch_EQ) || A->getOption().matches(options::OPT_no_offload_arch_EQ))) continue; @@ -3420,7 +3422,6 @@ class OffloadingActionBuilder final { ArchStr == "all") { GpuArchs.clear(); } else if (ArchStr == "native") { -const ToolChain &TC = *ToolChains.front(); auto GPUsOrErr = ToolChains.front()->getSystemGPUArchs(Args); if (!GPUsOrErr) { TC.getDriver().Diag(diag::err_drv_undetermined_gpu_arch) diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index ebc982096595e61..fc2a07ff26fc025 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs( A->getOption().matches(options::OPT_Xarch_host)) ValuePos = 0; - unsigned Ind
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
jhuber6 wrote: > > --offload-arch= isn't an accepted -cc1 argument so it won't be forwarded at > > all. > > Silently? That would be wrong, imo. It should be diagnosed somewhere. It's already an error if you pass it directly via -`Xclang` because it's not an accepted `-cc1` argument. A lot of driver arguments are both driver and `cc1` arguments so those get marshalled or forwarded. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
jhuber6 wrote: > > Right now if someone passes -Xarch_foo --offload-arch=gfx1030 and foo > > doesn't match it's not passed and it will print something like this. I > > figured that's good enough. > > This part SGTM, too. > > However, I don't think I've seen the answer what happens when we do pass > --offload arch to cc1. > > E.g. a user may accidentally paste an argument in the wrong place and instead > of intended `--offload-arch=sm_52 --offload-arch=sm_80 -Xarch_sm52 > --some-option-for_sm52` passes `--offload-arch=sm_52 -Xarch_sm52 > --offload-arch=sm_80 --some-option-for_sm52` ? `--offload-arch=` isn't an accepted `-cc1` argument so it won't be forwarded at all. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
Artem-B wrote: > --offload-arch= isn't an accepted -cc1 argument so it won't be forwarded at > all. Silently? That would be wrong, imo. It should be diagnosed somewhere. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
Artem-B wrote: > Right now if someone passes -Xarch_foo --offload-arch=gfx1030 and foo doesn't > match it's not passed and it will print something like this. I figured that's > good enough. This part SGTM, too. However, I don't think I've seen the answer what happens when we do pass --offload arch to cc1. E.g. a user may accidentally paste an argument in the wrong place and instead of intended `--offload-arch=sm_52 --offload-arch=sm_80 -Xarch_sm52 --some-option-for_sm52` passes `--offload-arch=sm_52 -Xarch_sm52 --offload-arch=sm_80 --some-option-for_sm52` ? https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
jhuber6 wrote: > > I don't think there's any use of --offload-arch outside of the driver. > > I agree. Yet we do need to deal with such nonsensical input in a consistent > manner. We do not control what the users give us, but we control how we > respond. Right now if someone passes `-Xarch_foo --offload-arch=gfx1030` and `foo` doesn't match it's not passed and it will print something like this. I figured that's good enough. ``` clang++: warning: argument unused during compilation: '-Xarch_sm_51 --offload-arch=sm_52' [-Wunused-command-line-argument]``` https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/125421 >From 7773accdd4c3a6fc178c76dd974948dbe091e549 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sun, 2 Feb 2025 10:39:01 -0600 Subject: [PATCH 1/2] [Clang] Make `-Xarch_` handling generic for all toolchains Summary: Currently, `-Xarch_` is used to forward argument specially to certain toolchains. Currently, this is only supported by the Darwin toolchain. We want to be able to use this generically, and for offloading too. This patch moves the handling out of the Darwin Toolchain and places it in the `getArgsForToolchain` helper which is run before the arguments get passed to the tools. The main benefit here is that we now have a more generic version of `-Xopenmp-target=`, which should probably just be deprecated. Additionally, it allows us to specially pass arguments to different architectures for offloading. This patch is done in preparation for making selecting offloading toolchains more generic, this will be helpful while people are moving toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN, NVPTX). --- clang/include/clang/Driver/Options.td | 7 ++-- clang/lib/Driver/Driver.cpp| 5 +-- clang/lib/Driver/ToolChain.cpp | 45 -- clang/lib/Driver/ToolChains/Darwin.cpp | 24 -- clang/test/Driver/Xarch.c | 8 + clang/test/Driver/offload-Xarch.c | 39 ++ 6 files changed, 89 insertions(+), 39 deletions(-) create mode 100644 clang/test/Driver/offload-Xarch.c diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 0ab923fcdd5838c..765b7f882e99a85 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group, def Xanalyzer : Separate<["-"], "Xanalyzer">, HelpText<"Pass to the static analyzer">, MetaVarName<"">, Group; -def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>; +def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>, + HelpText<"Pass to the compiliation if the target matches ">, + MetaVarName<" ">; def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>, HelpText<"Pass to the CUDA/HIP host compilation">, MetaVarName<"">; def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>, @@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, +def offload_arch_EQ : Joined<["--"], "offload-arch=">, Visibility<[ClangOption, FlangOption]>, HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). " "If 'native' is used the compiler will detect locally installed architectures. " "For HIP offloading, the device architecture can be followed by target ID features " "delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.">; def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">, - Flags<[NoXarchOption]>, Visibility<[ClangOption, FlangOption]>, HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. " "'all' resets the list to its default value.">; diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index 912777a9808b4be..5a4737fb381e6a0 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -3409,7 +3409,9 @@ class OffloadingActionBuilder final { // Collect all offload arch parameters, removing duplicates. std::set GpuArchs; bool Error = false; - for (Arg *A : Args) { + const ToolChain &TC = *ToolChains.front(); + for (Arg *A : C.getArgsForToolChain(&TC, /*BoundArch=*/"", + AssociatedOffloadKind)) { if (!(A->getOption().matches(options::OPT_offload_arch_EQ) || A->getOption().matches(options::OPT_no_offload_arch_EQ))) continue; @@ -3420,7 +3422,6 @@ class OffloadingActionBuilder final { ArchStr == "all") { GpuArchs.clear(); } else if (ArchStr == "native") { -const ToolChain &TC = *ToolChains.front(); auto GPUsOrErr = ToolChains.front()->getSystemGPUArchs(Args); if (!GPUsOrErr) { TC.getDriver().Diag(diag::err_drv_undetermined_gpu_arch) diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index ebc982096595e61..fc2a07ff26fc025 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs( A->getOption().matches(options::OPT_Xarch_host)) ValuePos = 0; - unsigned Ind
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
Artem-B wrote: > I don't think there's any use of --offload-arch outside of the driver. I agree. Yet we do need to deal with such nonsensical input in a consistent manner. We do not control what the users give us, but we control how we respond. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
jhuber6 wrote: Good > > Right now it works as I'd expect, it passes --offload-arch=sm_52 to the > > sm_52 compilation, but no other architecture. > > What happens with that `--offload-arch=sm_52` when cc1 sees it? Ideally there > should be either an unused argument warning, or an error is the option is not > accepted by cc1. Currently cc1 errors out with `error: unknown argument: > '--offload-arch=sm_52`. If it continues to do so with your change, then we're > fine. I don't think there's any use of `--offload-arch` outside of the driver. If we're passing an arch list it'd be better to query it from the toolchain somehow. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
Artem-B wrote: > Right now it works as I'd expect, it passes --offload-arch=sm_52 to the sm_52 > compilation, but no other architecture. What happens with that `--offload-arch=sm_52` when cc1 sees it? Ideally there should be either an unused argument warning, or an error is the option is not accepted by cc1. Currently cc1 errors out with `error: unknown argument: '--offload-arch=sm_52`. If it continues to do so with your change, then we're fine. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, jhuber6 wrote: Right now `-Xarch_gfx1030 --offload-arch=gfx1030` works, but `-Xarch_gfx90a --offload-arch=gfx1030` doesn't. It's a little weird but I think it's the best interpretation of it we can do, otherwise it becomes difficult to figure out what to do without the bound architecture. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, yxsamliu wrote: For HIP, maybe check whether the `` in `-Xarch_` is a trple. `--offload-arch=` is only allowed with `-Xarch_` if `` is a triple. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/125421 >From 79000d0a1ecd1312fb9bc06af0369b66a133e5d4 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sun, 2 Feb 2025 10:39:01 -0600 Subject: [PATCH] [Clang] Make `-Xarch_` handling generic for all toolchains Summary: Currently, `-Xarch_` is used to forward argument specially to certain toolchains. Currently, this is only supported by the Darwin toolchain. We want to be able to use this generically, and for offloading too. This patch moves the handling out of the Darwin Toolchain and places it in the `getArgsForToolchain` helper which is run before the arguments get passed to the tools. The main benefit here is that we now have a more generic version of `-Xopenmp-target=`, which should probably just be deprecated. Additionally, it allows us to specially pass arguments to different architectures for offloading. This patch is done in preparation for making selecting offloading toolchains more generic, this will be helpful while people are moving toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN, NVPTX). --- clang/include/clang/Driver/Options.td | 7 ++-- clang/lib/Driver/Driver.cpp| 5 +-- clang/lib/Driver/ToolChain.cpp | 45 -- clang/lib/Driver/ToolChains/Darwin.cpp | 24 -- clang/test/Driver/Xarch.c | 8 + clang/test/Driver/offload-Xarch.c | 39 ++ 6 files changed, 89 insertions(+), 39 deletions(-) create mode 100644 clang/test/Driver/offload-Xarch.c diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 0ab923fcdd5838..765b7f882e99a8 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group, def Xanalyzer : Separate<["-"], "Xanalyzer">, HelpText<"Pass to the static analyzer">, MetaVarName<"">, Group; -def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>; +def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>, + HelpText<"Pass to the compiliation if the target matches ">, + MetaVarName<" ">; def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>, HelpText<"Pass to the CUDA/HIP host compilation">, MetaVarName<"">; def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>, @@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, +def offload_arch_EQ : Joined<["--"], "offload-arch=">, Visibility<[ClangOption, FlangOption]>, HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). " "If 'native' is used the compiler will detect locally installed architectures. " "For HIP offloading, the device architecture can be followed by target ID features " "delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.">; def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">, - Flags<[NoXarchOption]>, Visibility<[ClangOption, FlangOption]>, HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. " "'all' resets the list to its default value.">; diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp index 912777a9808b4b..5a4737fb381e6a 100644 --- a/clang/lib/Driver/Driver.cpp +++ b/clang/lib/Driver/Driver.cpp @@ -3409,7 +3409,9 @@ class OffloadingActionBuilder final { // Collect all offload arch parameters, removing duplicates. std::set GpuArchs; bool Error = false; - for (Arg *A : Args) { + const ToolChain &TC = *ToolChains.front(); + for (Arg *A : C.getArgsForToolChain(&TC, /*BoundArch=*/"", + AssociatedOffloadKind)) { if (!(A->getOption().matches(options::OPT_offload_arch_EQ) || A->getOption().matches(options::OPT_no_offload_arch_EQ))) continue; @@ -3420,7 +3422,6 @@ class OffloadingActionBuilder final { ArchStr == "all") { GpuArchs.clear(); } else if (ArchStr == "native") { -const ToolChain &TC = *ToolChains.front(); auto GPUsOrErr = ToolChains.front()->getSystemGPUArchs(Args); if (!GPUsOrErr) { TC.getDriver().Diag(diag::err_drv_undetermined_gpu_arch) diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index ebc982096595e6..fc2a07ff26fc02 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs( A->getOption().matches(options::OPT_Xarch_host)) ValuePos = 0; - unsigned Index = Args.g
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
jhuber6 wrote: That's fun, well for now I've updated it to not hit that bug and also accept `-Xarch_sm_52 --offload-arch=sm_52` even though it's stupid. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
Artem-B wrote: Also see: https://github.com/llvm/llvm-project/issues/110325 https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
jhuber6 wrote: Here's something fun, `-O0` and `-O3` are accepted by `-Xarch` but `-O1` and `-O2` are rejected. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, jhuber6 wrote: I definitely agree now that it works on all the targets. I was actually shocked what I saw that `-Xarch_` didn't even have help text. Would that be a follow-up? Also, if we *really* wanted the above to work, we could do a special-case handling that only accepts the pair of `-Xarch_ --offload-arch=` but I think that would be awful. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, Artem-B wrote: Also, `-Xarch` has grown non-obvious usage nuances that should probably be mentioned in clang docs. We do mention them in openmp docs at the moment, but this is something that should be exposed somewhat more prominently, as this is pretty much the only mechanism we have for granular control of the offload build parameters. It's not uncommon to need different optimization options for different GPU variants. E.g. we want to compile with host-side debug info only, or specify different inlining/unrolling thresholds for different GPUs. All of those are effectively end-user facing options and should be documented. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, jhuber6 wrote: I don't think there's actually a way to do that unfortunately. When we query the like of active `--offload-arch` kinds we don't have a bound architecture yet. There's no way to know if the string *is* a CPU argument. So, the only case would be to reject usage of this altogether, which is clearly not useful because we have `-Xopenmp-target=` which is just a dumber version of this handling. So, there's no way to detect the usage here and rejecting it flatly isn't desirable. The current behavior is that `-Xarch_gfx90a --offload-arch=gfx90a` will be unused. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, shiltian wrote: then @yxsamliu made a very good point of avoiding use cases like that. that should be checked and errored out. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, jhuber6 wrote: That's how it behaves in Darwin I believe, and I think it's fine to accept both `-Xarch_amdgcn` for all AMD targets and `-Xarch_gfx90a` for just that one. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, shiltian wrote: If that's the case, what does the `arch` mean in `-Xarch_`? It looks like it means both, which I'm not sure if that's a good idea. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,31 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s jhuber6 wrote: I added tests to make sure that `-Xarch-device` is equivalent to `-Xarch_nvptx64` and that `-Xarch_host` doesn't interfere with `-Xarch_nvptx64`. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, jhuber6 wrote: Overloaded term that means different things in different places. The `arch` in the LLVM triple refers to `amdgcn` in `amdgcn-amd-amdhsa` since that's the actual CPU microarchitecture. The `arch` in `-march` refers to the specific machine name for the processor that's being targeted. `-Xarch_gfx908` is valid because the `-Xarch_` option in non-offloading use also minds to `-march=` arguments AFAICT. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,31 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s Artem-B wrote: > We have existing tests for those We have a few scattered across hip-options.hip and openmp-offload-gpu.c and it would make sense to consolidate them here, now that we have the tests dedicated specifically to Xarch processing. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,31 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s jhuber6 wrote: Added some lines to make sure it works. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, shiltian wrote: I thought `arch` in LLVM means things like `amdgcn`. In those ROCm runtime people tend to call `gfx1030` as "arch". That being said, I'm not sure if `-Xarch_gfx906` is a proper use at the first place. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/125421 >From 03852104d5945e0e92c97b68e993bd699b275ab5 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sun, 2 Feb 2025 10:39:01 -0600 Subject: [PATCH] [Clang] Make `-Xarch_` handling generic for all toolchains Summary: Currently, `-Xarch_` is used to forward argument specially to certain toolchains. Currently, this is only supported by the Darwin toolchain. We want to be able to use this generically, and for offloading too. This patch moves the handling out of the Darwin Toolchain and places it in the `getArgsForToolchain` helper which is run before the arguments get passed to the tools. The main benefit here is that we now have a more generic version of `-Xopenmp-target=`, which should probably just be deprecated. Additionally, it allows us to specially pass arguments to different architectures for offloading. This patch is done in preparation for making selecting offloading toolchains more generic, this will be helpful while people are moving toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN, NVPTX). --- clang/include/clang/Driver/Options.td | 7 +++-- clang/lib/Driver/ToolChain.cpp | 42 -- clang/lib/Driver/ToolChains/Darwin.cpp | 24 --- clang/test/Driver/Xarch.c | 8 + clang/test/Driver/offload-Xarch.c | 32 5 files changed, 76 insertions(+), 37 deletions(-) create mode 100644 clang/test/Driver/offload-Xarch.c diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index 0ab923fcdd5838..765b7f882e99a8 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group, def Xanalyzer : Separate<["-"], "Xanalyzer">, HelpText<"Pass to the static analyzer">, MetaVarName<"">, Group; -def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>; +def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>, + HelpText<"Pass to the compiliation if the target matches ">, + MetaVarName<" ">; def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>, HelpText<"Pass to the CUDA/HIP host compilation">, MetaVarName<"">; def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>, @@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, +def offload_arch_EQ : Joined<["--"], "offload-arch=">, Visibility<[ClangOption, FlangOption]>, HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). " "If 'native' is used the compiler will detect locally installed architectures. " "For HIP offloading, the device architecture can be followed by target ID features " "delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.">; def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">, - Flags<[NoXarchOption]>, Visibility<[ClangOption, FlangOption]>, HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. " "'all' resets the list to its default value.">; diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index ebc982096595e6..c25d1b6be14b50 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs( A->getOption().matches(options::OPT_Xarch_host)) ValuePos = 0; - unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(ValuePos)); + const InputArgList &BaseArgs = Args.getBaseArgs(); + unsigned Index = BaseArgs.MakeIndex(A->getValue(ValuePos)); unsigned Prev = Index; std::unique_ptr XarchArg(Opts.ParseOneArg(Args, Index)); @@ -1672,8 +1673,31 @@ void ToolChain::TranslateXarchArgs( Diags.Report(DiagID) << A->getAsString(Args); return; } + XarchArg->setBaseArg(A); A = XarchArg.release(); + + // Linker input arguments require custom handling. The problem is that we + // have already constructed the phase actions, so we can not treat them as + // "input arguments". + if (A->getOption().hasFlag(options::LinkerInput)) { +// Convert the argument into individual Zlinker_input_args. Need to do this +// manually to avoid memory leaks with the allocated arguments. +for (const char *Value : A->getValues()) { + auto Opt = Opts.getOption(options::OPT_Zlinker_input); + unsigned Index = BaseArgs.MakeIndex(Opt.getName(), Value); + auto NewArg = + new Arg(Opt, BaseArgs.MakeArgString(Opt.getPrefix() + Opt.getName()), + Index, BaseArgs.getArgString(Index + 1), A); + + DAL->append(NewArg); +
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,31 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s jhuber6 wrote: We have tests for that in `openmp-offload-gpu.c` and `hip-options.hip` so I figured it wasn't necessary, but can add them if it will get the PR moving. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,31 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s Artem-B wrote: Checks for host/device would still be useful, regardless of my confusion about argument processing above. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,31 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s jhuber6 wrote: We have existing tests for those, should I add more? https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1697,19 +1721,17 @@ llvm::opt::DerivedArgList *ToolChain::TranslateXarchArgs( } else if (A->getOption().matches(options::OPT_Xarch_host)) { NeedTrans = !IsDevice; Skip = IsDevice; -} else if (A->getOption().matches(options::OPT_Xarch__) && IsDevice) { - // Do not translate -Xarch_ options for non CUDA/HIP toolchain since - // they may need special translation. - // Skip this argument unless the architecture matches BoundArch - if (BoundArch.empty() || A->getValue(0) != BoundArch) -Skip = true; - else -NeedTrans = true; +} else if (A->getOption().matches(options::OPT_Xarch__)) { Artem-B wrote: Ugh... Never mind, my brain was on vacation, apparently. Not sure if I need more coffee, or less coffee this morning. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1697,19 +1721,17 @@ llvm::opt::DerivedArgList *ToolChain::TranslateXarchArgs( } else if (A->getOption().matches(options::OPT_Xarch_host)) { NeedTrans = !IsDevice; Skip = IsDevice; -} else if (A->getOption().matches(options::OPT_Xarch__) && IsDevice) { - // Do not translate -Xarch_ options for non CUDA/HIP toolchain since - // they may need special translation. - // Skip this argument unless the architecture matches BoundArch - if (BoundArch.empty() || A->getValue(0) != BoundArch) -Skip = true; - else -NeedTrans = true; +} else if (A->getOption().matches(options::OPT_Xarch__)) { jhuber6 wrote: These are different options right? How could someone pass `-Xarch_device -Xarch_x86_64` or similar? It'd just be parsed as `-Xarch_device -option1 -Xarch_x86_64 -option2` so they'd just be handled separately. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1697,19 +1721,17 @@ llvm::opt::DerivedArgList *ToolChain::TranslateXarchArgs( } else if (A->getOption().matches(options::OPT_Xarch_host)) { NeedTrans = !IsDevice; Skip = IsDevice; -} else if (A->getOption().matches(options::OPT_Xarch__) && IsDevice) { - // Do not translate -Xarch_ options for non CUDA/HIP toolchain since - // they may need special translation. - // Skip this argument unless the architecture matches BoundArch - if (BoundArch.empty() || A->getValue(0) != BoundArch) -Skip = true; - else -NeedTrans = true; +} else if (A->getOption().matches(options::OPT_Xarch__)) { Artem-B wrote: Use of `-Xarch_host/device` should not block `-Xarch_`. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -0,0 +1,31 @@ +// RUN: %clang -x cuda %s -Xarch_nvptx64 -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s +// RUN: %clang -x hip %s -Xarch_amdgcn -O3 -S -nogpulib -nogpuinc -### 2>&1 | FileCheck -check-prefix=O3ONCE %s Artem-B wrote: More test cases are needed to handle combinations of `-Xarch_host/device` and `-Xarch-`. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/Artem-B commented: LGTM overall, with a few nits. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
jhuber6 wrote: > > Summary: Currently, `-Xarch_` is used to forward argument specially to > > certain toolchains. Currently, this is only supported by the Darwin > > toolchain. We want to be able to use this generically, and for offloading > > too. This patch moves the handling out of the Darwin Toolchain and places > > it in the `getArgsForToolchain` helper which is run before the arguments > > get passed to the tools. > > I think this could use some editing. `-Xarch` is intended to set flags per > _target_. Same toolchain may handle more than one target. Perhaps rephrase > along the lines of "forward argument to the toolchain used for the given > target architecture"? I don't think a single `ToolChain` can have multiple targets in the driver, but you can make separate `ToolChain` objects with a different triple, I think that's where the confusion lies. Right now this is just for the Triple. > > this is only supported by the Darwin toolchain. > > This is the confusing part. I'm pretty sure `-Xarch_host` `-Xarch_device` and > variety of `-Xarch_{gfx,sm}..` variants are also supported by HIP/Cuda > toolchains right now. > > IMO, a better way to describe the situation is that MachO is the last > remaining special case implementation of Xarch and the patch folds it into a > common `Xarch` handling that's already used by offloading toolchains. Yeah, I wasn't counting the host / device parts since they're separate flags. We do support the architecture part already but it's a special case that doesn't need to be there. Realistically this is just removing the MachO handling to simplify it and make it work everywhere. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
Artem-B wrote: > Summary: Currently, `-Xarch_` is used to forward argument specially to > certain toolchains. Currently, this is only supported by the Darwin > toolchain. We want to be able to use this generically, and for offloading > too. This patch moves the handling out of the Darwin Toolchain and places it > in the `getArgsForToolchain` helper which is run before the arguments get > passed to the tools. I think this could use some editing. `-Xarch` is intended to set flags per *target*. Same toolchain may handle more than one target. Perhaps rephrase along the lines of "forward argument to the toolchain used for the given target architecture"? > this is only supported by the Darwin toolchain. This is the confusing part. I'm pretty sure `-Xarch_host` `-Xarch_device` and variety of `-Xarch_{gfx,sm}..` variants are also supported by HIP/Cuda toolchains right now. IMO, a better way to describe the situation is that MachO is the last remaining special case implementation of Xarch and the patch folds it into a common `Xarch` handling that's already used by offloading toolchains. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, jhuber6 wrote: using `-Xarch_amdgcn --offload-arch=gfx1030` is very meaningful for OpenMP where the user can enable multiple toolchains at the same time. I agree that use-case is not meaningful as it sets a weird case (Since we have no bound architecture while querying the device list, only the triple). I don't know how we'd diagnose that though. https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
@@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, yxsamliu wrote: For HIP, `--offload-arch=` is used by the driver to determine the GPU arch and `-Xarch_gfx906` is allowed to pass GPU arch specific options. The option sequences like `-Xarch_gfx906 --offload-arch=gfx1100` are not meaningful combinations, therefore `Flags<[NoXarchOption]` is used here to diagnose such invalid combinations. Removing this flag will lose diagnositcs for such invalid usage. Can we diagnose such invalid usage manually? https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/125421 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Joseph Huber (jhuber6) Changes Summary: Currently, `-Xarch_` is used to forward argument specially to certain toolchains. Currently, this is only supported by the Darwin toolchain. We want to be able to use this generically, and for offloading too. This patch moves the handling out of the Darwin Toolchain and places it in the `getArgsForToolchain` helper which is run before the arguments get passed to the tools. The main benefit here is that we now have a more generic version of `-Xopenmp-target=`, which should probably just be deprecated. Additionally, it allows us to specially pass arguments to different architectures for offloading. This patch is done in preparation for making selecting offloading toolchains more generic, this will be helpful while people are moving toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN, NVPTX). --- Full diff: https://github.com/llvm/llvm-project/pull/125421.diff 5 Files Affected: - (modified) clang/include/clang/Driver/Options.td (+4-3) - (modified) clang/lib/Driver/ToolChain.cpp (+32-10) - (modified) clang/lib/Driver/ToolChains/Darwin.cpp (-24) - (modified) clang/test/Driver/Xarch.c (+8) - (added) clang/test/Driver/offload-Xarch.c (+31) ``diff diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index d8123cc39fdc951..6dd9f2e8a9b1fc4 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group, def Xanalyzer : Separate<["-"], "Xanalyzer">, HelpText<"Pass to the static analyzer">, MetaVarName<"">, Group; -def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>; +def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>, + HelpText<"Pass to the compiliation if the target matches ">, + MetaVarName<" ">; def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>, HelpText<"Pass to the CUDA/HIP host compilation">, MetaVarName<"">; def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>, @@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, +def offload_arch_EQ : Joined<["--"], "offload-arch=">, Visibility<[ClangOption, FlangOption]>, HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). " "If 'native' is used the compiler will detect locally installed architectures. " "For HIP offloading, the device architecture can be followed by target ID features " "delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.">; def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">, - Flags<[NoXarchOption]>, Visibility<[ClangOption, FlangOption]>, HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. " "'all' resets the list to its default value.">; diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index ebc982096595e61..c25d1b6be14b50d 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs( A->getOption().matches(options::OPT_Xarch_host)) ValuePos = 0; - unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(ValuePos)); + const InputArgList &BaseArgs = Args.getBaseArgs(); + unsigned Index = BaseArgs.MakeIndex(A->getValue(ValuePos)); unsigned Prev = Index; std::unique_ptr XarchArg(Opts.ParseOneArg(Args, Index)); @@ -1672,8 +1673,31 @@ void ToolChain::TranslateXarchArgs( Diags.Report(DiagID) << A->getAsString(Args); return; } + XarchArg->setBaseArg(A); A = XarchArg.release(); + + // Linker input arguments require custom handling. The problem is that we + // have already constructed the phase actions, so we can not treat them as + // "input arguments". + if (A->getOption().hasFlag(options::LinkerInput)) { +// Convert the argument into individual Zlinker_input_args. Need to do this +// manually to avoid memory leaks with the allocated arguments. +for (const char *Value : A->getValues()) { + auto Opt = Opts.getOption(options::OPT_Zlinker_input); + unsigned Index = BaseArgs.MakeIndex(Opt.getName(), Value); + auto NewArg = + new Arg(Opt, BaseArgs.MakeArgString(Opt.getPrefix() + Opt.getName()), + Index, BaseArgs.getArgString(Index + 1), A); + + DAL->append(NewArg); + if (!AllocatedArgs) +DAL->AddSynthesizedArg(NewArg); + else +AllocatedArgs->push_back(NewArg); +} + } + if (!AllocatedArgs) DAL->AddSynthesizedArg(A); else @@ -1697,19 +1721,17 @@ llvm::opt::
[clang] [Clang] Make `-Xarch_` handling generic for all toolchains (PR #125421)
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/125421 Summary: Currently, `-Xarch_` is used to forward argument specially to certain toolchains. Currently, this is only supported by the Darwin toolchain. We want to be able to use this generically, and for offloading too. This patch moves the handling out of the Darwin Toolchain and places it in the `getArgsForToolchain` helper which is run before the arguments get passed to the tools. The main benefit here is that we now have a more generic version of `-Xopenmp-target=`, which should probably just be deprecated. Additionally, it allows us to specially pass arguments to different architectures for offloading. This patch is done in preparation for making selecting offloading toolchains more generic, this will be helpful while people are moving toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN, NVPTX). >From d8f565a162841e20b26e141022e6884918ed5bfc Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Sun, 2 Feb 2025 10:39:01 -0600 Subject: [PATCH] [Clang] Make `-Xarch_` handling generic for all toolchains Summary: Currently, `-Xarch_` is used to forward argument specially to certain toolchains. Currently, this is only supported by the Darwin toolchain. We want to be able to use this generically, and for offloading too. This patch moves the handling out of the Darwin Toolchain and places it in the `getArgsForToolchain` helper which is run before the arguments get passed to the tools. The main benefit here is that we now have a more generic version of `-Xopenmp-target=`, which should probably just be deprecated. Additionally, it allows us to specially pass arguments to different architectures for offloading. This patch is done in preparation for making selecting offloading toolchains more generic, this will be helpful while people are moving toward compile jobs that include multiple toolchins (SPIR-V, AMDGCN, NVPTX). --- clang/include/clang/Driver/Options.td | 7 +++-- clang/lib/Driver/ToolChain.cpp | 42 -- clang/lib/Driver/ToolChains/Darwin.cpp | 24 --- clang/test/Driver/Xarch.c | 8 + clang/test/Driver/offload-Xarch.c | 31 +++ 5 files changed, 75 insertions(+), 37 deletions(-) create mode 100644 clang/test/Driver/offload-Xarch.c diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td index d8123cc39fdc95..6dd9f2e8a9b1fc 100644 --- a/clang/include/clang/Driver/Options.td +++ b/clang/include/clang/Driver/Options.td @@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group, def Xanalyzer : Separate<["-"], "Xanalyzer">, HelpText<"Pass to the static analyzer">, MetaVarName<"">, Group; -def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>; +def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>, + HelpText<"Pass to the compiliation if the target matches ">, + MetaVarName<" ">; def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>, HelpText<"Pass to the CUDA/HIP host compilation">, MetaVarName<"">; def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>, @@ -1115,14 +1117,13 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">, // Common offloading options let Group = offload_Group in { -def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>, +def offload_arch_EQ : Joined<["--"], "offload-arch=">, Visibility<[ClangOption, FlangOption]>, HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). " "If 'native' is used the compiler will detect locally installed architectures. " "For HIP offloading, the device architecture can be followed by target ID features " "delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.">; def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">, - Flags<[NoXarchOption]>, Visibility<[ClangOption, FlangOption]>, HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. " "'all' resets the list to its default value.">; diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp index ebc982096595e6..c25d1b6be14b50 100644 --- a/clang/lib/Driver/ToolChain.cpp +++ b/clang/lib/Driver/ToolChain.cpp @@ -1648,7 +1648,8 @@ void ToolChain::TranslateXarchArgs( A->getOption().matches(options::OPT_Xarch_host)) ValuePos = 0; - unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(ValuePos)); + const InputArgList &BaseArgs = Args.getBaseArgs(); + unsigned Index = BaseArgs.MakeIndex(A->getValue(ValuePos)); unsigned Prev = Index; std::unique_ptr XarchArg(Opts.ParseOneArg(Args, Index)); @@ -1672,8 +1673,31 @@ void ToolChain::TranslateXarchArgs( Diags.Report(DiagID) << A->getAsString(Args); r