[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CGCUDARuntime.h:56-70 +enum OffloadRegionEntryKindFlag : uint32_t { + /// Mark the region entry as a kernel. + OffloadRegionKernelEntry = 0x0, +}; + +/// The kind flag of the global variable entry. +

[PATCH] D124866: [CUDA][HIP] support __noinline__ as keyword

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > CUDA/HIP do not have language spec. Well. It's not completely true. CUDA programming guide does serve as the de-facto spec for CUDA. It's far from perfect, but it does mention `__noinline__` and `__forceinline__` as function qualifiers: https://docs.nvidia.com/cuda/cuda-

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D123471#3497224 , @jhuber6 wrote: > It's a little tough, I chose that format because it's exactly the same as we > use with OpenMP with a few different flags. I wish whoever initially designed > the struct made the ``reserved`` f

[PATCH] D124382: [Clang] Recognize target address space in superset calculation

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/AST/Type.h:508 + static_cast(LangAS::FirstTargetAddressSpace)); +// When dealing with target AS return true if: +// * A is equal to B, or jchlanda wrote: > tra wrote: > > Is t

[PATCH] D124866: [CUDA][HIP] support __noinline__ as keyword

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a reviewer: rsmith. tra added a subscriber: rsmith. tra added a comment. > I don't know how language extensions come about in CUDA or HIP -- is there an > appropriate standards body (or something similar) that's aware of this > extension and supports it? Summoning @rsmith for his lang

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-05-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > Type struct __tgt_offload_entry { > > void*addr; // Pointer to the offload entry info. > // (function or global) > char*name; // Name of the function or global. > size_t size; // Size of the entry info (0 if it a function)

[PATCH] D125050: [OpenMP] Try to Infer target triples using the offloading architecture

2022-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Just a drive-by style review. I didn't look at the functionality of the changes much. Comment at: clang/lib/Driver/Driver.cpp:788-789 + options::OPT_fno_openmp, false) && + (C.getInputArgs().hasArg(options::OPT_fopenmp_ta

[PATCH] D124721: [OpenMP] Allow compiling multiple target architectures with OpenMP

2022-05-05 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM in general. Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:313 + if (BoundArch.empty()) +checkSystemForAMDGPU(Args, *this, Arch); DAL->AddJoinedA

[PATCH] D124842: [NFC][CUDA][HIP] rework mangling number for aux target

2022-05-03 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Comment at: clang/lib/AST/ASTContext.cpp:11770-11778 + if (!LangOpts.CUDA || LangOpts.CUDAIsDevice) { +assert(!ForAuxTarget && "Only CUDA/HIP host compilation supports m

[PATCH] D124721: [OpenMP] Allow compiling multiple target architectures with OpenMP

2022-05-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Comment at: clang/test/Driver/openmp-offload-gpu-new.c:56 + +// RUN: %clang -### --target=x86_64-unknown-linux-gnu -ccc-print-bindings -fopenmp \ +// RUN: -fopenmp-targets=nvptx64-nvidia-cuda,amdgcn-amd-amdhsa -Xopenmp-target=nvptx64-nvidia-cuda --

[PATCH] D123325: [Clang] Make enabling the new driver more generic

2022-04-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2535-2538 +def foffload_new_driver : Flag<["-"], "foffload-new-driver">, Flags<[CC1Option]>, Group, + HelpText<"Use the new driver for offloading compilation.">; +def fno_offload_new_driver : Flag<["-"

[PATCH] D123325: [Clang] Make enabling the new driver more generic

2022-04-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2535-2538 +def foffload_new_driver : Flag<["-"], "foffload-new-driver">, Flags<[CC1Option]>, Group, + HelpText<"Use the new driver for offloading compilation.">; +def fno_offload_new_driver : Flag<["-"

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-27 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a minor nit. Comment at: clang/lib/Driver/Driver.cpp:4230 + // compile action to embed it in. If preprocessing only ignore embedding. + if ((!isa(HostAction) && PL.back

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2527 HelpText<"Use the static host OpenMP runtime while linking.">; +def offload_device_only : Flag<["-"], "offload-device-only">, + HelpText<"Only compile for the offloading device.">; -

[PATCH] D124545: [HIP] Add HIP runtime library arguments for linker

2022-04-27 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/ToolChains/Linux.cpp:681 + ArgStringList &CmdArgs) const { + CmdArgs.push_back( + Args.MakeArgString(StringRe

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2527 HelpText<"Use the static host OpenMP runtime while linking.">; +def offload_device_only : Flag<["-"], "offload-device-only">, + HelpText<"Only compile for the offloading device.">; -

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2535-2538 +def foffload_device_only : Flag<["-"], "foffload-device-only">, + HelpText<"Only compile for the offloading device.">; +def foffload_host_only : Flag<["-"], "foffload-host-only">, + HelpText

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2535-2538 +def foffload_device_only : Flag<["-"], "foffload-device-only">, + HelpText<"Only compile for the offloading device.">; +def foffload_host_only : Flag<["-"], "foffload-host-only">, + HelpText

[PATCH] D124220: [OpenMP] Add options to only compile the host or device when offloading

2022-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:2535-2538 +def foffload_device_only : Flag<["-"], "foffload-device-only">, + HelpText<"Only compile for the offloading device.">; +def foffload_host_only : Flag<["-"], "foffload-host-only">, + HelpText

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-26 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D120272#3475155 , @jhuber6 wrote: > Changing this to simply require that the user manually passes `-fgpu-rdc` in > order to use the new driver. I think t

[PATCH] D124382: [Clang] Recognize target address space in superset calculation

2022-04-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/AST/Type.h:486 + bool IsSYCLOrOpenCL = false) { +if (ASMap) { + bool IsATargetAS = false; If A and B are both target AS, we fall through to the code which is

[PATCH] D124396: [HIP] Fix diag msg about sanitizer

2022-04-25 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Wording nit. LGTM otherwise. Comment at: clang/include/clang/Basic/DiagnosticDriverKinds.td:113-114 def warn_drv_unsupported_option_for_offload_arch_req_feature : Warning< - "ign

[PATCH] D124292: [OpenMP] Use CUDA's non-RDC mode when LTO has whole program visibility

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a minor test nit. Comment at: clang/test/Driver/linker-wrapper.c:41 -// LTO: ptxas{{.*}}-m64 -o {{.*}}.cubin -O2 --gpu-name sm_70 -c {{.*}}.s -// LTO: nvlink{{.*}}-m64

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: rnk. tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6225 if (IsCuda || IsHIP) { -if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)) +if (Args.hasArg(options::OPT_fno_gpu_rdc) && IsCudaDevice &

[PATCH] D124189: [CUDA][HIP] Externalize kernels with internal linkage

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM overal, with few test nits. Comment at: clang/lib/AST/ASTContext.cpp:12300 + (D->hasAttr() && + GetGVALinkageForFunction(cast(D), +

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6223-6224 if (IsCuda || IsHIP) { -if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)) +if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) || +

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6223-6224 if (IsCuda || IsHIP) { -if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)) +if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) || +

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6223-6224 if (IsCuda || IsHIP) { -if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)) +if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false) || +

[PATCH] D124189: [CUDA][HIP] Externalize kernels with internal linkage

2022-04-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/AST/ASTContext.cpp:12300 + (D->hasAttr() && + GetGVALinkageForFunction(cast(D), + /*IgnoreCUDAGlobalAttr=*/true) == yaxunl wrote: > tra wrote: > > yaxunl wrote: >

[PATCH] D124189: [CUDA][HIP] Externalize kernels with internal linkage

2022-04-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/AST/ASTContext.cpp:12300 + (D->hasAttr() && + GetGVALinkageForFunction(cast(D), + /*IgnoreCUDAGlobalAttr=*/true) == yaxunl wrote: > tra wrote: > > Perhaps we don't

[PATCH] D124189: [CUDA][HIP] Externalize kernels with internal linkage

2022-04-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/AST/ASTContext.cpp:11322 + } else if (Context.getLangOpts().CUDA && Context.getLangOpts().CUDAIsDevice && + !IgnoreCUDAGlobalAttr) { // Device-side functions with __global__ attribute must always be -

[PATCH] D122734: [CUDA][HIP] Fix mangling number for local struct

2022-04-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/AST/ASTContext.cpp:11760-11763 + auto Cutoff = [](unsigned V) { return V > 1 ? V : 1; }; + if (CUDANameMangleCtx.MangleDeviceNameInHostCompilation) +return Cutoff(Res >> 16); + return Cutoff(Res & 0x); r

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-04-19 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Thank you for adding the compilation pipeline tests. LGTM overall. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:6223-6226 if (Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false)) CmdArgs.push_back("-fgpu-rdc"); +if (Ar

[PATCH] D123812: [CUDA] Add wrapper code generation for registering CUDA images

2022-04-14 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:351 +/// required for texture / surface / managed variables. +Function *createRegisterGlobalsFunction(Module &M) { + LLVMContext &C = M.getContext(); jhuber6 wrote:

[PATCH] D123812: [CUDA] Add wrapper code generation for registering CUDA images

2022-04-14 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/tools/clang-linker-wrapper/OffloadWrapper.cpp:351 +/// required for texture / surface / managed variables. +Function *createRegisterGlobalsFunction(Module &M) { + LLVMContext &C = M.getContext(); Do you think ge

[PATCH] D122734: [CUDA][HIP] Fix mangling number for local struct

2022-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/AST/ASTContext.cpp:11760-11763 + auto Cutoff = [](unsigned V) { return V > 1 ? V : 1; }; + if (CUDANameMangleCtx.MangleDeviceNameInHostCompilation) +return Cutoff(Res >> 16); + return Cutoff(Res & 0x); I

[PATCH] D123441: [CUDA][HIP] Fix host used external kernel in archive

2022-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D123441#3446499 , @yaxunl wrote: > You are talking about a use case that actually needs --whole-archive option. > If the main TU does not reference some

[PATCH] D123441: [CUDA][HIP] Fix host used external kernel in archive

2022-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > This approach will only link in kernels and device variables used by host code In the absence of the explicit reference info from the host side, GPU-side linker must link all objects with symbols that **may** be used by the host. E.g if we have a library with three object

[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

2022-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. I've mentioned in D123441 that it would be useful to have a list of GPU-side symbols needed by the host and this offload info is pretty close to what we need. The only remaining feature is being able to extract them by external tool, so we

[PATCH] D123441: [CUDA][HIP] Fix host used external kernel in archive

2022-04-12 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM in principle. This will keep around the GPU code we do need. That said, it seems to be a rather blunt hammer. I think we'll end up linking almost everything in an archive into the final executable as we'll likely have a host-visible symbol in most of the GPU objects (e

[PATCH] D123353: [CUDA][HIP] Externalize kernels in anonymous name space

2022-04-08 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM overall. Comment at: clang/test/CodeGenCUDA/kernel-in-anon-ns.cu:13 + +// CHECK: define weak_odr {{.*}}void @[[KERN:_ZN12_GLOBAL__N_16kernelEv\.anon\..*]]( +// CHECK: @[[STR:

[PATCH] D123353: [CUDA][HIP] Externalize kernels in anonymous name space

2022-04-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/test/CodeGenCUDA/kernel-in-anon-ns.cu:13 + +// CHECK: define weak_odr {{.*}}void @[[KERN:_ZN12_GLOBAL__N_16kernelEv\.anon\..*]]( +// CHECK: @[[STR:.*]] = {{.*}} c"[[KERN]]\00" Will the externalized names be uniquified

[PATCH] D122734: [HIP] Fix mangling number for local struct

2022-04-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D122734#3435086 , @yaxunl wrote: > This patch takes a similar approach as https://reviews.llvm.org/D69322 has > done for lambda. When doing host compilation for CUDA/HIP on Windows with > MSVC toolchain, mangling number of lambda

[PATCH] D122734: [HIP] Fix mangling number for local struct

2022-04-06 Thread Artem Belevich via Phabricator via cfe-commits
tra added subscribers: rnk, rsmith. tra added a comment. @rnk, @rsmith -- PTAL. Changing mangling for CXXABI on Windows is way out of my comfort zone. > This patch uses Itanium mangling number for structs in HIP host compilation > on Windows to fix this issue. It does not appear to be HIP-sp

[PATCH] D122846: [CUDA] Don't call inferCUDATargetForImplicitSpecialMember too early.

2022-03-31 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGfe528e721633: [CUDA] Don't call inferCUDATargetForImplicitSpecialMember too early. (authored by tra). Repository: rG LLVM Github Monorepo CHANGES

[PATCH] D122846: [CUDA] Don't call inferCUDATargetForImplicitSpecialMember too early.

2022-03-31 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added a reviewer: yaxunl. Herald added a subscriber: bixia. Herald added a project: All. tra requested review of this revision. Herald added a project: clang. Otherwise we may crash because the special member has not been sufficiently set up yet. Fixes #54537 Rep

[PATCH] D122069: [Clang] Add binary format for bundling offloading metadata

2022-03-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D122069#3397560 , @jhuber6 wrote: > I was definitely thinking about a lot of alternatives to making yet another > on-disk binary format. I had a few discussions with @JonChesterfield on the > subject. There were two things that p

[PATCH] D122069: [Clang] Add binary format for bundling offloading metadata

2022-03-21 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > I decided to use a custom format rather than using an existing format (ELF, > JSON, etc) because of the specialty use-case of this. Will we ever need to process the files with tools built with a different LLVM version? E.g clang and lld may be built separately from differ

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-03-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Basic/Cuda.h:105-107 +constexpr CudaArch DefaultCudaArch = CudaArch::SM_35; +constexpr CudaArch DefaultHIPArch = CudaArch::GFX803; + Nit: those could be just enum values. ``` ... LAST, DefaultCudaAr

[PATCH] D121765: [CUDA][HIP] Fix hostness check with -fopenmp

2022-03-16 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: rsmith. tra added a comment. @rsmith - this touches generic Sema functionality and could use your input. Overall, the patch looks OK to me. Comment at: clang/include/clang/Sema/Sema.h:3327-3328 + /// a pointer to the function or lambda decl for the fun

[PATCH] D120132: [HIP] Fix HIP include path

2022-03-09 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. I still don't quite understand your reluctance to use `AddClangSystemIncludeArgs` for adding HIP path, but if this change works for HIP, I'm fine with it. Also, thank you for fixing the issue with

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-03-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:4132-4134 + Archs.insert(CudaArchToString(CudaArch::SM_35)); +else if (Kind == Action::OFK_HIP) + Archs.insert(CudaArchToString(CudaArch::GFX803)); JonChesterfield wrote: > tra wrote

[PATCH] D120272: [CUDA] Add driver support for compiling CUDA with the new driver

2022-03-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:4107 + options::OPT_no_offload_arch_EQ)) { +C.getDriver().Diag(diag::err_opt_not_valid_with_opt) << "--offload-arch" + << "--offl

[PATCH] D120911: [CUDA][HIP] Fix offloading kind for linking C++ programs

2022-03-03 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. We should probably also check what happens when we specify compilation language explicitly: E.g. `clang -x cuda a.cu -x c++ b.cc`, `clang -x cuda a.cu b.cc` and `clang a.cu -x cuda b.cc`. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D120911/new/ https://review

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-03-02 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D117887#3353653 , @jchlanda wrote: > I went with the web interface as described here: > https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface > > with `git diff -U99 ...` didn't want to bite the bu

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Missing clang-side changes have landed. Please check. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D117887/new/ https://reviews.llvm.org/D117887 ___ cfe-commits mailing list cfe-comm

[PATCH] D120132: [HIP] Fix HIP include path

2022-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D120132#3351999 , @yaxunl wrote: > In D120132#3351853 , @tra wrote: > >> In D120132#3351391 , @yaxunl wrote: >> >>> >> >> >> >>> If any input file

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D117887#3351257 , @jchlanda wrote: > @tra thank you for landing the patches, it seems that the clang part (builtin > declarations and tests) have been dropped, only `llvm` dir changes made it > through. Is there any way I could f

[PATCH] D120132: [HIP] Fix HIP include path

2022-03-01 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D120132#3351391 , @yaxunl wrote: > > If any input file is HIP program, clang driver will use HIP offload kind for > all inputs. This behavior is similar as cuda-clang. I do not think this is the case as illustrated by the exa

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D120132#3349936 , @yaxunl wrote: > Users may use clang driver to compile HIP program and C++ program with one > clang driver invocation, e.g. > > clang --offload-arch=gfx906 a.hip b.cpp > > Clang driver will create job actions f

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-28 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D120132#3345534 , @yaxunl wrote: > I just found one issue with the current patch. It adds HIP include path for > non-HIP programs. > > We should only add HIP include path for JobAction with HIP offloading kind. > However, AddClan

[PATCH] D120499: [NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32)

2022-02-28 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D120499/new/ https://reviews.llvm.org/D120499 ___ cfe-

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:486 -void RocmInstallationDetector::AddHIPIncludeArgs(const ArgList &DriverArgs, - ArgStringList &CC1Args) const { +std::string RocmInstallationDetecto

[PATCH] D120499: [NVPTX] Fix nvvm.match.sync*.i64 intrinsics return type (i64 -> i32)

2022-02-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Good catch. Thank you for the fix. Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:477 TARGET_BUILTIN(__nvvm_match_any_sync_i32, "UiUiUi", "", PTX60) -TARGET_BUILTIN(__nvvm_match_any_sync_i64, "WiUiWi", "", PTX60) +TARGET_BUILTIN(__nvvm_match_any_s

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:531-532 + DriverArgs.hasArg(options::OPT_nostdlibinc)) { +CC1Args.push_back("-internal-isystem"); +CC1Args.push_back(HipIncludePath); + } yaxunl wrote: > tra wrote: > > ya

[PATCH] D119157: [NVPTX] Add ex2 f16 support

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG69a8350c232a: [NVPTX] Add ex2.approx.f16/f16x2 support (authored by npmiller, committed by tra). Changed prior to commit: https://reviews.llvm.org

[PATCH] D118977: [NVPTX] Add more FMA intriniscs/builtins

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGbe672934ff88: [NVPTX] Add more FMA intriniscs/builtins (authored by jchlanda, committed by tra). Changed prior to commit: https://reviews.llvm.org

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGe0dc4ac28f00: [NVPTX] Expose float tys min, max, abs, neg as builtins (authored by jchlanda, committed by tra). Changed prior to commit: https://r

[PATCH] D120298: [HIP] Support `-fgpu-default-stream`

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM with a minor nit. > Also -DHIP_API_PER_THREAD_DEFAULT_STREAM is passed to clang -cc1 to enable > other per-thread stream You may want to rephrase patch description it a bit to match the latest

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-23 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:531-532 + DriverArgs.hasArg(options::OPT_nostdlibinc)) { +CC1Args.push_back("-internal-isystem"); +CC1Args.push_back(HipIncludePath); + } yaxunl wrote: > tra wrote: > > My

[PATCH] D120132: [HIP] Fix HIP include path

2022-02-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:531-532 + DriverArgs.hasArg(options::OPT_nostdlibinc)) { +CC1Args.push_back("-internal-isystem"); +CC1Args.push_back(HipIncludePath); + } My impression, after reading the

[PATCH] D120298: [HIP] Support `--default-stream`

2022-02-22 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:962 NegFlag>; +def default_stream_EQ : Joined<["--"], "default-stream=">, + HelpText<"Specify default stream. Valid values are 'legacy' and 'per-thread'. The default value is 'legacy'. (HIP only)">,

[PATCH] D118977: [NVPTX] Add more FMA intriniscs/builtins

2022-02-17 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D118977#3329146 , @jchlanda wrote: > @tra I've fixed the test failure (`math-intrins.ll`) the rest seems to be > unrelated timeouts, Thank you. > would you be able to merge those patches in, as I don't have the commit > access

[PATCH] D120070: [HIP] Support linking archive of bundled bitcode

2022-02-17 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. LGTM. Comment at: clang/test/Driver/hip-link-bundle-archive.hip:3 + +// RUN: touch %T/libhipBundled.a + Is this file necessary? `clang -###` should not need the fi

[PATCH] D119615: [CUDA][HIP] Do not promote constexpr var with non-constant initializer

2022-02-12 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Sema/SemaCUDA.cpp:148-150 + if ((Var->isConstexpr() || Var->getType().isConstQualified()) && + Var->hasAttr() && !hasExplicitAttr(Var)) -

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-02-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:162 SimplifyAction(Instruction::BinaryOps BinaryOp, FtzRequirementTy FtzReq) : BinaryOp(BinaryOp), FtzRequirement(FtzReq) {} jchlanda wrote: > tra wrote: > >

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-02-09 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp:162 SimplifyAction(Instruction::BinaryOps BinaryOp, FtzRequirementTy FtzReq) : BinaryOp(BinaryOp), FtzRequirement(FtzReq) {} The new 3-argument constructor a

[PATCH] D118977: [NVPTX] Add more FMA intriniscs/builtins

2022-02-08 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: llvm/lib/Target/NVPTX/NVPTXIntrinsics.td:937 +class FMA_TUPLE Preds = [hasPTX70, hasSM80]> { + string Variant = V; I think the default should be the most useful/common and the least surprising value. I'd argue that in thi

[PATCH] D118977: [NVPTX] Add more FMA intriniscs/builtins

2022-02-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D118977#3299974 , @jchlanda wrote: >> Target ISA Notes >> Requires sm_53 or higher. I think we do need this constraint applied to the new builtins, too. Right now nothing stops using them on a GPU where they do not exist and that

[PATCH] D119157: [NVPTX] Add ex2 f16 support

2022-02-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. Is the patch is ready for review? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D119157/new/ https://reviews.llvm.org/D119157 ___ cfe-commits mailing list cfe-commits@lists.llvm.org h

[PATCH] D119026: [HIP] Emit amdgpu_code_object_version module flag

2022-02-07 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1166 CmdArgs.insert(CmdArgs.begin() + 1, "-mllvm"); +// -cc1as does not need -mcode-object-version option. +if (!

[PATCH] D119026: [HIP] Emit amdgpu_code_object_version module flag

2022-02-07 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/include/clang/Driver/Options.td:3445 def mcode_object_version_EQ : Joined<["-"], "mcode-object-version=">, Group, - HelpText<"Specify code object ABI version. Defaults to 4. (AMDGPU only)">, - MetaVarName<"">, Values<"2,3,4,5">; +

[PATCH] D119026: [HIP] Emit amdgpu_code_object_version module flag

2022-02-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:575 +// times 100. +if (getTarget().getTargetOpts().CodeObjectVersion != "none") { + unsigned CodeObjVer; yaxunl wrote: > tra wrote: > > When will it ever be set to `none`? Do

[PATCH] D119026: [HIP] Emit amdgpu_code_object_version module flag

2022-02-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:575 +// times 100. +if (getTarget().getTargetOpts().CodeObjectVersion != "none") { + unsigned CodeObjVer; When will it ever be set to `none`? Does the new option parser enforc

[PATCH] D118977: [NVPTX] Add more FMA intriniscs/builtins

2022-02-04 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. > They all require PTX 7.0, SM_80. According to https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-fma only `fma.relu` and `bf16*` variants require ptx70/sm80: PTX ISA Notes Introduced in PTX ISA version 4.2.

[PATCH] D118949: [HIP] Support code object v5

2022-02-03 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. Few nits, LGTM in general. Comment at: clang/lib/Driver/ToolChains/ROCm.h:27 +struct DeviceLibABIVersion { + unsigned Value = 0; + DeviceLibABIVersion(unsigned V) : Value(V) {} -

[PATCH] D117887: [NVPTX] Expose float tys min, max, abs, neg as builtins

2022-02-02 Thread Artem Belevich via Phabricator via cfe-commits
tra accepted this revision. tra added a comment. This revision is now accepted and ready to land. In D117887#3289481 , @jchlanda wrote: > `ptxas` is happy with asm generated from both `math-intrins-sm86-ptx72.ll` > and `math-intrins-sm80-ptx70.ll` Thank

[PATCH] D118153: [CUDA][HIP] Do not treat host var address as constant in device compilation

2022-01-27 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: rsmith. tra added a comment. @rsmith -- is there anything else we need to worry about when it comes to treating pointers as constant values (or not)? Comment at: clang/lib/AST/ExprConstant.cpp:2227 +!Var->hasAttr() && +!Var->hasA

[PATCH] D118084: [CUDA, NVPTX] Pass byval aggregates directly

2022-01-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D118084#3272154 , @lebedev.ri wrote: > My last idea was about allowing splitting > > struct { > int a; > int b[2]; > } a; > > into > > // not in a struct anymore! > int a; > int b[2] This looks like it's a somew

[PATCH] D117137: [Driver] Add CUDA support for --offload param

2022-01-26 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D117137#3269365 , @yaxunl wrote: > Does that mean only "spirv{64}-unknown-unknown" is acceptable, or > "spirv{64}-amd-unknown-unknown" is also acceptable? My point is that `unknown` part of the triple is a catch-all for `anything

[PATCH] D118084: [CUDA, NVPTX] Pass byval aggregates directly

2022-01-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D118084#3271073 , @jdoerfert wrote: > @lebedev.ri wanted to teach SROA how to deal with dynamic indices before, > IIRC. It seems to be generally useful. Interesting. I'd like to hear more. > This patch can wait till then? Yes.

[PATCH] D118084: [CUDA, NVPTX] Pass byval aggregates directly

2022-01-25 Thread Artem Belevich via Phabricator via cfe-commits
tra planned changes to this revision. tra added a comment. Getting rid of byval helps getting rid of locals in quite a few places, but runs into a new problem. 😕 Looks like this change does have unexpected side-effects. When we need to dynamically index into a struct passed directly, there's no

[PATCH] D118023: Corrected fragment size for tf32 LD B matrix.

2022-01-25 Thread Artem Belevich via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rG0ad19a833177: [CUDA,NVPTX] Corrected fragment size for tf32 LD B matrix. (authored by JackAKirk, committed by tra). Repository: rG LLVM Github Mon

[PATCH] D118153: [CUDA][HIP] Do not treat host var address as constant in device compilation

2022-01-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. LGTM. Do we need to do anything special about `__managed__` vars? Comment at: clang/lib/AST/ExprConstant.cpp:2224 + Info.getCtx().getLangOpts().CUDAIsDevice) { +if (!Var->hasAttr() && +!Var->hasAttr() && Nit: N

[PATCH] D117137: [Driver] Add CUDA support for --offload param

2022-01-25 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. In D117137#3268548 , @linjamaki wrote: > SPIR-V target requires that the OS and the environment type is unknown (see > TargetInfo::AllocateTarget and BaseSPIRTargetInfo). The problem is that LLVM's triple parser will set `UnknownVen

[PATCH] D118084: [CUDA, NVPTX] Pass byval aggregates directly

2022-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra created this revision. tra added reviewers: jdoerfert, yaxunl. Herald added subscribers: asavonic, bixia. tra requested review of this revision. Herald added a project: clang. Changes the NVPTX ABI to pass aggregates directly. Only clang-generated IR is affected. The change does not affect AB

[PATCH] D117137: [Driver] Add CUDA support for --offload param

2022-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a subscriber: linjamaki. tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:153-155 + if (TT->getArch() == llvm::Triple::spirv64 && + TT->getVendor() == llvm::Triple::UnknownVendor && + TT->getOS() == llvm::Triple::UnknownOS)

[PATCH] D117137: [Driver] Add CUDA support for --offload param

2022-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added inline comments. Comment at: clang/lib/Driver/Driver.cpp:153-155 + if (TT->getArch() == llvm::Triple::spirv64 && + TT->getVendor() == llvm::Triple::UnknownVendor && + TT->getOS() == llvm::Triple::UnknownOS) What's expected to happen if someon

[PATCH] D118023: Corrected fragment size for tf32 LD B matrix.

2022-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. The patch uses a `@gmail.com` email. Should I change it to `JackAKirk ` to match the sign-off email? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D118023/new/ https://reviews.llvm.org/D118023 _

[PATCH] D118023: Corrected fragment size for tf32 LD B matrix.

2022-01-24 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment. If you are not sure if you have commit access, you probably do not. It must be explicitly granted: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access As for landing the patch, I do it manually with git. Never tried it with `arc`, so I can't say much, except

<    1   2   3   4   5   6   7   8   9   10   >