[clang] [CUDA][HIP] Exclude external variables from constant promotion. (PR #73549)

2023-12-05 Thread Artem Belevich via cfe-commits
@@ -104,3 +106,14 @@ void fun() { (void) b; (void) var_host_only; } + +extern __global__ void external_func(); +extern void* const external_dep[] = { Artem-B wrote: This array is nominally a host-only entity and should not be emitted on the GPU at all, IMO. In

[clang] [CUDA][HIP] Exclude external variables from constant promotion. (PR #73549)

2023-12-05 Thread Artem Belevich via cfe-commits
@@ -104,3 +106,14 @@ void fun() { (void) b; (void) var_host_only; } + +extern __global__ void external_func(); +extern void* const external_dep[] = { + (void*)(external_func) +}; +extern void* const external_arr[] = {}; + +void* host_fun() { + (void) external_dep; +
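
For context, the promotion this PR restricts: in CUDA/HIP, clang may also emit a host-side `const` variable on the device (promoting it to constant memory) so device code can reference it. A minimal sketch of the two cases, with made-up variable names rather than the ones from the patch:

```
// Hedged sketch, not the test from the patch.
const int promotable = 42;        // Constant initializer: clang can emit a
                                  // device-side copy as well.

extern const int external_only;   // Defined in another TU: there is nothing
                                  // to emit on the device, so it must stay a
                                  // host-only entity.

__global__ void k(int *out) {
  *out = promotable;              // Uses the promoted device-side copy.
}
```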

[clang] [CUDA][Win32] Add `fma(long double,..)` to math forward declares. (PR #73756)

2023-12-04 Thread Artem Belevich via cfe-commits
@@ -70,6 +70,9 @@ __DEVICE__ double floor(double); __DEVICE__ float floor(float); __DEVICE__ double fma(double, double, double); __DEVICE__ float fma(float, float, float); +#ifdef _MSC_VER +__DEVICE__ long double fma(long double, long double, long double);

[clang] [CUDA][Win32] Add `fma(long double,..)` to math forward declares. (PR #73756)

2023-12-04 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/73756 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B updated https://github.com/llvm/llvm-project/pull/74123 >From 71e24fc704c82c11162313613691d09b9a653bd5 Mon Sep 17 00:00:00 2001 From: Artem Belevich Date: Fri, 1 Dec 2023 10:37:08 -0800 Subject: [PATCH 1/3] [CUDA] work around more __noinline__ conflicts with libc++

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
Artem-B wrote: > FWIW I am not thrilled about using `__config` here. That header is an > implementation detail of libc++ and defining it and relying on it is somewhat > brittle. I'm all for having it fixed in libc++ or in CUDA SDK. Barring that, working around the specific implementation

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
Artem-B wrote: > I think we can find a solution to work around this in libc++ within a > reasonable timeframe OK. I'll hold off on landing the patch. I believe we're not blocked on it at the moment. https://github.com/llvm/llvm-project/pull/74123

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B updated https://github.com/llvm/llvm-project/pull/74123 >From 71e24fc704c82c11162313613691d09b9a653bd5 Mon Sep 17 00:00:00 2001 From: Artem Belevich Date: Fri, 1 Dec 2023 10:37:08 -0800 Subject: [PATCH 1/2] [CUDA] work around more __noinline__ conflicts with libc++

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
Artem-B wrote: Yes, I've mentioned that in https://github.com/llvm/llvm-project/pull/73838. However, we need something to fix the issue right now while we're figuring out a better solution. In any case `__noinline__` is unlikely to be widely used, so the wrappers may be manageable, at least
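
For background, the conflict being worked around, as a sketch of the mechanism (assuming CUDA's usual macro definition; this is not the code from the patch):

```
// CUDA's host_defines.h defines __noinline__ as a macro, roughly:
#define __noinline__ __attribute__((noinline))

// libc++ spells the attribute with its reserved-name form:
//   __attribute__((__noinline__))
// With the macro above in effect, that expands to the ill-formed
//   __attribute__((__attribute__((noinline))))
// which is why the libc++ headers need wrappers (e.g. push_macro/pop_macro
// around the includes) until a proper fix lands in libc++ or the CUDA SDK.
```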

[clang] [CUDA] work around more __noinline__ conflicts with libc++ (PR #74123)

2023-12-01 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B created https://github.com/llvm/llvm-project/pull/74123 https://github.com/llvm/llvm-project/pull/73838 >From 71e24fc704c82c11162313613691d09b9a653bd5 Mon Sep 17 00:00:00 2001 From: Artem Belevich Date: Fri, 1 Dec 2023 10:37:08 -0800 Subject: [PATCH] [CUDA] work

[clang] [CUDA][HIP] allow trivial ctor/dtor in device var init (PR #73140)

2023-11-30 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/73140 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [HIP] fix stack marking for -fgpu-rdc (PR #72782)

2023-11-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/72782 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] ignore implicit host/device attr for override (PR #72815)

2023-11-20 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/72815 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] ignore implicit host/device attr for override (PR #72815)

2023-11-19 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM, with one question. https://github.com/llvm/llvm-project/pull/72815 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] ignore implicit host/device attr for override (PR #72815)

2023-11-19 Thread Artem Belevich via cfe-commits
@@ -1000,13 +1000,9 @@ void Sema::checkCUDATargetOverload(FunctionDecl *NewFD, // should have the same implementation on both sides. if (NewTarget != OldTarget && ((NewTarget == CFT_HostDevice && - !(LangOpts.OffloadImplicitHostDeviceTemplates && -

[clang] [CUDA][HIP] ignore implicit host/device attr for override (PR #72815)

2023-11-19 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/72815 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] make trivial ctor/dtor host device (PR #72394)

2023-11-18 Thread Artem Belevich via cfe-commits
Artem-B wrote: We've found a problem with the patch. https://godbolt.org/z/jcKo34vzG ``` template class C { explicit C() {}; }; template <> C::C() {}; ``` :6:21: error: __host__ function 'C' cannot overload __host__ __device__ function 'C' 6 | template <> C::C() {}; |
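
The archive stripped the template arguments from the snippet above; a hedged reconstruction of the godbolt example (the template parameter and the specialization argument are guesses):

```
template <typename T> class C {
  explicit C() {};
};
template <> C<int>::C() {};
// error: __host__ function 'C' cannot overload
//        __host__ __device__ function 'C'
```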

[clang] [CUDA][HIP] make trivial ctor/dtor host device (PR #72394)

2023-11-15 Thread Artem Belevich via cfe-commits
@@ -772,6 +772,26 @@ void Sema::maybeAddCUDAHostDeviceAttrs(FunctionDecl *NewD, NewD->addAttr(CUDADeviceAttr::CreateImplicit(Context)); } +// If a trivial ctor/dtor has no host/device +// attributes, make it implicitly host device function. +void

[clang] [CUDA][HIP] make trivial ctor/dtor host device (PR #72394)

2023-11-15 Thread Artem Belevich via cfe-commits
@@ -12,7 +12,7 @@ extern "C" void host_fn() {} struct Dummy {}; struct S { - S() {} + S() { x = 1; } Artem-B wrote: Can we make the purpose of non-trivial constructor more descriptive, here and in other places? E.g. `S() { static int nontrivial_ctor = 1;

[clang] [CUDA][HIP] make trivial ctor/dtor host device (PR #72394)

2023-11-15 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM with a couple of nits. https://github.com/llvm/llvm-project/pull/72394 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] make trivial ctor/dtor host device (PR #72394)

2023-11-15 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/72394 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-14 Thread Artem Belevich via cfe-commits
Artem-B wrote: > Nvidia backend doesn't handle scoped atomics at all yet Yeah, it's on my ever growing todo. :-( https://github.com/llvm/llvm-project/pull/72280 ___ cfe-commits mailing list cfe-commits@lists.llvm.org

[clang] [Clang] Introduce scoped variants of GNU atomic functions (PR #72280)

2023-11-14 Thread Artem Belevich via cfe-commits
Artem-B wrote: Just an FYI: recent NVIDIA GPUs have introduced the concept of [thread block cluster](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-block-clusters). We may need another level of granularity between the block and the device.
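
Roughly what the scoped variants under review look like (a sketch following the builtin names discussed in the PR; treat the exact spellings as an assumption):

```
int add_block_scoped(int *p, int v) {
  // Same shape as __atomic_fetch_add, plus a scope argument that lets the
  // backend limit ordering/visibility to the work-group (thread block).
  return __scoped_atomic_fetch_add(p, v, __ATOMIC_SEQ_CST,
                                   __MEMORY_SCOPE_WRKGRP);
}
```

Thread block clusters would presumably need one more scope level between the work-group and device scopes.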

[compiler-rt] [clang] [llvm] [HIP] support 128 bit int division (PR #71978)

2023-11-13 Thread Artem Belevich via cfe-commits
Artem-B wrote: > I don't think we're in a position to actually enable that at this time. We > still don't have everything necessary to provide object linking, which this > seems to rely on OK. IR it is. https://github.com/llvm/llvm-project/pull/71978

[compiler-rt] [clang] [llvm] [HIP] support 128 bit int division (PR #71978)

2023-11-10 Thread Artem Belevich via cfe-commits
Artem-B wrote: Would it be feasible to consider switching to the new offloading driver mode and really linking with the library instead? It may be a conveniently isolated use case with few or no existing users that it would disrupt. https://github.com/llvm/llvm-project/pull/71978

[clang] [CUDA][HIP] Make template implicitly host device (PR #70369)

2023-11-09 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/70369 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] Make template implicitly host device (PR #70369)

2023-11-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: Now that we're making an even larger class of functions implicitly HD, the last logical step would be to make *all* unattributed functions implicitly HD, too (in a separate patch). After all, a template is as GPU-portable (or not) as a regular function. Unlike constexpr or

[openmp] [llvm] [clang] ReworkCtorDtor (PR #71739)

2023-11-08 Thread Artem Belevich via cfe-commits
@@ -95,7 +95,7 @@ using namespace llvm; static cl::opt LowerCtorDtor("nvptx-lower-global-ctor-dtor", cl::desc("Lower GPU ctor / dtors to globals on the device."), - cl::init(false), cl::Hidden); + cl::init(true),

[clang] [llvm] [openmp] ReworkCtorDtor (PR #71739)

2023-11-08 Thread Artem Belevich via cfe-commits
@@ -95,7 +95,7 @@ using namespace llvm; static cl::opt LowerCtorDtor("nvptx-lower-global-ctor-dtor", cl::desc("Lower GPU ctor / dtors to globals on the device."), - cl::init(false), cl::Hidden); + cl::init(true),

[clang] [CUDA][HIP] Fix deduction guide (PR #69366)

2023-10-30 Thread Artem Belevich via cfe-commits
Artem-B wrote: @ldionne - Can you take a look if that would have unintended consequences for libc++? https://github.com/llvm/llvm-project/pull/69366 ___ cfe-commits mailing list cfe-commits@lists.llvm.org

[clang] [CUDA][HIP] Fix std::is_invocable (PR #70369)

2023-10-27 Thread Artem Belevich via cfe-commits
@@ -283,12 +283,18 @@ set(cuda_wrapper_files cuda_wrappers/cmath cuda_wrappers/complex cuda_wrappers/new + cuda_wrappers/type_traits ) set(cuda_wrapper_bits_files cuda_wrappers/bits/shared_ptr_base.h cuda_wrappers/bits/basic_string.h

[clang] [CUDA][HIP] Fix std::is_invocable (PR #70369)

2023-10-26 Thread Artem Belevich via cfe-commits
@@ -283,12 +283,18 @@ set(cuda_wrapper_files cuda_wrappers/cmath cuda_wrappers/complex cuda_wrappers/new + cuda_wrappers/type_traits ) set(cuda_wrapper_bits_files cuda_wrappers/bits/shared_ptr_base.h cuda_wrappers/bits/basic_string.h

[clang] [NVPTX] Fixed some wmma store builtins that had non-const src param. (PR #69354)

2023-10-18 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/69354 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [HIP] Document func ptr and virtual func (PR #68126)

2023-10-18 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/68126 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] Fix init var diag in temmplate (PR #69081)

2023-10-16 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/69081 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] Let clang-cl support CUDA/HIP (PR #68921)

2023-10-12 Thread Artem Belevich via cfe-commits
Artem-B wrote: @rnk -- would that be acceptable for clang-cl on windows? https://github.com/llvm/llvm-project/pull/68921 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-09 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B closed https://github.com/llvm/llvm-project/pull/67866 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-09 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B closed https://github.com/llvm/llvm-project/pull/67866 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: clang-format failure on GitHub is weird -- it just silently exits with an error. I ran the same command locally and fixed one place it was not happy about. The buildkite failure somewhere in RISC-V appears to be unrelated. https://github.com/llvm/llvm-project/pull/67866

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: clang-format failure on GitHub is weird -- it just silently exits with an error. I ran the same command locally and fixed one place it was not happy about. The buildkite failure somewhere in RISC-V appears to be unrelated. https://github.com/llvm/llvm-project/pull/67866

[clang-tools-extra] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
Artem-B wrote: Found another issue. We merge four independent byte loads with `align 1` into a 32-bit load, which fails at runtime on misaligned pointers. ``` %t0 = type { [17 x i8] } @shared_storage = linkonce_odr local_unnamed_addr addrspace(3) global %t0 undef, align 1 define <4 x i8>

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
Artem-B wrote: Found another issue. We merge four independent byte loads with `align 1` into a 32-bit load, which fails at runtime on misaligned pointers. ``` %t0 = type { [17 x i8] } @shared_storage = linkonce_odr local_unnamed_addr addrspace(3) global %t0 undef, align 1 define <4 x i8>
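
A CUDA-level illustration of the failure mode described above (a hedged sketch, not the reproducer from the thread): the source only guarantees byte alignment, so the four byte loads must not be merged into a single 32-bit load.

```
struct char4x { char x, y, z, w; };   // stand-in for a v4i8 value

__global__ void gather(int i, char4x *out) {
  __shared__ char shared_storage[17]; // only 1-byte aligned
  const char *p = &shared_storage[i]; // may be misaligned for odd i
  // Four independent 1-byte loads. Folding them into one 32-bit (v4i8)
  // load is only legal if p is provably 4-byte aligned; with align 1 it
  // can fault at run time.
  char4x v = {p[0], p[1], p[2], p[3]};
  *out = v;
}
```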

[clang-tools-extra] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,1248 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3 +; ## Support i16x2 instructions +; RUN: llc < %s -mtriple=nvptx64-nvidia-cuda -mcpu=sm_90 -mattr=+ptx80 \ +; RUN: -O0 -disable-post-ra

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,1248 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3 +; ## Support i16x2 instructions +; RUN: llc < %s -mtriple=nvptx64-nvidia-cuda -mcpu=sm_90 -mattr=+ptx80 \ +; RUN: -O0 -disable-post-ra

[clang-tools-extra] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
Artem-B wrote: > > I see one suspicious failure in tensorflow tests. I suspect I've messed > > something up in v4i8 comparison. > > Yup, there is a problem: > > ``` > Successfully custom legalized node > ... replacing: t10: v4i8 = BUILD_VECTOR Constant:i16<-128>, > Constant:i16<-128>,

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
Artem-B wrote: > > I see one suspicious failure in tensorflow tests. I suspect I've messed > > something up in v4i8 comparison. > > Yup, there is a problem: > > ``` > Successfully custom legalized node > ... replacing: t10: v4i8 = BUILD_VECTOR Constant:i16<-128>, > Constant:i16<-128>,

[clang-tools-extra] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
@@ -2150,58 +2179,94 @@ NVPTXTargetLowering::LowerCONCAT_VECTORS(SDValue Op, SelectionDAG ) const { return DAG.getBuildVector(Node->getValueType(0), dl, Ops); } -// We can init constant f16x2 with a single .b32 move. Normally it +// We can init constant f16x2/v2i16/v4i8

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
@@ -2150,58 +2179,94 @@ NVPTXTargetLowering::LowerCONCAT_VECTORS(SDValue Op, SelectionDAG ) const { return DAG.getBuildVector(Node->getValueType(0), dl, Ops); } -// We can init constant f16x2 with a single .b32 move. Normally it +// We can init constant f16x2/v2i16/v4i8

[clang-tools-extra] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B deleted https://github.com/llvm/llvm-project/pull/67866 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B deleted https://github.com/llvm/llvm-project/pull/67866 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang-tools-extra] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
@@ -2150,58 +2179,94 @@ NVPTXTargetLowering::LowerCONCAT_VECTORS(SDValue Op, SelectionDAG ) const { return DAG.getBuildVector(Node->getValueType(0), dl, Ops); } -// We can init constant f16x2 with a single .b32 move. Normally it +// We can init constant f16x2/v2i16/v4i8

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
@@ -2150,58 +2179,94 @@ NVPTXTargetLowering::LowerCONCAT_VECTORS(SDValue Op, SelectionDAG ) const { return DAG.getBuildVector(Node->getValueType(0), dl, Ops); } -// We can init constant f16x2 with a single .b32 move. Normally it +// We can init constant f16x2/v2i16/v4i8

[clang-tools-extra] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
Artem-B wrote: > I see one suspicious failure in tensorflow tests. I suspect I've messed > something up in v4i8 comparison. Yup, there is a problem: ``` Successfully custom legalized node ... replacing: t10: v4i8 = BUILD_VECTOR Constant:i16<-128>, Constant:i16<-128>, Constant:i16<-128>,

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
Artem-B wrote: > I see one suspicious failure in tensorflow tests. I suspect I've messed > something up in v4i8 comparison. Yup, there is a problem: ``` Successfully custom legalized node ... replacing: t10: v4i8 = BUILD_VECTOR Constant:i16<-128>, Constant:i16<-128>, Constant:i16<-128>,

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
Artem-B wrote: I see one suspicious failure in tensorflow tests. I suspect I've messed something up in v4i8 comparison. https://github.com/llvm/llvm-project/pull/67866 ___ cfe-commits mailing list cfe-commits@lists.llvm.org

[clang] [NVPTX] Improve lowering of v4i8 (PR #67866)

2023-10-06 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,1248 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3 +; ## Support i16x2 instructions +; RUN: llc < %s -mtriple=nvptx64-nvidia-cuda -mcpu=sm_90 -mattr=+ptx80 \ +; RUN: -O0 -disable-post-ra

[clang] [HIP] Document func ptr and virtual func (PR #68126)

2023-10-04 Thread Artem Belevich via cfe-commits
@@ -176,3 +176,65 @@ Predefined Macros * - ``HIP_API_PER_THREAD_DEFAULT_STREAM`` - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated. +Function Pointers Support in Clang with HIP +=== + +Function pointers' support

[clang] [HIP] Document func ptr and virtual func (PR #68126)

2023-10-04 Thread Artem Belevich via cfe-commits
@@ -176,3 +176,65 @@ Predefined Macros * - ``HIP_API_PER_THREAD_DEFAULT_STREAM`` - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated. +Function Pointers Support in Clang with HIP +=== + +Function pointers' support

[clang] [CUDA][HIP] Fix host/device context in concept (PR #67721)

2023-10-04 Thread Artem Belevich via cfe-commits
@@ -176,3 +176,34 @@ Predefined Macros * - ``HIP_API_PER_THREAD_DEFAULT_STREAM`` - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated. +C++20 Concepts with HIP and CUDA + + +In Clang, when working with HIP or CUDA, it's

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-25 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/66496 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-22 Thread Artem Belevich via cfe-commits
@@ -537,59 +537,46 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const Function , raw_ostream ) const { // If the NVVM IR has some of reqntid* specified, then output // the reqntid directive, and set the

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-22 Thread Artem Belevich via cfe-commits
@@ -537,59 +537,46 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const Function , raw_ostream ) const { // If the NVVM IR has some of reqntid* specified, then output // the reqntid directive, and set the

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-22 Thread Artem Belevich via cfe-commits
@@ -5607,6 +5607,21 @@ bool Sema::CheckRegparmAttr(const ParsedAttr , unsigned ) { return false; } +// Helper to get CudaArch. +static CudaArch getCudaArch(const TargetInfo ) { Artem-B wrote: You may need to verify that `TI->getTriple()->isNVPTX()` before

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-22 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B resolved https://github.com/llvm/llvm-project/pull/66496 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-22 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B resolved https://github.com/llvm/llvm-project/pull/66496 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-21 Thread Artem Belevich via cfe-commits
@@ -537,59 +537,46 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const Function , raw_ostream ) const { // If the NVVM IR has some of reqntid* specified, then output // the reqntid directive, and set the

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-21 Thread Artem Belevich via cfe-commits
@@ -12,7 +12,7 @@ __launch_bounds__(0x1) void TestWayTooBigArg(void); // expected- __launch_bounds__(-128, 7) void TestNegArg1(void); // expected-warning {{'launch_bounds' attribute parameter 0 is negative and will be ignored}} __launch_bounds__(128, -7) void

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-21 Thread Artem Belevich via cfe-commits
@@ -5607,6 +5607,21 @@ bool Sema::CheckRegparmAttr(const ParsedAttr , unsigned ) { return false; } +// Helper to get CudaArch. +static CudaArch getCudaArch(const TargetInfo ) { Artem-B wrote: Considering that we do have TargetInfo pointer here, instead of

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-21 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,45 @@ +// RUN: %clang_cc1 -std=c++11 -fsyntax-only -triple nvptx-unknown-unknown -target-cpu sm_90 -verify %s + +#include "Inputs/cuda.h" + +__launch_bounds__(128, 7) void Test2Args(void); +__launch_bounds__(128) void Test1Arg(void); + +__launch_bounds__(0x)

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-21 Thread Artem Belevich via cfe-commits
@@ -5650,34 +5665,51 @@ static Expr *makeLaunchBoundsArgExpr(Sema , Expr *E, CUDALaunchBoundsAttr * Sema::CreateLaunchBoundsAttr(const AttributeCommonInfo , Expr *MaxThreads, - Expr *MinBlocks) { - CUDALaunchBoundsAttr TmpAttr(Context, CI,

[clang] [NVPTX] Add support for maxclusterrank in launch_bounds (PR #66496)

2023-09-21 Thread Artem Belevich via cfe-commits
@@ -11836,6 +11836,10 @@ def err_sycl_special_type_num_init_method : Error< "types with 'sycl_special_class' attribute must have one and only one '__init' " "method defined">; +def warn_cuda_maxclusterrank_sm_90 : Warning< + "maxclusterrank requires sm_90 or higher,
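
The feature under review, roughly: a third `__launch_bounds__` argument carrying the maximum cluster rank, valid only when targeting sm_90 or newer, hence the new warning. A hedged sketch (argument names are descriptive comments, not official spellings):

```
__launch_bounds__(128 /*maxThreadsPerBlock*/,
                  2   /*minBlocksPerMultiprocessor*/,
                  4   /*maxclusterrank*/)
__global__ void cluster_kernel() {}
```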

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread Artem Belevich via cfe-commits
@@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); +

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread Artem Belevich via cfe-commits
@@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); +

[clang] [Driver][NVPTX] Add a warning that device debug info does not work with optimizations (PR #65327)

2023-09-18 Thread Artem Belevich via cfe-commits
@@ -413,13 +413,25 @@ void NVPTX::Assembler::ConstructJob(Compilation , const JobAction , // TODO: Perhaps we should map host -O2 to ptxas -O3. -O3 is ptxas's // default, so it may correspond more closely to the spirit of clang -O2. +bool noOptimization =

[clang] [Driver][NVPTX] Add a warning that device debug info does not work with optimizations (PR #65327)

2023-09-18 Thread Artem Belevich via cfe-commits
@@ -28,6 +28,17 @@ // RUN: --offload-arch=sm_35 --cuda-path=%S/Inputs/CUDA/usr/local/cuda \ // RUN: | FileCheck -check-prefixes=CHECK,ARCH64,SM35,RDC %s +// Compiling -O{1,2,3,4,fast,s,z} with -g does not pass -g debug info to ptxas. +// NOTE: This is because ptxas does not

[clang] [clang-repl][CUDA] Move CUDA module registration to beginning of global_ctors (PR #66658)

2023-09-18 Thread Artem Belevich via cfe-commits
@@ -794,7 +794,7 @@ void CodeGenModule::Release() { AddGlobalCtor(ObjCInitFunction); if (Context.getLangOpts().CUDA && CUDARuntime) { if (llvm::Function *CudaCtorFunction = CUDARuntime->finalizeModule()) - AddGlobalCtor(CudaCtorFunction); +

[clang] [HIP] Fix comdat of template kernel handle (PR #66283)

2023-09-13 Thread Artem Belevich via cfe-commits
@@ -43,6 +44,9 @@ __global__ void kernelfunc() {} __global__ void kernel_decl(); +template +__global__ void temp_kernel_decl(T x); Artem-B wrote: Nit: rename temp -> template? `temp` is strongly associated with 'temporary'.

[clang] [HIP] Fix comdat of template kernel handle (PR #66283)

2023-09-13 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/66283 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [HIP] Fix comdat of template kernel handle (PR #66283)

2023-09-13 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/66283 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] Work around two more instances of __noinline__ conflicts. (PR #66138)

2023-09-12 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B created https://github.com/llvm/llvm-project/pull/66138: https://github.com/llvm/llvm-project/issues/57544 >From 91c9d12e8f71cd55c877f80a0820615531cb62bd Mon Sep 17 00:00:00 2001 From: Artem Belevich Date: Tue, 12 Sep 2023 11:47:17 -0700 Subject: [PATCH] Work around

[clang] Work around two more instances of __noinline__ conflicts. (PR #66138)

2023-09-12 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B review_requested https://github.com/llvm/llvm-project/pull/66138 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] Work around two more instances of __noinline__ conflicts. (PR #66138)

2023-09-12 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B review_requested https://github.com/llvm/llvm-project/pull/66138 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [CUDA][HIP] Do not mark extern shared var (PR #65990)

2023-09-11 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM. Thank you! https://github.com/llvm/llvm-project/pull/65990 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] Reland "[CUDA][HIP] Fix overloading resolution in global variable iniā€¦ (PR #65606)

2023-09-07 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM. I'm still figuring out the github-based workflow. One thing that may be useful in the future would be to start the pull request branch with the original/reverted commit and put the updates into separate commits, so one could see

[clang] 2a702ec - Use unsigned types for __popc/__popcll to match their declarations in CUDA headers.

2023-09-05 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-09-05T16:02:42-07:00 New Revision: 2a702eca3efa066e3a470cd3b17082a05e118c91 URL: https://github.com/llvm/llvm-project/commit/2a702eca3efa066e3a470cd3b17082a05e118c91 DIFF:

[clang] 8f8df78 - Added missing test constraints.

2023-08-18 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-08-18T11:39:11-07:00 New Revision: 8f8df788aefaf9c947f0b8768ebca45176c7e9ee URL: https://github.com/llvm/llvm-project/commit/8f8df788aefaf9c947f0b8768ebca45176c7e9ee DIFF:

[clang] 7275734 - [CUDA/NVPTX] Improve handling of memcpy for -Os compilations.

2023-08-18 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-08-18T11:27:36-07:00 New Revision: 72757343fa866b7bfcbaa67edad895297c8cb2c5 URL: https://github.com/llvm/llvm-project/commit/72757343fa866b7bfcbaa67edad895297c8cb2c5 DIFF:

[clang] f05b58a - [clang] Support '-fgpu-default-stream=per-thread' for NVIDIA CUDA

2023-07-13 Thread Artem Belevich via cfe-commits
Author: boxu.zhang Date: 2023-07-13T16:54:57-07:00 New Revision: f05b58a9468cc2990678e06bc51df56b30344807 URL: https://github.com/llvm/llvm-project/commit/f05b58a9468cc2990678e06bc51df56b30344807 DIFF: https://github.com/llvm/llvm-project/commit/f05b58a9468cc2990678e06bc51df56b30344807.diff

[clang] 0f49116 - [CUDA] Update Kepler(sm_3*) support info.

2023-06-02 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-06-02T14:16:13-07:00 New Revision: 0f49116e261cf5a156221b006acb677e3565fd1a URL: https://github.com/llvm/llvm-project/commit/0f49116e261cf5a156221b006acb677e3565fd1a DIFF:

[clang] 6cdc07a - [CUDA] correctly install cuda_wrappers/bits/shared_ptr_base.h

2023-05-30 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-05-30T10:44:33-07:00 New Revision: 6cdc07a701eec08da450be58d6e1b67428a983dd URL: https://github.com/llvm/llvm-project/commit/6cdc07a701eec08da450be58d6e1b67428a983dd DIFF:

[clang] df1b2be - [CUDA] Explicitly construct dim3() return values.

2023-05-25 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-05-25T12:41:25-07:00 New Revision: df1b2bef0c7cad11681a02e9e2f816b27fb480a6 URL: https://github.com/llvm/llvm-project/commit/df1b2bef0c7cad11681a02e9e2f816b27fb480a6 DIFF:

[clang] 5c082e7 - [CUDA] Add CUDA wrappers over clang builtins for sm_90.

2023-05-25 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-05-25T11:57:58-07:00 New Revision: 5c082e7e15e38a2eea1f506725efe636a5b1bf8a URL: https://github.com/llvm/llvm-project/commit/5c082e7e15e38a2eea1f506725efe636a5b1bf8a DIFF:

[clang] 25708b3 - [NVPTX, CUDA] barrier intrinsics and builtins for sm_90

2023-05-25 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-05-25T11:57:57-07:00 New Revision: 25708b3df6e359123d5bce137652af812e168cfc URL: https://github.com/llvm/llvm-project/commit/25708b3df6e359123d5bce137652af812e168cfc DIFF:

[clang] 0a0bae1 - [CUDA] plumb through new sm_90-specific builtins.

2023-05-25 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-05-25T11:57:56-07:00 New Revision: 0a0bae1e9f94ec86ac17b0b4eb817741689f3739 URL: https://github.com/llvm/llvm-project/commit/0a0bae1e9f94ec86ac17b0b4eb817741689f3739 DIFF:

[clang] 0ad5d40 - [CUDA] Relax restrictions on variadics in host-side compilation.

2023-05-25 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-05-25T11:57:54-07:00 New Revision: 0ad5d40fa19f27db0e5f999d0e17b0c18b811019 URL: https://github.com/llvm/llvm-project/commit/0ad5d40fa19f27db0e5f999d0e17b0c18b811019 DIFF:

[clang] ffb635c - [CUDA] bump supported CUDA version to 12.1/11.8

2023-05-25 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-05-25T11:57:55-07:00 New Revision: ffb635cb2d4e374e52b12066893458a8b70889fa URL: https://github.com/llvm/llvm-project/commit/ffb635cb2d4e374e52b12066893458a8b70889fa DIFF:

[clang] 29cb080 - [CUDA] Fix wrappers for sm_80 functions

2023-05-24 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-05-24T11:48:39-07:00 New Revision: 29cb080c363d655ab1179a5564f1a82460e49a06 URL: https://github.com/llvm/llvm-project/commit/29cb080c363d655ab1179a5564f1a82460e49a06 DIFF:

[clang] 4450285 - [CUDA] provide wrapper functions for new NVCC builtins.

2023-05-19 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-05-19T11:48:08-07:00 New Revision: 4450285bd74079bf87ba7b824a8dec8dcfb586ef URL: https://github.com/llvm/llvm-project/commit/4450285bd74079bf87ba7b824a8dec8dcfb586ef DIFF:

[clang] 6963c61 - [NVPTX/CUDA] added an optional src_size argument to __nvvm_cp_async*

2023-05-19 Thread Artem Belevich via cfe-commits
Author: Artem Belevich Date: 2023-05-19T10:59:36-07:00 New Revision: 6963c61f0f6e4be2039cb45e824ea1e83a8f1526 URL: https://github.com/llvm/llvm-project/commit/6963c61f0f6e4be2039cb45e824ea1e83a8f1526 DIFF:
