[clang] [Clang] Fix a non-effective assertion (PR #81083)

2024-02-08 Thread Matt Arsenault via cfe-commits
@@ -5908,7 +5908,7 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID, } } -assert(PTy->canLosslesslyBitCastTo(FTy->getParamType(i)) && +assert(ArgValue->getType()->canLosslesslyBitCastTo(PTy) && --

[clang] [Clang] Fix a non-effective assertion (PR #81083)

2024-02-08 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/81083 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-08 Thread Matt Arsenault via cfe-commits
@@ -167,6 +167,10 @@ def FeatureCuMode : SubtargetFeature<"cumode", "Enable CU wavefront execution mode" >; +def FeaturePreciseMemory arsenm wrote: The subtarget feature prefix should be removed. The subtarget feature name is not the user facing component

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-02-08 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,199 @@ +; Testing the -amdgpu-precise-memory-op option +; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -mattr=+amdgpu-precise-memory-op -verify-machineinstrs < %s | FileCheck %s -check-prefixes=GFX9 +; RUN: llc -mtriple=amdgcn -mcpu=gfx90a -mattr=+amdgpu-precise-memory-op -v

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-08 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: I think this needs codegen tests for the gfx900 vs. gfx906 mad_mix/fma_mix issue https://github.com/llvm/llvm-project/pull/76955

[clang] [llvm] [RFC][WIP][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-08 Thread Matt Arsenault via cfe-commits
@@ -2819,11 +2819,11 @@ def int_amdgcn_fdot2_f16_f16 : def int_amdgcn_fdot2_bf16_bf16 : ClangBuiltin<"__builtin_amdgcn_fdot2_bf16_bf16">, DefaultAttrsIntrinsic< -[llvm_i16_ty], // %r +[llvm_bfloat_ty], // %r arsenm wrote: Changing the clang bui

[clang] [llvm] [RFC][WIP][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-08 Thread Matt Arsenault via cfe-commits
@@ -2835,8 +2835,8 @@ def int_amdgcn_fdot2_f32_bf16 : DefaultAttrsIntrinsic< [llvm_float_ty], // %r [ - llvm_v2i16_ty, // %a - llvm_v2i16_ty, // %b + llvm_v2bf16_ty, // %a + llvm_v2bf16_ty, // %b arsenm wrote: For potential revert

[clang] [llvm] [RFC][WIP][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-08 Thread Matt Arsenault via cfe-commits
@@ -1562,8 +1562,9 @@ bool IRTranslator::translateBitCast(const User &U, bool IRTranslator::translateCast(unsigned Opcode, const User &U, MachineIRBuilder &MIRBuilder) { - if (U.getType()->getScalarType()->isBFloatTy() || - U.getOperand(0

[clang] [llvm] [RFC][WIP][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-08 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,8 @@ +// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s | FileCheck %s +// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1200 -show-encoding %s | FileCheck %s + +v_dot2_bf16_bf16 v5, v1, v2, 100.0 arsenm wrote: does this help with #79369 at all? http

[clang] [llvm] [RFC][WIP][AMDGPU] Use `bf16` instead of `i16` for bfloat (PR #80908)

2024-02-08 Thread Matt Arsenault via cfe-commits
@@ -1562,8 +1562,9 @@ bool IRTranslator::translateBitCast(const User &U, bool IRTranslator::translateCast(unsigned Opcode, const User &U, MachineIRBuilder &MIRBuilder) { - if (U.getType()->getScalarType()->isBFloatTy() || - U.getOperand(0

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-08 Thread Matt Arsenault via cfe-commits
arsenm wrote: Next piece in #81108 https://github.com/llvm/llvm-project/pull/74056

[clang] [llvm] InstCombine: Enable SimplifyDemandedUseFPClass and remove flag (PR #81108)

2024-02-08 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/81108 This completes the unrevert of ef388334ee5a3584255b9ef5b3fefdb244fa3fd7. >From 7b5b50597e13c647ec70beab35dcc9b643bff42f Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Thu, 8 Feb 2024 14:15:33 +0530 Subject:

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-08 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/74056

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-08 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/74056 >From 9be777d5b39852cf3c0b2538fd5f712922672caa Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 1 Dec 2023 18:00:13 +0900 Subject: [PATCH 1/4] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass""

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,273 @@ +// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py + +// REQUIRES: x86-registered-target +// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-cpu x86-64-v4 -std=c23 -O1 -ffreestanding -emit-llvm -o - %s | FileCheck %s + +// Th

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,117 @@ +// RUN: %clang_cc1 -triple i386-unknown-linux-gnu -Wno-varargs -O1 -disable-llvm-passes -emit-llvm -o - %s | opt --passes=instcombine | opt -passes="expand-variadics,default" -S | FileCheck %s --check-prefixes=CHECK,X86Linux arsenm wrote: ca

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,589 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: -p --function-signature +; RUN: opt -S --passes=expand-variadics < %s | FileCheck %s +target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,589 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: -p --function-signature +; RUN: opt -S --passes=expand-variadics < %s | FileCheck %s +target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,589 @@ +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: -p --function-signature +; RUN: opt -S --passes=expand-variadics < %s | FileCheck %s +target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [llvm] [transforms] Inline simple variadic functions (PR #81058)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,698 @@ +//===-- ExpandVariadicsPass.cpp *- C++ -*-=// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -1877,3 +1877,139 @@ Value *InstCombinerImpl::SimplifyDemandedVectorElts(Value *V, return MadeChange ? I : nullptr; } + +/// For floating-point classes that resolve to a single bit pattern, return that +/// value. +static Constant *getFPClassConstant(Type *Ty, FPClassTe

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -1877,3 +1877,139 @@ Value *InstCombinerImpl::SimplifyDemandedVectorElts(Value *V, return MadeChange ? I : nullptr; } + +/// For floating-point classes that resolve to a single bit pattern, return that +/// value. +static Constant *getFPClassConstant(Type *Ty, FPClassTe

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -1877,3 +1877,139 @@ Value *InstCombinerImpl::SimplifyDemandedVectorElts(Value *V, return MadeChange ? I : nullptr; } + +/// For floating-point classes that resolve to a single bit pattern, return that +/// value. +static Constant *getFPClassConstant(Type *Ty, FPClassTe

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -1877,3 +1877,139 @@ Value *InstCombinerImpl::SimplifyDemandedVectorElts(Value *V, return MadeChange ? I : nullptr; } + +/// For floating-point classes that resolve to a single bit pattern, return that +/// value. +static Constant *getFPClassConstant(Type *Ty, FPClassTe

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -1877,3 +1877,139 @@ Value *InstCombinerImpl::SimplifyDemandedVectorElts(Value *V, return MadeChange ? I : nullptr; } + +/// For floating-point classes that resolve to a single bit pattern, return that +/// value. +static Constant *getFPClassConstant(Type *Ty, FPClassTe

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-07 Thread Matt Arsenault via cfe-commits
arsenm wrote: > I don't know why it fails: > > ``` > error: patch failed: llvm/lib/Transforms/InstCombine/InstCombineInternal.h:551 > error: llvm/lib/Transforms/InstCombine/InstCombineInternal.h: patch does not > apply > error: patch failed: > llvm/lib/Transforms/InstCombine/InstCombineSimplif

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-07 Thread Matt Arsenault via cfe-commits
arsenm wrote: > @arsenm Can you rebase this patch first? It was already fresh, I just re-merged again with no conflicts https://github.com/llvm/llvm-project/pull/74056

[clang] [llvm] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-07 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/74056 >From 9be777d5b39852cf3c0b2538fd5f712922672caa Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 1 Dec 2023 18:00:13 +0900 Subject: [PATCH 1/2] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass""

[clang] [NVPTX][AMDGPU][CodeGen] Fix `local_space nullptr` handling for NVPTX and local/private `nullptr` value for AMDGPU. (PR #78759)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -285,6 +289,20 @@ void NVPTXTargetCodeGenInfo::addNVVMMetadata(llvm::GlobalValue *GV, bool NVPTXTargetCodeGenInfo::shouldEmitStaticExternCAliases() const { return false; } + +llvm::Constant * +NVPTXTargetCodeGenInfo::getNullPointer(const CodeGen::CodeGenModule &CGM, +

[clang] [NVPTX][AMDGPU][CodeGen] Fix `local_space nullptr` handling for NVPTX and local/private `nullptr` value for AMDGPU. (PR #78759)

2024-02-07 Thread Matt Arsenault via cfe-commits
@@ -285,6 +289,20 @@ void NVPTXTargetCodeGenInfo::addNVVMMetadata(llvm::GlobalValue *GV, bool NVPTXTargetCodeGenInfo::shouldEmitStaticExternCAliases() const { return false; } + +llvm::Constant * +NVPTXTargetCodeGenInfo::getNullPointer(const CodeGen::CodeGenModule &CGM, +

[clang] Disable FTZ/DAZ when compiling shared libraries by default. (PR #80475)

2024-02-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: > Do you only set the register for kernel entries? Yes, it's the pre-initialized state. Non-kernels can't be arbitrarily invoked from the host > Is the attribute ignored for other functions? No, it's an informative attribute about what the mode is. The compiler isn't trying t

[clang] Disable FTZ/DAZ when compiling shared libraries by default. (PR #80475)

2024-02-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: > > So, alternatively...we could just go with the simplest solution, and use > > "ieee" as the default even under -ffast-math. > +1. There hasn't been a performance reason to use FTZ/DAZ since ~2011. Maybe there's still a power benefit? But in that case you could still explicitl

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: > @arsenm Are you suggesting that these should instead be a range of > minimum/maximum number of workitems globally? That's how all of the other attributes we already have do this. amdgpu-waves-per-eu is a single min, max pair. Same with amdgpu-flat-work-group-size Although thi

[clang] [AMDGPU] Treat printf as builtin for OpenCL (PR #72554)

2024-02-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm requested changes to this pull request. Is this redundant with #68515? Do we just need to add OpenCL test coverage? https://github.com/llvm/llvm-project/pull/72554

[llvm] [clang] Revert "InstCombine: Fold is.fpclass(x, fcInf) to fabs+fcmp" (PR #76338)

2024-02-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: @dtcxzyw are you planning on a codegen patch to improve the backend handling? https://github.com/llvm/llvm-project/pull/76338

[llvm] [clang-tools-extra] [clang] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-02-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: ping, I want to get this in and move to remove the flag https://github.com/llvm/llvm-project/pull/74056

[clang] [NVPTX][AMDGPU][CodeGen] Fix `local_space nullptr` handling for NVPTX and local/private `nullptr` value for AMDGPU. (PR #78759)

2024-02-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. amdgpu parts lgtm (which could be split to a separate change from the ptx change) https://github.com/llvm/llvm-project/pull/78759

[clang] [CUDA][HIP] Exclude external variables from constant promotion. (PR #73549)

2024-02-06 Thread Matt Arsenault via cfe-commits
@@ -104,3 +106,14 @@ void fun() { (void) b; (void) var_host_only; } + +extern __global__ void external_func(); +extern void* const external_dep[] = { arsenm wrote: Sounds broken that the behavior would differ between array and non-array? https://github.c

[lldb] [clang] [clang-tools-extra] [llvm] [libc] [libcxx] [lld] [flang] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/67104

[clang] [llvm] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm requested changes to this pull request. One attribute https://github.com/llvm/llvm-project/pull/79035

[llvm] [clang] [AMDGPU] Adding the amdgpu-num-work-groups function attribute (PR #79035)

2024-02-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: One attribute, with a range, would be better than two attributes. This is how it is handled in similar cases. I also think this should be in terms of work items, not workgroups https://github.com/llvm/llvm-project/pull/79035

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Matt Arsenault via cfe-commits
@@ -520,6 +520,104 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following === === = = === === == +Generic processors also exist. ---

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Matt Arsenault via cfe-commits
@@ -156,6 +156,12 @@ void AMDGPUAsmPrinter::emitFunctionBodyStart() { const GCNSubtarget &STM = MF->getSubtarget<GCNSubtarget>(); const Function &F = MF->getFunction(); + // TODO: We're checking this late, would be nice to check it earlier. + if (STM.requiresCodeObjectV6() && CodeObje

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Matt Arsenault via cfe-commits
@@ -139,10 +139,10 @@ bool AMDGPURemoveIncompatibleFunctions::checkFunction(Function &F) { const GCNSubtarget *ST = static_cast<const GCNSubtarget *>(TM->getSubtargetImpl(F)); - // Check the GPU isn't generic. Generic is used for testing only - // and we don't want this pass to interfere

[clang] [llvm] [AMDGPU] Introduce GFX9/10.1/10.3/11 Generic Targets (PR #76955)

2024-02-06 Thread Matt Arsenault via cfe-commits
@@ -279,13 +279,25 @@ void AMDGPUTargetInfo::getTargetDefines(const LangOptions &Opts, if (GPUKind == llvm::AMDGPU::GK_NONE && !IsHIPHost) return; - StringRef CanonName = isAMDGCN(getTriple()) ? getArchNameAMDGCN(GPUKind) -

[clang] [compiler-rt] [HIP] support 128 bit int division (PR #71978)

2024-02-06 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: Can we land the infrastructure to allow linking of compiler-rt binaries without the specifics for divide 128? https://github.com/llvm/llvm-project/pull/71978

[clang] Disable FTZ/DAZ when compiling shared libraries by default. (PR #80475)

2024-02-06 Thread Matt Arsenault via cfe-commits
arsenm wrote: > I may have mentioned a few times that I don't like function attributes > controlling fast-math behaviors. It doesn't control it, it's informative. You just get undefined behavior if you end up calling mismatched mode functions. It does control it in the AMDGPU entry point func

[llvm] [clang] [clang-tools-extra] [AArch64] Implement -fno-plt for SelectionDAG/GlobalISel (PR #78890)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -1293,8 +1293,19 @@ bool AArch64CallLowering::lowerCall(MachineIRBuilder &MIRBuilder, !Subtarget.noBTIAtReturnTwice() && MF.getInfo<AArch64FunctionInfo>()->branchTargetEnforcement()) Opc = AArch64::BLR_BTI; - else + else { +// For an intrinsic call (e.g. memset),

[clang] [clang][AMDGPU][CUDA] Handle __builtin_printf for device printf (PR #68515)

2024-02-05 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/68515

[clang] Disable FTZ/DAZ when compiling shared libraries by default. (PR #80475)

2024-02-05 Thread Matt Arsenault via cfe-commits
arsenm wrote: > * Which value allows generating the "fastest" math code -- disregarding > correctness? I'd assume that "dynamic" is least optimizable, "ieee" in the > middle, and "preserve-sign" is likely to generate the "fastest" code? This depends on the target and operations. For some funct

[clang] [clang][AMDGPU][CUDA] Handle __builtin_printf for device printf (PR #68515)

2024-02-05 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/68515

[llvm] [clang] [clang][HLSL][SPRI-V] Add convergence intrinsics (PR #80680)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -1129,8 +1129,97 @@ struct BitTest { static BitTest decodeBitTestBuiltin(unsigned BuiltinID); }; + +// Returns the first convergence entry/loop/anchor instruction found in |BB|. +// std::nullopt otherwise. +std::optional getConvergenceToken(llvm::BasicBlock *BB) { + for

[clang] [llvm] [clang][HLSL][SPRI-V] Add convergence intrinsics (PR #80680)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -1129,8 +1129,97 @@ struct BitTest { static BitTest decodeBitTestBuiltin(unsigned BuiltinID); }; + +// Returns the first convergence entry/loop/anchor instruction found in |BB|. +// std::nullopt otherwise. +std::optional getConvergenceToken(llvm::BasicBlock *BB) { + for

[clang] [llvm] [clang-tools-extra] [AArch64] Implement -fno-plt for SelectionDAG/GlobalISel (PR #78890)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -1293,8 +1293,19 @@ bool AArch64CallLowering::lowerCall(MachineIRBuilder &MIRBuilder, !Subtarget.noBTIAtReturnTwice() && MF.getInfo<AArch64FunctionInfo>()->branchTargetEnforcement()) Opc = AArch64::BLR_BTI; - else + else { +// For an intrinsic call (e.g. memset),

[compiler-rt] [libcxx] [flang] [openmp] [llvm] [clang-tools-extra] [clang] [lldb] [lld] [libc] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -862,14 +862,18 @@ static void instrumentOneFunc( auto Name = FuncInfo.FuncNameVar; auto CFGHash = ConstantInt::get(Type::getInt64Ty(M->getContext()), FuncInfo.FunctionHash); + // Make sure that pointer to global is passed in with zero

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-05 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/80183

[libc] [libcxx] [clang-tools-extra] [flang] [lld] [lldb] [llvm] [clang] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -1025,6 +1025,26 @@ void AMDGPUAsmPrinter::EmitProgramInfoSI(const MachineFunction &MF, OutStreamer->emitInt32(MFI->getNumSpilledVGPRs()); } +// Helper function to add common PAL Metadata 3.0+ +static void EmitPALMetadataCommon(AMDGPUPALMetadata *MD, +

[llvm] [clang-tools-extra] [flang] [lldb] [clang] [libcxx] [libc] [lld] [AMDGPU] Add pal metadata 3.0 support to callable pal funcs (PR #67104)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -1127,10 +1131,16 @@ void AMDGPUAsmPrinter::emitPALFunctionMetadata(const MachineFunction &MF) { MD->setFunctionScratchSize(FnName, MFI.getStackSize()); const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>(); - // Set compute registers - MD->setRsrc1(CallingConv::AMDGPU_CS, -

[clang] Disable FTZ/DAZ when compiling shared libraries by default. (PR #80475)

2024-02-05 Thread Matt Arsenault via cfe-commits
arsenm wrote: > I wonder if, instead, we should just have `-ffast-math` always downgrade > `-fdenormal-fp-math=ieee` to `-fdenormal-fp-math=preserve-sign`, under the > rationale of "you asked for fast math, and preserve-sign mode might let the > compiler generate faster code"? This could also

[clang] [clang][AMDGPU][CUDA] Handle __builtin_printf for device printf (PR #68515)

2024-02-05 Thread Matt Arsenault via cfe-commits
arsenm wrote: > > It looks reasonable to me, although I'm not really an AMDGPU person. /me > > summons @arsenm ? > > AMDGPU backend relies on LLVM passes to translate printf at IR level. For the OpenCL case only, not for HIP/OpenMP https://github.com/llvm/llvm-project/pull/68515

[clang] [clang][AMDGPU][CUDA] Handle __builtin_printf for device printf (PR #68515)

2024-02-05 Thread Matt Arsenault via cfe-commits
@@ -0,0 +1,21 @@ +// REQUIRES: amdgpu-registered-target +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -disable-llvm-optzns -mprintf-kind=hostcall -fno-builtin-printf -fcuda-is-device \ +// RUN: -o - %s | FileCheck %s +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emi

[llvm] [clang] Reapply "[AMDGPU] Add InstCombine rule for ballot.i64 intrinsic in wave32 mode." (PR #80303)

2024-02-02 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/80303

[clang] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins (PR #79980)

2024-02-01 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/79980

[llvm] [clang] [clang-tools-extra] [AMDGPU] Add code model (#70760) test for amdgpu target. (PR #71019)

2024-02-01 Thread Matt Arsenault via cfe-commits
arsenm wrote: > LGTM. Please update PR title before merging So this was only supposed to add the test, or implement this too? https://github.com/llvm/llvm-project/pull/71019

[llvm] [clang] [clang-tools-extra] [AMDGPU] Add code model (#70760) test for amdgpu target. (PR #71019)

2024-02-01 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/71019 >From 2477ae87e7bb82b4551e42b8255dfe93dadff453 Mon Sep 17 00:00:00 2001 From: Pravin Jagtap Date: Thu, 2 Nov 2023 01:05:35 -0400 Subject: [PATCH 1/6] [AMDGPU] Add code model (#70760) test for amdgpu target. ---

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-01 Thread Matt Arsenault via cfe-commits
@@ -4,13 +4,10 @@ // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -target-feature -wavefrontsize64 -verify -S -o - %s // RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx1010 -verify -S -o - %s +// expected-no-diagnostics + typedef unsigned long ulong; void test_ba

[clang] [AMDGPU] Allow w64 ballot to be used on w32 targets (PR #80183)

2024-02-01 Thread Matt Arsenault via cfe-commits
@@ -151,7 +151,7 @@ BUILTIN(__builtin_amdgcn_mqsad_u32_u8, "V4UiWUiUiV4Ui", "nc") //===--===// TARGET_BUILTIN(__builtin_amdgcn_ballot_w32, "ZUib", "nc", "wavefrontsize32") -TARGET_BUILTIN(__builtin_amdgcn_ba

[compiler-rt] [flang] [llvm] [clang] [libc] [clang-tools-extra] [libcxx] [AMDGPU] Every convergent operation needs post-isel processing (PR #80102)

2024-01-31 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/80102 >From b64f7ba4afc6cbb3e5e34757e6979a0d5ee73e2b Mon Sep 17 00:00:00 2001 From: Sameer Sahasrabuddhe Date: Tue, 30 Jan 2024 11:26:53 +0530 Subject: [PATCH] [AMDGPU] Every convergent operation needs post-isel proces

[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #80035)

2024-01-30 Thread Matt Arsenault via cfe-commits
@@ -175,6 +175,8 @@ Predefined Macros - Defined when the GPU default stream is set to per-thread mode. * - ``HIP_API_PER_THREAD_DEFAULT_STREAM`` - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated. + * - ``__AMDGCN_WAVEFRONT_SIZE__``

[llvm] [clang] [lld] [AMDGPU] Rename COV module flag to amdhsa_code_object_version (PR #79905)

2024-01-30 Thread Matt Arsenault via cfe-commits
@@ -25,4 +25,4 @@ entry: } !llvm.module.flags = !{!0} -!0 = !{i32 1, !"amdgpu_code_object_version", i32 500} +!0 = !{i32 1, !"amdhsa_code_object_version", i32 500} arsenm wrote: Separate would be better https://github.com/llvm/llvm-project/pull/79905

[clang] [AMDGPU] Check wavefrontsize for GFX11 WMMA builtins (PR #79980)

2024-01-30 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm commented: Also should get a run line that errors due to wavesize? https://github.com/llvm/llvm-project/pull/79980

[llvm] [clang-tools-extra] [clang] [AArch64] Implement -fno-plt for SelectionDAG/GlobalISel (PR #78890)

2024-01-30 Thread Matt Arsenault via cfe-commits
@@ -1293,8 +1293,19 @@ bool AArch64CallLowering::lowerCall(MachineIRBuilder &MIRBuilder, !Subtarget.noBTIAtReturnTwice() && MF.getInfo<AArch64FunctionInfo>()->branchTargetEnforcement()) Opc = AArch64::BLR_BTI; - else + else { +// For an intrinsic call (e.g. memset),

[lld] [llvm] [clang] [AMDGPU] Rename COV module flag to amdhsa_code_object_version (PR #79905)

2024-01-29 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/79905

[flang] [libc] [libcxx] [clang] [llvm] [lldb] [compiler-rt] [lld] [ASan][AMDGPU] Fix Assertion Failure. (PR #79795)

2024-01-29 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/79795

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-29 Thread Matt Arsenault via cfe-commits
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, return Changed; } +bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) { + const GCNSubtarget &ST = MF.getSubtarget(); + const SIInstrInfo *TII = ST.get

[clang] [llvm] [AMDGPU] Emit a waitcnt instruction after each memory instruction (PR #79236)

2024-01-29 Thread Matt Arsenault via cfe-commits
@@ -2561,6 +2567,70 @@ bool SIMemoryLegalizer::expandAtomicCmpxchgOrRmw(const SIMemOpInfo &MOI, return Changed; } +bool SIMemoryLegalizer::GFX9InsertWaitcntForPreciseMem(MachineFunction &MF) { arsenm wrote: can you just make this happen as a consequence of

[llvm] [clang-tools-extra] [clang] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-01-29 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/74056 >From 9be777d5b39852cf3c0b2538fd5f712922672caa Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 1 Dec 2023 18:00:13 +0900 Subject: [PATCH 1/2] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass""

[clang] [llvm] [clang-tools-extra] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" (PR #74056)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/74056 >From 9be777d5b39852cf3c0b2538fd5f712922672caa Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Fri, 1 Dec 2023 18:00:13 +0900 Subject: [PATCH] Reapply "InstCombine: Introduce SimplifyDemandedUseFPClass"" This

[llvm] [clang-tools-extra] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest (PR #66522)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm closed https://github.com/llvm/llvm-project/pull/66522

[clang-tools-extra] [llvm] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest (PR #66522)

2024-01-26 Thread Matt Arsenault via cfe-commits
@@ -2641,8 +2641,8 @@ define float @assume_false_smallest_normal(float %arg) { } define float @clamp_false_nan(float %arg) { -; CHECK-LABEL: define float @clamp_false_nan( -; CHECK-SAME: float returned [[ARG:%.*]]) #[[ATTR2]] { +; CHECK-LABEL: define nofpclass(nan inf nzero su

[llvm] [clang-tools-extra] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest (PR #66522)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/66522 >From 076ab2374d84c4112e0bf3fb11ecda2f5774785e Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 11 Sep 2023 10:56:40 +0300 Subject: [PATCH 1/7] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest --

[clang-tools-extra] [llvm] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest (PR #66522)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/66522 >From 076ab2374d84c4112e0bf3fb11ecda2f5774785e Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 11 Sep 2023 10:56:40 +0300 Subject: [PATCH 1/6] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest --

[clang-tools-extra] [llvm] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest (PR #66522)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/66522 >From 076ab2374d84c4112e0bf3fb11ecda2f5774785e Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Mon, 11 Sep 2023 10:56:40 +0300 Subject: [PATCH 1/2] ValueTracking: Merge fcmpImpliesClass and fcmpToClassTest --

[clang-tools-extra] [llvm] [clang] [SeperateConstOffsetFromGEP] Handle `or disjoint` flags (PR #76997)

2024-01-26 Thread Matt Arsenault via cfe-commits
arsenm wrote: Not sure if we need additional negative tests for missing disjoints https://github.com/llvm/llvm-project/pull/76997

[clang-tools-extra] [llvm] [clang] [SeperateConstOffsetFromGEP] Handle `or disjoint` flags (PR #76997)

2024-01-26 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/76997

[clang] [NVPTX][AMDGPU][CodeGen] Fix `local_space nullptr` handling for NVPTX and local/private `nullptr` value for AMDGPU. (PR #78759)

2024-01-25 Thread Matt Arsenault via cfe-commits
@@ -418,8 +418,10 @@ class LLVM_LIBRARY_VISIBILITY AMDGPUTargetInfo final : public TargetInfo { // value ~0. uint64_t getNullPointerValue(LangAS AS) const override { // FIXME: Also should handle region. -return (AS == LangAS::opencl_local || AS == LangAS::opencl_pr

[clang] [mlir] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-23 Thread Matt Arsenault via cfe-commits
@@ -2601,67 +2601,73 @@ def int_amdgcn_ds_bvh_stack_rtn : [ImmArg>, IntrWillReturn, IntrNoCallback, IntrNoFree] >; +def int_amdgcn_s_wait_event_export_ready : + ClangBuiltin<"__builtin_amdgcn_s_wait_event_export_ready">, + Intrinsic<[], [], [IntrNoMem, IntrHasSideEffec

[flang] [libc] [clang-tools-extra] [libcxx] [llvm] [clang] [compiler-rt] [lldb] [lld] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm edited https://github.com/llvm/llvm-project/pull/78414

[libc] [flang] [clang] [clang-tools-extra] [lldb] [libcxx] [lld] [compiler-rt] [llvm] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-23 Thread Matt Arsenault via cfe-commits
https://github.com/arsenm approved this pull request. https://github.com/llvm/llvm-project/pull/78414

[compiler-rt] [libcxx] [lldb] [flang] [libc] [lld] [llvm] [clang-tools-extra] [clang] [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (PR #78414)

2024-01-23 Thread Matt Arsenault via cfe-commits
@@ -8770,6 +8781,22 @@ void AMDGPUAsmParser::cvtVOP3DPP(MCInst &Inst, const OperandVector &Operands, } } +int VdstInIdx = AMDGPU::getNamedOper

[clang] [mlir] [llvm] [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (PR #77795)

2024-01-23 Thread Matt Arsenault via cfe-commits
@@ -2601,67 +2601,73 @@ def int_amdgcn_ds_bvh_stack_rtn : [ImmArg>, IntrWillReturn, IntrNoCallback, IntrNoFree] >; +def int_amdgcn_s_wait_event_export_ready : + ClangBuiltin<"__builtin_amdgcn_s_wait_event_export_ready">, + Intrinsic<[], [], [IntrNoMem, IntrHasSideEffec

[llvm] [mlir] [clang] [AMDGPU] Change default AMDHSA Code Object version to 5 (PR #79038)

2024-01-22 Thread Matt Arsenault via cfe-commits
arsenm wrote: Should get a mention in the release notes https://github.com/llvm/llvm-project/pull/79038
