[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-29 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D153953#4458386 , @foad wrote: > In D153953#4458134 , @sameerds > wrote: > >> @pravinjagtap @arsenm ... reverting the mbcnt intrinsic affects tests that >> were added for atomic

[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-28 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds requested review of this revision. sameerds added a comment. @pravinjagtap @arsenm ... reverting the mbcnt intrinsic affects tests that were added for atomic optimizations. In particular, the mbcnt is now being moved across/into/out of control flow because it is no longer convergent. I

[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-28 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D153953#4455794 , @yaxunl wrote: > Marking mbcnt as convergent, together with https://reviews.llvm.org/D144756 > prevent mbcnt to be merged, which fixed the reported issue. > > Do you have an alternative fix for the issue?

[PATCH] D153953: Revert "[AMDGPU] Mark mbcnt as convergent"

2023-06-28 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds created this revision. Herald added subscribers: kerbowa, tpr, dstuttard, yaxunl, jvesely, kzhuravl. Herald added a project: All. sameerds requested review of this revision. Herald added subscribers: llvm-commits, cfe-commits, wdng. Herald added projects: clang, LLVM. This reverts commit

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-06-07 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:386-387 +} else { + auto IntTy = dyn_cast(Args[i]->getType()); + if (IntTy && IntTy->getBitWidth() == 32) +WhatToStore.push_back( vikramRH wrote: >

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-23 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:184 // specify a string, i.e, the "%s" specifier with optional '*' characters. -static void locateCStrings(SparseBitVector<8> , Value *Fmt) { - StringRef Str; - if

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-17 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:184-185 // specify a string, i.e, the "%s" specifier with optional '*' characters. -static void locateCStrings(SparseBitVector<8> , Value *Fmt) { - StringRef Str; - if

[PATCH] D150427: [AMDGPU] Non hostcall printf support for HIP

2023-05-17 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:212 +// helper struct to package the string related data +typedef struct S { + std::string Str; you can just say "struct StringData" Comment at:

[PATCH] D141700: AMDGPU: Move enqueued block handling into clang

2023-02-05 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. LGTM, to the extent that I can see that the change does what is advertised, and the ultimately emitted HSA metadata preserves the current contract with the runtime. A couple of tests can use a little more explanatory comments as noted. Comment at:

[PATCH] D82087: AMDGPU/clang: Add builtins for llvm.amdgcn.ballot

2022-12-26 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. And note that the change description is written in a first-person train of thought. Please do rewrite it! CHANGES SINCE LAST ACTION https://reviews.llvm.org/D82087/new/ https://reviews.llvm.org/D82087 ___ cfe-commits

[PATCH] D82087: AMDGPU/clang: Add builtins for llvm.amdgcn.ballot

2022-12-26 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds accepted this revision. sameerds added inline comments. This revision is now accepted and ready to land. Comment at: clang/lib/Basic/Targets/AMDGPU.cpp:353 + if (HaveWave32 && HaveWave64) { +Diags.Report(diag::err_invalid_feature_combination) +<<

[PATCH] D82087: AMDGPU/clang: Add builtins for llvm.amdgcn.ballot

2022-12-23 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/test/CodeGenOpenCL/amdgpu-features.cl:7 +// RUN: %clang_cc1 -triple amdgcn -S -emit-llvm -o - %s | FileCheck --check-prefix=NOCPU %s +// RUN: %clang_cc1 -triple amdgcn -target-feature +wavefrontsize32 -S -emit-llvm -o - %s |

[PATCH] D128907: [Clang] Disable noundef attribute for languages which allow uninitialized function arguments

2022-07-13 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. The current set of reviewers is mostly loaded with HIP engineers who are familiar with the issue and the proposed solution. But this solution affects all languages with convergent functions, which is visible from the affected tests. It will be good to seek comments

[PATCH] D124813: [HLSL] Add clang builtin for HLSL.

2022-05-05 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds resigned from this revision. sameerds added a comment. I am not much familiar with DirectX and HLSL, so unable to review this patch. It might help to post on the Discourse under Clang Frontend: https://discourse.llvm.org/c/clang/6 Repository: rG LLVM Github Monorepo CHANGES SINCE

[PATCH] D121951: [AMDGPU][OpenCL] Remove "printf and hostcall" diagnostic

2022-03-29 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds accepted this revision. sameerds added a comment. LGTM! But not sure if @yaxunl and/or @arsenm should have another look at the final version. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D121951/new/ https://reviews.llvm.org/D121951

[PATCH] D121951: [AMDGPU][OpenCL] Add "amdgpu-no-hostcall-ptr" in Clang codegen pre-COV_5

2022-03-29 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D121951#3411856 , @scott.linder wrote: > @yaxunl Does excluding device-libs via COV_None make sense? > > @sameerds Would you still rather we just not add this attribute in the > frontend at all? I'm OK with it if that is

[PATCH] D121951: [AMDGPU][OpenCL] Add "amdgpu-no-hostcall-ptr" in Clang codegen pre-COV_5

2022-03-25 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/CodeGen/TargetInfo.cpp:9381 + M.getTarget().getTargetOpts().CodeObjectVersion != 500) { +F->addFnAttr("amdgpu-no-hostcall-ptr"); + } The frontend does not need to worry about this attribute. See the

[PATCH] D121951: [AMDGPU] Only warn when mixing printf and hostcall

2022-03-18 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D121951#3392470 , @scott.linder wrote: > In D121951#3391472 , @sameerds > wrote: > >> The check for "__ockl_hostcall_internal" is not longer relevant with the new >> hostcall

[PATCH] D121951: [AMDGPU] Only warn when mixing printf and hostcall

2022-03-18 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. Yeah, it happened in D119216 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D121951/new/ https://reviews.llvm.org/D121951 ___ cfe-commits

[PATCH] D121951: [AMDGPU] Only warn when mixing printf and hostcall

2022-03-18 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. The check for "__ockl_hostcall_internal" is not longer relevant with the new hostcall attribute. Can we simply remove this check? What is the situation where the warning is still useful? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-11 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds. This revision was automatically updated to reflect the committed changes. Closed by commit rGd8f99bb6e064: [AMDGPU] replace hostcall module flag with function attribute (authored by sameerds). Repository: rG LLVM Github Monorepo CHANGES

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-10 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds updated this revision to Diff 407762. sameerds added a comment. use AAPointerInfo; add more tests to demonstrate the same Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D119216/new/ https://reviews.llvm.org/D119216 Files:

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-09 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:566 + return false; +}; + sameerds wrote: > jdoerfert wrote: > > sameerds wrote: > > > jdoerfert wrote: > > > > sameerds wrote: > > > > > jdoerfert wrote: > > > > >

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-09 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds marked 2 inline comments as not done. sameerds added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:566 + return false; +}; + jdoerfert wrote: > sameerds wrote: > > jdoerfert wrote: > > > sameerds wrote: > > > >

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-08 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:566 + return false; +}; + jdoerfert wrote: > sameerds wrote: > > jdoerfert wrote: > > > jdoerfert wrote: > > > > sameerds wrote: > > > > > jdoerfert wrote: > > > > >

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-08 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds updated this revision to Diff 406889. sameerds added a comment. added tests for i128 load. hostcall position is now independent of subtarget. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D119216/new/ https://reviews.llvm.org/D119216

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-08 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:198 +const auto *STI = TM.getMCSubtargetInfo(); +return llvm::AMDGPU::getHostcallImplicitArgPosition(STI); + } arsenm wrote: > The ABI should not be a property of

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-08 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds updated this revision to Diff 406860. sameerds added a comment. further simplified the callback return value; moved hostcall info out of the subtarget Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D119216/new/

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-08 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:589 +A.checkForAllCallLikeInstructions(CheckForHostcallAccess, *this, + UsedAssumedInformation); + jdoerfert wrote: > Always check

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-08 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds updated this revision to Diff 406785. sameerds added a comment. eliminated NeedsHostcall by simply checking the return value of the walk over all calls Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D119216/new/

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-08 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:566 + return false; +}; + jdoerfert wrote: > You can use AAPointerInfo for the call site return IRPosition. It will > (through the iterations) gather all accesses

[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

2022-02-07 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds created this revision. Herald added subscribers: sdasgup3, wenzhicui, wrengr, Chia-hungDuan, foad, dcaballe, cota, teijeong, rdzhabarov, tatianashp, okura, jdoerfert, msifontes, jurahul, kuter, Kayjukh, grosul1, uenoku, Joonsoo, kerbowa, liufengdb, aartbik, mgester, arpith-jacob,

[PATCH] D115283: [AMDGPU] Set "amdgpu_hostcall" module flag if an AMDGPU function has calls to device lib functions that use hostcalls.

2022-01-31 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/CodeGen/TargetInfo.cpp:9434 + if (Callee && + (Callee->getName() == "__ockl_call_host_function" || + Callee->getName() == "__ockl_fprintf_stderr_begin")) { Just to confirm what others have

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-30 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D69498#2728451 , @JonChesterfield wrote: > I wonder if it is necessary for the exec mask to be implicit state, managed > by a convergent/divergent abstraction. > > We could reify it as an argument passed to kernels (where

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-24 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D69498#2712706 , @nhaehnle wrote: > In D69498#2711396 , @foad wrote: > >> I don't have much to add to the conversation except to point out that this >> definition defines

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-23 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D69498#2710675 , @arsenm wrote: > In D69498#2709218 , @sameerds wrote: > >> 1. It is actually okay add control dependences to a convergent function as >> long as the new branches are

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-22 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D69498#2708990 , @arsenm wrote: > You're still saying the same thing. This needs to be defined generically. > Frontends don't *have* to to anything, they'll just get the assumed > convergent behavior by default. Either

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-22 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D69498#2706524 , @arsenm wrote: > In D69498#2705441 , @sameerds wrote: > >> I realize now that what @foad says above puts the idea in a clearer context. >> Essentially, any check for

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-21 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. I realize now that what @foad says above puts the idea in a clearer context. Essentially, any check for isConvergent() isn't happening in a vacuum. It is relevant only in the presence of divergent control flow, which in turn is relevant only when the target has

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-21 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D69498#2704364 , @foad wrote: >> This recasts the whole question to be one about combining optimizations with >> target-specific information. The only changes required are in transforms >> that check

[PATCH] D69498: IR: Invert convergent attribute handling

2021-04-21 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. The way I see it, the notion of convergence is relevant only to a certain class of targets (usually represented by GPUs) and it only affects certain optimizations. Then why not have only these optimizations check `TTI` to see if convergence matters?

[PATCH] D82087: AMDGPU/clang: Add builtins for llvm.amdgcn.ballot

2020-07-12 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. >> https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions > > I think if the language interface insists on fixing the wave size, then I > think the correct solution is to implement this in the header

[PATCH] D82087: AMDGPU/clang: Add builtins for llvm.amdgcn.ballot

2020-07-08 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. The documentation for HIP __ballot seems to indicate that the user does not have to explicitly specify the warp size. How is that achieved with these new builtins? Can this be captured in a lit test here?

[PATCH] D78900: [HIP][AMDGPU] Enable structurizer workarounds

2020-06-04 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds abandoned this revision. sameerds added a comment. Abandoned in favour of enabling the workarounds in the AMDGPU backend: https://reviews.llvm.org/D81211 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78900/new/

[PATCH] D80996: [AMDGPU][OpenMP] Fix duplicate copies of arguments in commands

2020-06-03 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/Driver/ToolChains/HIP.cpp:389 - for (Arg *A : Args) { -DAL->append(A); + if (DeviceOffloadKind != Action::OFK_OpenMP) { +for (Arg *A : Args) { pdhaliwal wrote: > arsenm wrote: > > Needs a comment?

[PATCH] D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics

2020-05-29 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds accepted this revision. sameerds added a comment. LGTM, but please wait for approval from @arsenm Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D80804/new/ https://reviews.llvm.org/D80804 ___

[PATCH] D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics

2020-05-29 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D80804#2063522 , @saiislam wrote: > In D80804#2063451 , @sameerds wrote: > > > Actually, the question really is about why inc/dec are needed as separate > > operations either as IR

[PATCH] D80804: [AMDGPU] Expose llvm atomic inc/dec instructions as clang builtins for AMDGPU target

2020-05-29 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. Actually, the question really is about why inc/dec are needed as separate operations either as IR intrinsics or Clang builtins. Why not just expose a __builtin_amdgcn_atomicrmw that takes a scope, and map it to the LLVM atomicrmw? That would be way cleaner. The

[PATCH] D80804: [AMDGPU] Expose llvm atomic inc/dec instructions as clang builtins for AMDGPU target

2020-05-29 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/CodeGen/CGBuiltin.cpp:14529-14530 + + // llvm.amdgcn.atomic.inc and llvm.amdgcn.atomic.dec expects ordering and + // scope as unsigned values + Value *MemOrder = Builder.getInt32(static_cast(AO));

[PATCH] D80804: [AMDGPU] Expose llvm atomic inc/dec instructions as clang builtins for AMDGPU target

2020-05-29 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. The commit description needs fixing. These are not "llvm instructions" they are "AMDGCN intrinsics". They don't exist in the LangRef. Also, I would recommend inverting the first line of the description: "Introduce Clang builtins that are mapped to AMDGCN intrinsics".

[PATCH] D79400: [CMAKE] Fix build failure when source directory is read only

2020-05-29 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rG16fef6d0b46f: Fix build failure when source is read only (authored by pdhaliwal, committed by sameerds). Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

2020-05-27 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds accepted this revision. sameerds added a comment. This revision is now accepted and ready to land. LGTM! Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D79754/new/ https://reviews.llvm.org/D79754

[PATCH] D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

2020-05-18 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/AST/Decl.cpp:3226 + // the only special cases that are supported by device-side runtime. + if (Context.getTargetInfo().getTriple().isAMDGCN() && + Context.getLangOpts().OpenMPIsDevice && Why is the

[PATCH] D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

2020-05-17 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/AST/Decl.cpp:3224 + // HIP does not have device-side standard library. printf and malloc are + // the only special cases that are supported by device-side runtime. This should say "OpenMP does not have

[PATCH] D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

2020-05-17 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/AST/Decl.cpp:3227 + !(BuiltinID == Builtin::BIprintf || BuiltinID == Builtin::BImalloc)) +return 0; + sameerds wrote: > This needs a test. This still needs a test. One that shows that specific

[PATCH] D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

2020-05-16 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/AST/Decl.cpp:3221 +!hasAttr()) || + Context.getTargetInfo().getTriple().isAMDGCN()) && !(BuiltinID == Builtin::BIprintf || BuiltinID == Builtin::BImalloc)) sameerds wrote: > This seems

[PATCH] D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

2020-05-15 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/include/clang/Basic/TargetInfo.h:1261 + /// Currently only supports NVPTX and AMDGCN + static bool isOpenMPGPU(llvm::Triple ) { +return T.isNVPTX() || T.isAMDGCN(); How is "OpenMP-compatible GPU" defined? I

[PATCH] D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

2020-05-11 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/AST/Decl.cpp:3227 + !(BuiltinID == Builtin::BIprintf || BuiltinID == Builtin::BImalloc)) +return 0; + This needs a test. Comment at: clang/lib/Frontend/CompilerInvocation.cpp:3102

[PATCH] D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

2020-05-11 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: llvm/include/llvm/ADT/Triple.h:700 return getArch() == Triple::r600 || getArch() == Triple::amdgcn; } yaxunl wrote: > jdoerfert wrote: > > What's the difference between an AMDGPU and AMDGCN? > AMDGPU inlude

[PATCH] D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

2020-05-11 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/Frontend/CompilerInvocation.cpp:3162 + Opts.OpenMPIsDevice && (T.isNVPTX() || T.isAMDGCN()) && Args.hasArg(options::OPT_fopenmp_cuda_force_full_runtime); jdoerfert wrote: > Can we please not

[PATCH] D79754: [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1

2020-05-11 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/AST/Decl.cpp:3224 + if (Context.getTargetInfo().getTriple().isAMDGCN() && + Context.BuiltinInfo.isPredefinedLibFunction(BuiltinID) && arsenm wrote: > This is also identical to the cuda handling above,

[PATCH] D78900: [HIP][AMDGPU] Enable structurizer workarounds

2020-05-10 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds marked an inline comment as done. sameerds added inline comments. Comment at: clang/lib/Driver/ToolChains/HIP.cpp:158 Args.MakeArgString(Twine("-filetype=") + (OutputIsAsm ? "asm" : "obj"))); + LlcArgs.push_back("--amdgpu-enable-structurizer-workarounds");

[PATCH] D78900: [HIP][AMDGPU] Enable structurizer workarounds

2020-04-27 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/Driver/ToolChains/HIP.cpp:158 Args.MakeArgString(Twine("-filetype=") + (OutputIsAsm ? "asm" : "obj"))); + LlcArgs.push_back("--amdgpu-enable-structurizer-workarounds"); arsenm wrote: > We should

[PATCH] D78900: [HIP][AMDGPU] Enable structurizer workarounds

2020-04-27 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds created this revision. Herald added subscribers: llvm-commits, cfe-commits, kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl, arsenm. Herald added projects: clang, LLVM. sameerds added reviewers: yaxunl, scchan, arsenm, rampitec, dstuttard. The

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-26 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rG06bdffb2bb45: [AMDGPU] Expose llvm fence instruction as clang intrinsic (authored by saiislam, committed by sameerds). Changed prior to commit: https://reviews.llvm.org/D75917?vs=260053=260207#toc

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-23 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds accepted this revision. sameerds added a comment. This revision is now accepted and ready to land. Thanks @saiislam ... this looks much better! Two nitpicks, that must be fixed. But it is okay if you directly submit after fixing them. 1. The change description should use "const char

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-22 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. Can you please submit a squashed single commit for review against the master branch? All the recent commits seem to be relative to previous commits, and I am having trouble looking at the change as a whole. The commit description needs some fixes: 1. Remove use of

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-06 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/test/CodeGenHIP/builtin_memory_fence.cpp:9 + // CHECK: fence syncscope("workgroup") seq_cst + __builtin_memory_fence(__ATOMIC_SEQ_CST, "workgroup"); + JonChesterfield wrote: > JonChesterfield wrote: > >

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-06 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds requested changes to this revision. sameerds added inline comments. This revision now requires changes to proceed. Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13651 + llvm::getConstantStringInfo(Scope, scp); + SSID =

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-06 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13651 + llvm::getConstantStringInfo(Scope, scp); + SSID = getLLVMContext().getOrInsertSyncScopeID(scp); + saiislam wrote: > sameerds wrote: > > This seems to be creating a new

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-04-05 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds requested changes to this revision. sameerds added inline comments. This revision now requires changes to proceed. Comment at: clang/include/clang/Basic/Builtins.def:1583 +// Second argument : target specific sync scope string +BUILTIN(__builtin_memory_fence, "vUicC*",

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-03-16 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D75917#1925700 , @JonChesterfield wrote: > > I think I follow. The syncscope takes a string, therefore the builtin that > maps onto fence should also take a string for that parameter? That's fine by > me. Will help if a

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-03-16 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D75917#1917296 , @JonChesterfield wrote: > In D75917#1916972 , @sameerds wrote: > > > Well, there is a problem: The LangRef says that scopes are target-defined. > > This change says

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-03-11 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D75917#1916664 , @JonChesterfield wrote: > In D75917#1916160 , @sameerds wrote: > > > how this builtin fits in with the overall scheme of language-specific and > > target-specific

[PATCH] D75917: Expose llvm fence instruction as clang intrinsic

2020-03-10 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. The commit summary needs improvement. The syntax is not really necessary there, but instead this needs a better explanation of how this builtin fits in with the overall scheme of language-specific and target-specific details of an atomic operation. For example, is

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-16 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes. Closed by commit rGed181efa175d: [HIP][AMDGPU] expand printf when compiling HIP to AMDGPU (authored by sameerds). Herald added a subscriber: kerbowa. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-14 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds marked 2 inline comments as done. sameerds added inline comments. Comment at: clang/test/CodeGenHIP/printf.cpp:18 +} + +// CHECK: [[BEGIN:%.*]] = call i64 @__ockl_printf_begin(i64 0) arsenm wrote: > sameerds wrote: > > sameerds wrote: > > > arsenm

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-14 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds updated this revision to Diff 237864. sameerds added a comment. - Added a test for address spaces - Removed an unnecessary addrspacecast in the strlen expansion - Updated the tests to pass -fcuda-is-device - printf is not a builtin for device code, so removed the code and tests related

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-09 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/test/CodeGenHIP/printf.cpp:18 +} + +// CHECK: [[BEGIN:%.*]] = call i64 @__ockl_printf_begin(i64 0) sameerds wrote: > arsenm wrote: > > This could use a lot more testcases. Can you add some half, float, and > >

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-07 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added a comment. In D71365#1807872 , @b-sumner wrote: > Should this be looking forward to also handling OpenCL, which does require > vector support? Sure. The implementation is general enough that we can point at the two places in

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-06 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds added inline comments. Comment at: clang/test/CodeGenHIP/printf.cpp:18 +} + +// CHECK: [[BEGIN:%.*]] = call i64 @__ockl_printf_begin(i64 0) arsenm wrote: > This could use a lot more testcases. Can you add some half, float, and double > as well as

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-06 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds updated this revision to Diff 236312. sameerds edited the summary of this revision. sameerds added a comment. Improved the test defined in clang/test/CodeGenHIP/printf.cpp Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71365/new/

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2019-12-11 Thread Sameer Sahasrabuddhe via Phabricator via cfe-commits
sameerds created this revision. Herald added subscribers: llvm-commits, cfe-commits, hiraditya, t-tye, tpr, dstuttard, yaxunl, mgorny, nhaehnle, wdng, jvesely, kzhuravl. Herald added projects: clang, LLVM. This change implements the expansion in two parts: - Add a utility function