[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-05-09 Thread Jeffrey Byrnes via cfe-commits
jrbyrnes wrote:
> We should spend more energy making the scheduler sensible by default, instead of creating all of this complexity.
I would also prefer a more sensible default scheduler, but the driving use case for this is global scheduling. The scheduler is doing inefficient things since

[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-05-01 Thread Jeffrey Byrnes via cfe-commits
@@ -437,16 +437,18 @@ void test_sched_group_barrier()
 }
 // CHECK-LABEL: @test_sched_group_barrier_rule
-// CHECK: call void @llvm.amdgcn.sched.group.barrier.rule(i32 0, i32 1, i32 2, i32 0)
-// CHECK: call void @llvm.amdgcn.sched.group.barrier.rule(i32 1, i32 2, i32 4, i32

[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-05-01 Thread Jeffrey Byrnes via cfe-commits
https://github.com/jrbyrnes updated https://github.com/llvm/llvm-project/pull/85304
From 04dc59ff7757dea18e2202d1cbff1d675885fdae Mon Sep 17 00:00:00 2001
From: Jeffrey Byrnes
Date: Tue, 12 Mar 2024 10:22:24 -0700
Subject: [PATCH 1/4] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to

[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-04-23 Thread Jeffrey Byrnes via cfe-commits
jrbyrnes wrote: Updated the PR as discussed offline. Support the variadic builtin arg by combining the values into a mask for the intrinsic. This sort of implies a limit of 64 rules, but we can work around that by adding a new intrinsic with two masks (to support rules 65-128), and so on. For now, rules in this PR
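To make the mask idea above concrete, here is a minimal sketch of folding a list of rule IDs into one 64-bit value. It is not taken from the PR. The names `combineRuleIDs`, `hasRule`, and `kMaxRules` are hypothetical, and the zero-based bit numbering is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Assumed limit: one bit per rule in a single 64-bit mask.
constexpr unsigned kMaxRules = 64;

// Hypothetical helper (not the PR's code): set bit ID for every rule ID given.
// Assumes rule IDs are zero-based, so valid IDs are 0..63.
inline std::uint64_t combineRuleIDs(std::initializer_list<unsigned> RuleIDs) {
  std::uint64_t Mask = 0;
  for (unsigned ID : RuleIDs) {
    assert(ID < kMaxRules && "rule ID does not fit in a 64-bit mask");
    Mask |= std::uint64_t{1} << ID;
  }
  return Mask;
}

// Hypothetical query: returns true if rule ID is enabled in Mask.
inline bool hasRule(std::uint64_t Mask, unsigned ID) {
  return ID < kMaxRules && ((Mask >> ID) & 1u) != 0;
}
```

Under these assumptions, `combineRuleIDs({0, 2})` sets bits 0 and 2. Supporting more than 64 rules would need a second mask word, which matches the workaround described in the message.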

[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-04-23 Thread Jeffrey Byrnes via cfe-commits
https://github.com/jrbyrnes ready_for_review https://github.com/llvm/llvm-project/pull/85304 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-04-23 Thread Jeffrey Byrnes via cfe-commits
https://github.com/jrbyrnes updated https://github.com/llvm/llvm-project/pull/85304
From 04dc59ff7757dea18e2202d1cbff1d675885fdae Mon Sep 17 00:00:00 2001
From: Jeffrey Byrnes
Date: Tue, 12 Mar 2024 10:22:24 -0700
Subject: [PATCH 1/2] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to

[clang] [llvm] [AMDGPU]: Add and codegen sched_group_barrier_inst (PR #78775)

2024-03-14 Thread Jeffrey Byrnes via cfe-commits
https://github.com/jrbyrnes closed https://github.com/llvm/llvm-project/pull/78775

[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-03-14 Thread Jeffrey Byrnes via cfe-commits
jrbyrnes wrote: Supersedes https://github.com/llvm/llvm-project/pull/78775 https://github.com/llvm/llvm-project/pull/85304

[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-03-14 Thread Jeffrey Byrnes via cfe-commits
https://github.com/jrbyrnes edited https://github.com/llvm/llvm-project/pull/85304

[clang] [llvm] [AMDGPU] Extend __builtin_amdgcn_sched_group_barrier to support rules. (PR #85304)

2024-03-14 Thread Jeffrey Byrnes via cfe-commits
https://github.com/jrbyrnes created https://github.com/llvm/llvm-project/pull/85304 I am still working with the user to define the actual rules, so it is still a WIP. However, the current version contains the main machinery of the feature. This helps bridge the gap between

[clang] [llvm] [AMDGPU]: Add and codegen sched_group_barrier_inst (PR #78775)

2024-01-19 Thread Jeffrey Byrnes via cfe-commits
https://github.com/jrbyrnes created https://github.com/llvm/llvm-project/pull/78775 As stated, this simply adds and codegens the builtin/intrinsic. A subsequent patch will interface it with IGroupLP. The idea is to give users more expressiveness by allowing them to create schedgroups which

[libclc] [AMDGPU][MachineScheduler] Alternative way to control excess RP. (PR #68004)

2023-10-20 Thread Jeffrey Byrnes via cfe-commits
https://github.com/jrbyrnes commented: Just have a few questions about implementation details -- at a higher level, seems like we are trading one heuristic for another w.r.t flagging regions as ExcessRP -- so I'm curious about the relative performance.

[clang] [AMDGPU][MachineScheduler] Alternative way to control excess RP. (PR #68004)

2023-10-20 Thread Jeffrey Byrnes via cfe-commits
https://github.com/jrbyrnes commented: Just have a few questions about implementation details -- at a higher level, seems like we are trading one heuristic for another w.r.t flagging regions as ExcessRP -- so I'm curious about the relative performance.

[libclc] [AMDGPU][MachineScheduler] Alternative way to control excess RP. (PR #68004)

2023-10-20 Thread Jeffrey Byrnes via cfe-commits
@@ -894,10 +894,22 @@ void GCNSchedStage::setupNewBlock() {
 void GCNSchedStage::finalizeGCNRegion() {
   DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
-  DAG.RescheduleRegions[RegionIdx] = false;
jrbyrnes wrote: Why was this removed?

[libclc] [AMDGPU][MachineScheduler] Alternative way to control excess RP. (PR #68004)

2023-10-20 Thread Jeffrey Byrnes via cfe-commits
@@ -959,16 +970,6 @@ void GCNSchedStage::checkScheduling() {
                       << DAG.MinOccupancy << ".\n");
   }
-  unsigned MaxVGPRs = ST.getMaxNumVGPRs(MF);
-  unsigned MaxSGPRs = ST.getMaxNumSGPRs(MF);
-  if (PressureAfter.getVGPRNum(false) > MaxVGPRs ||
-

[clang-tools-extra] [AMDGPU][MachineScheduler] Alternative way to control excess RP. (PR #68004)

2023-10-20 Thread Jeffrey Byrnes via cfe-commits
@@ -1117,16 +1118,23 @@ bool OccInitialScheduleStage::shouldRevertScheduling(unsigned WavesAfter) {
 bool UnclusteredHighRPStage::shouldRevertScheduling(unsigned WavesAfter) {
   // If RP is not reduced in the unclustered reschedule stage, revert to the
   // old schedule.
-  if

[clang] [AMDGPU][MachineScheduler] Alternative way to control excess RP. (PR #68004)

2023-10-20 Thread Jeffrey Byrnes via cfe-commits
@@ -702,7 +702,7 @@ bool UnclusteredHighRPStage::initGCNSchedStage() {
   if (!GCNSchedStage::initGCNSchedStage())
     return false;
-  if (DAG.RegionsWithHighRP.none() && DAG.RegionsWithExcessRP.none())
+  if (DAG.RegionsWithExcessRP.none())
jrbyrnes wrote:

[clang-tools-extra] [AMDGPU][MachineScheduler] Alternative way to control excess RP. (PR #68004)

2023-10-20 Thread Jeffrey Byrnes via cfe-commits
https://github.com/jrbyrnes commented: Just have a few questions about implementation details -- at a higher level, seems like we are trading one heuristic for another w.r.t flagging regions as ExcessRP -- so I'm curious about the relative performance.

[clang] be8a65b - [HIP]: Add -fhip-emit-relocatable to override link job creation for -fno-gpu-rdc

2023-06-29 Thread Jeffrey Byrnes via cfe-commits
Author: Jeffrey Byrnes
Date: 2023-06-29T08:18:28-07:00
New Revision: be8a65b598b3b80f73e862a01c7eaafe84d853a0
URL: https://github.com/llvm/llvm-project/commit/be8a65b598b3b80f73e862a01c7eaafe84d853a0
DIFF: