[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-06-14 Thread Fangrui Song via llvm-branch-commits


@@ -0,0 +1,92 @@
+//===- TargetImpl.h -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLD_ELF_ARCH_TARGETIMPL_H
+#define LLD_ELF_ARCH_TARGETIMPL_H
+
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "Relocations.h"
+#include "Symbols.h"
+#include "llvm/BinaryFormat/ELF.h"
+
+namespace lld {
+namespace elf {
+
+// getControlTransferAddend: If this relocation is used for control transfer
+// instructions (e.g. branch, branch-link or call) or code references (e.g.
+// virtual function pointers) and indicates an address-insignificant reference,
+// return the effective addend for the relocation, otherwise return
+// std::nullopt. The effective addend for a relocation is the addend that is
+// used to determine its branch destination.
+//
+// getBranchInfo: If a control transfer relocation referring to is+offset
+// directly transfers control to a relocated branch instruction in the specified
+// section, return the relocation for the branch target as well as its effective
+// addend (see above). Otherwise return {nullptr, 0}.
+//
+// mergeControlTransferRelocations: Given r1, a relocation for which
+// getControlTransferAddend() returned a value, and r2, a relocation returned by
+// getBranchInfo(), modify r1 so that it branches directly to the target of r2.
+template 
+inline void applyBranchToBranchOptImpl(
+Ctx &ctx, GetBranchInfo getBranchInfo,
+GetControlTransferAddend getControlTransferAddend,
+MergeControlTransferRelocations mergeControlTransferRelocations) {
+  // Needs to run serially because it writes to the relocations array as well as
+  // reading relocations of other sections.
+  for (ELFFileBase *f : ctx.objectFiles) {
+auto getRelocBranchInfo =
+[&getBranchInfo](Relocation &r,
+ uint64_t addend) -> std::pair {
+  auto *target = dyn_cast_or_null(r.sym);
+  // We don't allow preemptible symbols or ifuncs (may go somewhere else),
+  // absolute symbols (runtime behavior unknown), non-executable memory
+  // (ditto) or non-regular sections (no section data).
+  if (!target || target->isPreemptible || target->isGnuIFunc() ||

MaskRay wrote:

I agree that there should be a --emit-relocs test

https://github.com/llvm/llvm-project/pull/138366
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-06-14 Thread Fangrui Song via llvm-branch-commits


@@ -0,0 +1,94 @@
+//===- TargetImpl.h -*- C++ -*-===//

MaskRay wrote:

`//===--===//` for new file per https://llvm.org/docs/CodingStandards.html#file-headers



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-06-14 Thread Fangrui Song via llvm-branch-commits


@@ -0,0 +1,98 @@
+# REQUIRES: x86
+
+## Test that the branch-to-branch optimization follows the links
+## from f1 -> f2 -> f3 and updates all references to point to f3.
+ 
+# RUN: llvm-mc -filetype=obj -triple=x86_64-pc-linux %s -o %t.o
+# RUN: ld.lld %t.o -o %t --branch-to-branch
+# RUN: llvm-objdump -d -s %t | FileCheck --check-prefixes=CHECK,B2B %s
+# RUN: ld.lld %t.o -o %t -O2
+# RUN: llvm-objdump -d -s %t | FileCheck --check-prefixes=CHECK,B2B %s
+
+## Test that branch-to-branch is disabled by default.
+
+# RUN: ld.lld %t.o -o %t
+# RUN: llvm-objdump -d -s %t | FileCheck --check-prefixes=CHECK,NOB2B %s
+# RUN: ld.lld %t.o -o %t -O2 --no-branch-to-branch
+# RUN: llvm-objdump -d -s %t | FileCheck --check-prefixes=CHECK,NOB2B %s
+
+## Test that branch-to-branch is disabled for preemptible symbols.
+
+# RUN: ld.lld %t.o -o %t --branch-to-branch -shared
+# RUN: llvm-objdump -d -s %t | FileCheck --check-prefixes=CHECK,NOB2B %s
+
+.section .rodata.vtable,"a"
+.globl vtable
+vtable:
+# B2B: Contents of section .rodata:
+# B2B-NEXT: [[VF:[0-9a-f]{8}]]

MaskRay wrote:

We need to check the exact value, otherwise the PLT32 with an addend not -4 is 
not tested.
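A worked sketch of why the exact value matters: on x86-64, `R_X86_64_PLT32` stores L + A - P truncated to 32 bits, where L is the PLT entry or, for a non-preemptible symbol, the symbol itself. The helper name `plt32Value` is hypothetical and the addresses in the note below are made up rather than taken from the test.

```cpp
#include <cstdint>

// Value written for an x86-64 R_X86_64_PLT32 relocation: L + A - P,
// truncated to 32 bits. For a non-preemptible symbol no PLT entry is needed,
// so L is simply the address of the (possibly redirected) branch target.
uint32_t plt32Value(uint64_t target, int64_t addend, uint64_t place) {
  // Unsigned arithmetic wraps modulo 2^64; the low 32 bits are what gets
  // stored, so negative results come out as two's complement.
  return static_cast<uint32_t>(target + static_cast<uint64_t>(addend) - place);
}
```

With a vtable slot at 0x200000, f1 at 0x201000 and f3 at 0x201020, an addend of 8 gives 0x1008 without the optimization and 0x1028 with it (and 0x101c if the addend were wrongly replaced by -4). All three match a pattern of eight hex digits, so only checking the exact bytes (`llvm-objdump -s` prints them little-endian, e.g. `28100000`) exercises the addend handling.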



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-06-14 Thread Fangrui Song via llvm-branch-commits


@@ -0,0 +1,94 @@
+//===- TargetImpl.h -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLD_ELF_ARCH_TARGETIMPL_H
+#define LLD_ELF_ARCH_TARGETIMPL_H
+
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "Relocations.h"
+#include "Symbols.h"
+#include "llvm/BinaryFormat/ELF.h"
+
+namespace lld {
+namespace elf {
+
+// getControlTransferAddend: If this relocation is used for control transfer
+// instructions (e.g. branch, branch-link or call) or code references (e.g.
+// virtual function pointers) and indicates an address-insignificant reference,
+// return the effective addend for the relocation, otherwise return
+// std::nullopt. The effective addend for a relocation is the addend that is
+// used to determine its branch destination.
+//
+// getBranchInfoAtTarget: If a control transfer relocation referring to
+// is+offset directly transfers control to a relocated branch instruction in the
+// specified section, return the relocation for the branch target as well as its
+// effective addend (see above). Otherwise return {nullptr, 0}.
+//
+// redirectControlTransferRelocations: Given r1, a relocation for which
+// getControlTransferAddend() returned a value, and r2, a relocation returned by
+// getBranchInfo(), modify r1 so that it branches directly to the target of r2.
+template 
+inline void applyBranchToBranchOptImpl(
+Ctx &ctx, GetControlTransferAddend getControlTransferAddend,
+GetBranchInfoAtTarget getBranchInfoAtTarget,
+RedirectControlTransferRelocations redirectControlTransferRelocations) {
+  // Needs to run serially because it writes to the relocations array as well as
+  // reading relocations of other sections.
+  for (ELFFileBase *f : ctx.objectFiles) {
+auto getRelocBranchInfo =
+[&getBranchInfoAtTarget](
+Relocation &r,
+uint64_t addend) -> std::pair {
+  auto *target = dyn_cast_or_null(r.sym);
+  // We don't allow preemptible symbols or ifuncs (may go somewhere else),
+  // absolute symbols (runtime behavior unknown), non-executable or writable
+  // memory (ditto) or non-regular sections (no section data).
+  if (!target || target->isPreemptible || target->isGnuIFunc() ||
+  !target->section ||
+  !(target->section->flags & llvm::ELF::SHF_EXECINSTR) ||
+  (target->section->flags & llvm::ELF::SHF_WRITE) ||
+  target->section->kind() != SectionBase::Regular)
+return {nullptr, 0};
+  return getBranchInfoAtTarget(*cast(target->section),
+   target->value + addend);
+};
+for (InputSectionBase *s : f->getSections()) {
+  if (!s)
+continue;
+  for (Relocation &r : s->relocations) {
+if (std::optional addend =

MaskRay wrote:

Could use an early return (`continue`) for this `if` and the next `if` to reduce indentation.
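A self-contained sketch of the suggested shape, using simplified stand-in types (`Reloc`, `Section`, `forEachRedirectable` and the callback parameters are hypothetical, not lld's classes): each failed precondition `continue`s instead of nesting the rest of the loop body one level deeper.

```cpp
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for lld's Relocation and InputSectionBase.
struct Reloc {
  uint64_t offset;
  int sym; // index of the referenced symbol
  int64_t addend;
};
struct Section {
  std::vector<Reloc> relocs;
};

// GetAddend: (Reloc &) -> std::optional<int64_t>
// GetInfo:   (Reloc &, int64_t) -> std::pair<Reloc *, int64_t>
// Redirect:  (Reloc &, const Reloc &) -> void
template <typename GetAddend, typename GetInfo, typename Redirect>
void forEachRedirectable(const std::vector<Section *> &sections,
                         GetAddend getAddend, GetInfo getInfo,
                         Redirect redirect) {
  for (Section *s : sections) {
    if (!s)
      continue;
    for (Reloc &r : s->relocs) {
      std::optional<int64_t> addend = getAddend(r);
      if (!addend)
        continue; // not a control transfer relocation
      std::pair<Reloc *, int64_t> info = getInfo(r, *addend);
      if (!info.first)
        continue; // destination is not a relocated branch
      redirect(r, *info.first);
    }
  }
}
```

The happy path stays at a single indentation level, which is the point of the review comment.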



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-06-14 Thread Fangrui Song via llvm-branch-commits


@@ -0,0 +1,94 @@
+//===- TargetImpl.h -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLD_ELF_ARCH_TARGETIMPL_H
+#define LLD_ELF_ARCH_TARGETIMPL_H
+
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "Relocations.h"
+#include "Symbols.h"
+#include "llvm/BinaryFormat/ELF.h"
+

MaskRay wrote:

namespace lld::elf
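For reference, a C++17 nested namespace definition replaces the two opening (and two closing) namespace lines with one each. A minimal sketch; `exampleHelper` is a hypothetical placeholder:

```cpp
// C++17: equivalent to `namespace lld { namespace elf { ... } }`.
namespace lld::elf {
inline int exampleHelper() { return 42; } // hypothetical placeholder
} // namespace lld::elf
```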



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-06-14 Thread Peter Collingbourne via llvm-branch-commits

pcc wrote:

Sorry, I was having a problem with spr with this change so I pushed the
updated change directly to the branch.

I think there are some merge conflicts by now in the include list but they
should be easy to resolve.
-- 
Peter

On Sat, Jun 14, 2025, 17:58 Fangrui Song wrote:

> *MaskRay* left a comment (llvm/llvm-project#138366)
>
> pcc wants to merge 1 commit into
> users/pcc/spr/main.elf-add-branch-to-branch-optimization
> from users/pcc/spr/elf-add-branch-to-branch-optimization
>
> Is the base branch associated with an open PR?
>
> Neither
>
> curl -L https://github.com/llvm/llvm-project/pull/138366.diff | patch -p1
>
> spr patch 138366
>
> works.
>




[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-06-14 Thread Fangrui Song via llvm-branch-commits

MaskRay wrote:

Perhaps you'll need to change the base branch to `main` and force push to 
users/pcc/spr/elf-add-branch-to-branch-optimization?



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-06-14 Thread Fangrui Song via llvm-branch-commits


@@ -975,6 +977,62 @@ void AArch64::relocateAlloc(InputSectionBase &sec, uint8_t *buf) const {
   }
 }
 
+static std::optional getControlTransferAddend(InputSection &is,
+Relocation &r) {
+  // Identify a control transfer relocation for the branch-to-branch
+  // optimization. A "control transfer relocation" means a B or BL
+  // target but it also includes relative vtable relocations for example.
+  //
+  // We require the relocation type to be JUMP26, CALL26 or PLT32. With a
+  // relocation type of PLT32 the value may be assumed to be used for branching
+  // directly to the symbol and the addend is only used to produce the relocated
+  // value (hence the effective addend is always 0). This is because if a PLT is
+  // needed the addend will be added to the address of the PLT, and it doesn't
+  // make sense to branch into the middle of a PLT. For example, relative vtable
+  // relocations use PLT32 and 0 or a positive value as the addend but still are
+  // used to branch to the symbol.
+  //
+  // With JUMP26 or CALL26 the only reasonable interpretation of a non-zero
+  // addend is that we are branching to symbol+addend so that becomes the
+  // effective addend.
+  if (r.type == R_AARCH64_PLT32)
+return 0;
+  if (r.type == R_AARCH64_JUMP26 || r.type == R_AARCH64_CALL26)
+return r.addend;
+  return std::nullopt;
+}
+
+static std::pair getBranchInfo(InputSection &is,
+   uint64_t offset) {
+  auto *i = std::lower_bound(

MaskRay wrote:

Complex lower_bound/upper_bound can be simplified with `partition_point` 
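A sketch of the simplification, assuming the relocations are sorted by offset (the same precondition the `lower_bound` version relies on). `Reloc`, `kJump26` and `findBranchAt` are hypothetical stand-ins; `kJump26` uses 282, the value of `R_AARCH64_JUMP26`. Inside lld, the range form `llvm::partition_point` from `llvm/ADT/STLExtras.h` would also avoid spelling out `begin()`/`end()`.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for lld's Relocation.
constexpr uint32_t kJump26 = 282; // R_AARCH64_JUMP26
struct Reloc {
  uint64_t offset;
  uint32_t type;
  int64_t addend;
};

// Returns the JUMP26 relocation at exactly `offset`, or nullptr.
// std::partition_point takes a unary predicate, so unlike lower_bound there
// is no separate value argument or two-argument comparator to get right.
const Reloc *findBranchAt(const std::vector<Reloc> &relocs, uint64_t offset) {
  auto it = std::partition_point(
      relocs.begin(), relocs.end(),
      [offset](const Reloc &r) { return r.offset < offset; });
  if (it != relocs.end() && it->offset == offset && it->type == kJump26)
    return &*it;
  return nullptr;
}
```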



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-06-14 Thread Fangrui Song via llvm-branch-commits

MaskRay wrote:

> [pcc](https://github.com/pcc) wants to merge 1 commit into 
> [users/pcc/spr/main.elf-add-branch-to-branch-optimization](https://github.com/llvm/llvm-project/tree/users/pcc/spr/main.elf-add-branch-to-branch-optimization)
>  from 
> [users/pcc/spr/elf-add-branch-to-branch-optimization](https://github.com/llvm/llvm-project/tree/users/pcc/spr/elf-add-branch-to-branch-optimization)

Is the base branch associated with an open PR? 

Neither
```
curl -L https://github.com/llvm/llvm-project/pull/138366.diff | patch -p1

spr patch 138366
```
works.



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-06-09 Thread Peter Collingbourne via llvm-branch-commits

pcc wrote:

@MaskRay ping.



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-27 Thread Peter Smith via llvm-branch-commits

smithp35 wrote:

Thanks for the updates. I don't have any more comments.



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-27 Thread Peter Smith via llvm-branch-commits


@@ -975,6 +977,62 @@ void AArch64::relocateAlloc(InputSectionBase &sec, uint8_t *buf) const {
   }
 }
 
+static std::optional getControlTransferAddend(InputSection &is,
+Relocation &r) {
+  // Identify a control transfer relocation for the branch-to-branch
+  // optimization. A "control transfer relocation" means a B or BL
+  // target but it also includes relative vtable relocations for example.
+  //
+  // We require the relocation type to be JUMP26, CALL26 or PLT32. With a
+  // relocation type of PLT32 the value may be assumed to be used for branching
+  // directly to the symbol and the addend is only used to produce the relocated
+  // value (hence the effective addend is always 0). This is because if a PLT is
+  // needed the addend will be added to the address of the PLT, and it doesn't
+  // make sense to branch into the middle of a PLT. For example, relative vtable
+  // relocations use PLT32 and 0 or a positive value as the addend but still are
+  // used to branch to the symbol.
+  //
+  // With JUMP26 or CALL26 the only reasonable interpretation of a non-zero
+  // addend is that we are branching to symbol+addend so that becomes the
+  // effective addend.
+  if (r.type == R_AARCH64_PLT32)
+return 0;
+  if (r.type == R_AARCH64_JUMP26 || r.type == R_AARCH64_CALL26)
+return r.addend;
+  return std::nullopt;
+}
+
+static std::pair getBranchInfo(InputSection &is,
+   uint64_t offset) {
+  auto *i = std::lower_bound(
+  is.relocations.begin(), is.relocations.end(), offset,
+  [](Relocation &r, uint64_t offset) { return r.offset < offset; });
+  if (i != is.relocations.end() && i->offset == offset &&
+  i->type == R_AARCH64_JUMP26) {
+return {i, i->addend};
+  }

smithp35 wrote:

Agree that BTI instructions should be in a separate patch. It would require 
disassembling to find one so may result in longer link times. Skipping over BTI 
with direct branches could apply even when the target wasn't another branch.



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-27 Thread Peter Smith via llvm-branch-commits


@@ -0,0 +1,92 @@
+//===- TargetImpl.h -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLD_ELF_ARCH_TARGETIMPL_H
+#define LLD_ELF_ARCH_TARGETIMPL_H
+
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "Relocations.h"
+#include "Symbols.h"
+#include "llvm/BinaryFormat/ELF.h"
+
+namespace lld {
+namespace elf {
+
+// getControlTransferAddend: If this relocation is used for control transfer
+// instructions (e.g. branch, branch-link or call) or code references (e.g.
+// virtual function pointers) and indicates an address-insignificant reference,
+// return the effective addend for the relocation, otherwise return
+// std::nullopt. The effective addend for a relocation is the addend that is
+// used to determine its branch destination.
+//
+// getBranchInfo: If a control transfer relocation referring to is+offset
+// directly transfers control to a relocated branch instruction in the specified
+// section, return the relocation for the branch target as well as its effective
+// addend (see above). Otherwise return {nullptr, 0}.
+//
+// mergeControlTransferRelocations: Given r1, a relocation for which
+// getControlTransferAddend() returned a value, and r2, a relocation returned by
+// getBranchInfo(), modify r1 so that it branches directly to the target of r2.
+template 
+inline void applyBranchToBranchOptImpl(
+Ctx &ctx, GetBranchInfo getBranchInfo,
+GetControlTransferAddend getControlTransferAddend,
+MergeControlTransferRelocations mergeControlTransferRelocations) {
+  // Needs to run serially because it writes to the relocations array as well as
+  // reading relocations of other sections.
+  for (ELFFileBase *f : ctx.objectFiles) {
+auto getRelocBranchInfo =
+[&getBranchInfo](Relocation &r,
+ uint64_t addend) -> std::pair {
+  auto *target = dyn_cast_or_null(r.sym);
+  // We don't allow preemptible symbols or ifuncs (may go somewhere else),
+  // absolute symbols (runtime behavior unknown), non-executable memory
+  // (ditto) or non-regular sections (no section data).
+  if (!target || target->isPreemptible || target->isGnuIFunc() ||

smithp35 wrote:

Yes, just checked and it does copy the relocation addend.

I agree that this wouldn't need a test case.

As an aside, when checking where the addends were read in, I ran into this bit of copyRelocations (https://github.com/llvm/llvm-project/blob/main/lld/ELF/InputSection.cpp#L433):
```
  if (ctx.arg.relax && !ctx.arg.relocatable &&
      (ctx.arg.emachine == EM_RISCV || ctx.arg.emachine == EM_LOONGARCH)) {
    // On LoongArch and RISC-V, relaxation might change relocations: copy
    // from internal ones that are updated by relaxation.
    InputSectionBase *sec = getRelocatedSection();
    copyRelocations(
        ctx, buf,
        llvm::make_range(sec->relocations.begin(), sec->relocations.end()));
```

I think I mentioned in a previous comment that bolt uses emit-relocations, so it may be worth following suit here when the transformation is applied.

I suspect that if bolt trusts the original relocation then, in the worst case, the transformation is undone, though.



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-23 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/138366

>From d67e152baaf8487e5cb049166ce61e905011171e Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Wed, 30 Apr 2025 18:25:54 -0700
Subject: [PATCH] ELF: Add branch-to-branch optimization.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.
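A toy model of the idea: each function is reduced to either "just a tail call to another function" or "real code", and following the links (as in the f1 -> f2 -> f3 test case) yields the final destination. The names are hypothetical and this is not lld's implementation; the iteration cap is an assumption of this sketch, added so that a cycle of branches cannot loop forever.

```cpp
#include <map>
#include <string>

// Hypothetical model: tailCalls[f] == g means f's body is just `jmp g`.
using TailCallMap = std::map<std::string, std::string>;

// Follows a chain of tail-call-only functions starting at `callee` and
// returns the final destination. At most `maxSteps` links are followed so
// that a cycle (f -> g -> f) cannot loop forever.
std::string resolveBranchTarget(const TailCallMap &tailCalls,
                                std::string callee, int maxSteps = 5) {
  for (int i = 0; i < maxSteps; ++i) {
    auto it = tailCalls.find(callee);
    if (it == tailCalls.end())
      break; // callee contains real code, so stop here
    callee = it->second;
  }
  return callee;
}
```

A call site whose relocation targets f1 would then be rewritten to branch to f3 directly, skipping both intermediate jumps.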

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
0.0243538 +/- 0.00233202
1.87831% +/- 0.179859%
(Student's t, pooled s = 0.0190369)
```
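The ministat summary can be reproduced by hand. The sketch below assumes `x` is the baseline link and `+` the link with this change (ministat labels samples in input-file order), with n = 512 runs each and a large-sample critical value of about 1.96:

```latex
\begin{aligned}
\Delta &= \bar{t}_{+} - \bar{t}_{x} = 1.3209327 - 1.2965788 \approx 0.0243539 \\
s_p &= \sqrt{\tfrac{0.018620888^2 + 0.019443971^2}{2}} \approx 0.0190369 \\
h &\approx 1.96\, s_p \sqrt{\tfrac{1}{512} + \tfrac{1}{512}} = 1.96 \times 0.0190369 \times 0.0625 \approx 0.002332 \\
\Delta / \bar{t}_{x} &\approx 0.0243539 / 1.2965788 \approx 1.878\%, \qquad h / \bar{t}_{x} \approx 0.180\%
\end{aligned}
```

Under that assumption, the positive difference corresponds to the link taking about 1.9% longer.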

[1] 
https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057

Pull Request: https://github.com/llvm/llvm-project/pull/138366
---
 lld/ELF/Arch/AArch64.cpp| 59 +++
 lld/ELF/Arch/TargetImpl.h   | 94 
 lld/ELF/Arch/X86_64.cpp | 69 +
 lld/ELF/Config.h|  1 +
 lld/ELF/Driver.cpp  |  2 +
 lld/ELF/Options.td  |  4 +
 lld/ELF/Relocations.cpp |  8 +-
 lld/ELF/Target.h|  1 +
 lld/docs/ld.lld.1   |  8 +-
 lld/test/ELF/aarch64-branch-to-branch.s | 61 +++
 lld/test/ELF/x86-64-branch-to-branch.s  | 98 +
 11 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 lld/ELF/Arch/TargetImpl.h
 create mode 100644 lld/test/ELF/aarch64-branch-to-branch.s
 create mode 100644 lld/test/ELF/x86-64-branch-to-branch.s

diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp
index 9538dd4a70bae..5a3c86b209ea5 100644
--- a/lld/ELF/Arch/AArch64.cpp
+++ b/lld/ELF/Arch/AArch64.cpp
@@ -11,6 +11,7 @@
 #include "Symbols.h"
 #include "SyntheticSections.h"
 #include "Target.h"
+#include "TargetImpl.h"
 #include "lld/Common/ErrorHandler.h"
 #include "llvm/BinaryFormat/ELF.h"
 #include "llvm/Support/Endian.h"
@@ -83,6 +84,7 @@ class AArch64 : public TargetInfo {
 uint64_t val) const override;
   RelExpr adjustTlsExpr(RelType type, RelExpr expr) const override;
   void relocateAlloc(InputSectionBase &sec, uint8_t *buf) const override;
+  void applyBranchToBranchOpt() const override;
 
 private:
   void relaxTlsGdToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
@@ -975,6 +977,63 @@ void AArch64::relocateAlloc(InputSectionBase &sec, uint8_t *buf) const {
   }
 }
 
+static std::optional getControlTransferAddend(InputSection &is,
+Relocation &r) {
+  // Identify a control transfer relocation for the branch-to-branch
+  // optimization. A "control transfer relocation" means a B or BL
+  // target but it also includes relative vtable relocations for example.
+  //
+  // We require the relocation type to be JUMP26, CALL26 or PLT32. With a
+  // relocation type of PLT32 the value may be assumed to be used for branching
+  // directly to the symbol and the addend is only used to produce the relocated
+  // value (hence the effective addend is always 0). This is because if a PLT is
+  // needed the addend will be added to the address of the PLT, and it doesn't
+  // make sense to branch into the middle of a PLT. For example, relative vtable
+  // relocations use PLT32 and 0 or a positive value as the addend but still are
+  // used to branch to the symbol.
+  //
+  // With JUMP26 or CALL26 the only reasonable interpret

[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-23 Thread Peter Collingbourne via llvm-branch-commits

pcc wrote:

Right, this feature doesn't change section sizes, so there shouldn't be an 
interaction with SHT_LLVM_BB_ADDR_MAP. AFAICT LLD doesn't contain code that 
parses SHT_LLVM_BB_ADDR_MAP so I don't see value in adding a test for it.



[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-23 Thread Rahman Lavaee via llvm-branch-commits

rlavaee wrote:

At first glance, it seems this feature overwrites the target of the branch and doesn't move or relax the branches within the section (which would interfere with SHT_LLVM_BB_ADDR_MAP). Could you please add a test just to be safe? You could amend a test with the SHT_LLVM_BB_ADDR_MAP section (https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/X86/basic-block-address-map.ll).




[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-22 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/138366

>From 03060849dc81f83ec48f05995ac8fd6df846c25b Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Fri, 2 May 2025 16:57:28 -0700
Subject: [PATCH 1/5] [𝘀𝗽𝗿] initial version
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.6-beta.1
---
 lld/ELF/Arch/AArch64.cpp| 58 +
 lld/ELF/Arch/TargetImpl.h   | 87 +
 lld/ELF/Arch/X86_64.cpp | 54 +++
 lld/ELF/Config.h|  1 +
 lld/ELF/Driver.cpp  |  2 +
 lld/ELF/Options.td  |  4 ++
 lld/ELF/Relocations.cpp |  8 ++-
 lld/ELF/Target.h|  1 +
 lld/docs/ld.lld.1   |  8 ++-
 lld/test/ELF/aarch64-branch-to-branch.s | 58 +
 lld/test/ELF/x86-64-branch-to-branch.s  | 58 +
 11 files changed, 335 insertions(+), 4 deletions(-)
 create mode 100644 lld/ELF/Arch/TargetImpl.h
 create mode 100644 lld/test/ELF/aarch64-branch-to-branch.s
 create mode 100644 lld/test/ELF/x86-64-branch-to-branch.s

diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp
index 9538dd4a70bae..f3a24bd8a9184 100644
--- a/lld/ELF/Arch/AArch64.cpp
+++ b/lld/ELF/Arch/AArch64.cpp
@@ -11,6 +11,7 @@
 #include "Symbols.h"
 #include "SyntheticSections.h"
 #include "Target.h"
+#include "TargetImpl.h"
 #include "lld/Common/ErrorHandler.h"
 #include "llvm/BinaryFormat/ELF.h"
 #include "llvm/Support/Endian.h"
@@ -83,6 +84,7 @@ class AArch64 : public TargetInfo {
 uint64_t val) const override;
   RelExpr adjustTlsExpr(RelType type, RelExpr expr) const override;
   void relocateAlloc(InputSectionBase &sec, uint8_t *buf) const override;
+  void applyBranchToBranchOpt() const override;
 
 private:
   void relaxTlsGdToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
@@ -975,6 +977,62 @@ void AArch64::relocateAlloc(InputSectionBase &sec, uint8_t *buf) const {
   }
 }
 
+static std::optional getControlTransferAddend(InputSection &is,
+Relocation &r) {
+  // Identify a control transfer relocation for the branch-to-branch
+  // optimization. A "control transfer relocation" means a B or BL
+  // target but it also includes relative vtable relocations for example.
+  //
+  // We require the relocation type to be JUMP26, CALL26 or PLT32. With a
+  // relocation type of PLT32 the value may be assumed to be used for branching
+  // directly to the symbol and the addend is only used to produce the relocated
+  // value (hence the effective addend is always 0). This is because if a PLT is
+  // needed the addend will be added to the address of the PLT, and it doesn't
+  // make sense to branch into the middle of a PLT. For example, relative vtable
+  // relocations use PLT32 and 0 or a positive value as the addend but still are
+  // used to branch to the symbol.
+  //
+  // With JUMP26 or CALL26 the only reasonable interpretation of a non-zero
+  // addend is that we are branching to symbol+addend so that becomes the
+  // effective addend.
+  if (r.type == R_AARCH64_PLT32)
+return 0;
+  if (r.type == R_AARCH64_JUMP26 || r.type == R_AARCH64_CALL26)
+return r.addend;
+  return std::nullopt;
+}
+
+static std::pair getBranchInfo(InputSection &is,
+   uint64_t offset) {
+  auto *i = std::lower_bound(
+  is.relocations.begin(), is.relocations.end(), offset,
+  [](Relocation &r, uint64_t offset) { return r.offset < offset; });
+  if (i != is.relocations.end() && i->offset == offset &&
+  i->type == R_AARCH64_JUMP26) {
+return {i, i->addend};
+  }
+  return {nullptr, 0};
+}
+
+static void mergeControlTransferRelocations(Relocation &r1,
+const Relocation &r2) {
+  r1.expr = r2.expr;
+  r1.sym = r2.sym;
+  // With PLT32 we must respect the original addend as that affects the value's
+  // interpretation. With the other relocation types the original addend is
+  // irrelevant because it referred to an offset within the original target
+  // section so we overwrite it.
+  if (r1.type == R_AARCH64_PLT32)
+r1.addend += r2.addend;
+  else
+r1.addend = r2.addend;
+}
+
+void AArch64::applyBranchToBranchOpt() const {
+  applyBranchToBranchOptImpl(ctx, getBranchInfo, getControlTransferAddend,
+ mergeControlTransferRelocations);
+}
+
 // AArch64 may use security features in variant PLT sequences. These are:
 // Pointer Authentication (PAC), introduced in armv8.3-a and Branch Target
 // Indicator (BTI) introduced in armv8.5-a. The additional instructions used
diff --git a/lld/ELF/Arch/TargetImpl.h b/lld/ELF/Arch/TargetImpl.h
new file mode 100644
index 000
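The addend rules in `getControlTransferAddend` and `mergeControlTransferRelocations` above can be summarized with a small self-contained model. The enum, struct and function names are hypothetical stand-ins (the numeric values match the AArch64 ELF relocation codes). The logic follows the comments in the patch rather than being lld code, and it omits the `expr` field that the patch also copies.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins; values match the AArch64 ELF relocation codes.
enum RelType : uint32_t { ABS64 = 257, JUMP26 = 282, CALL26 = 283, PLT32 = 314 };
struct Reloc {
  RelType type;
  int sym;
  int64_t addend;
};

// Effective addend used to find the branch destination, if any.
std::optional<int64_t> controlTransferAddend(const Reloc &r) {
  if (r.type == PLT32)
    return 0; // the addend only shapes the stored value, not the destination
  if (r.type == JUMP26 || r.type == CALL26)
    return r.addend; // the branch goes to sym + addend
  return std::nullopt; // not a control transfer relocation
}

// Point r1 at the destination of r2 (the branch found at r1's target).
void redirect(Reloc &r1, const Reloc &r2) {
  r1.sym = r2.sym;
  if (r1.type == PLT32)
    r1.addend += r2.addend; // keep the original addend's contribution
  else
    r1.addend = r2.addend; // old addend was relative to the old target
}
```

For example, a PLT32 vtable slot with addend 8 keeps that 8 after redirection, while a CALL26 with addend 16 takes the new branch's addend instead.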

[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-22 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/138366

>From e0581c892d07d8bb5518fa412b75b8830f5fb14a Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Wed, 30 Apr 2025 18:25:54 -0700
Subject: [PATCH] ELF: Add branch-to-branch optimization.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When code calls a function which then immediately tail calls another
function there is no need to go via the intermediate function. By
branching directly to the target function we reduce the program's working
set for a slight increase in runtime performance.

Normally it is relatively uncommon to have functions that just tail call
another function, but with LLVM control flow integrity we have jump tables
that replace the function itself as the canonical address. As a result,
when a function address is taken and called directly, for example after
a compiler optimization resolves the indirect call, or if code built
without control flow integrity calls the function, the call will go via
the jump table.

The impact of this optimization was measured using a large internal
Google benchmark. The results were as follows:

CFI enabled:  +0.1% ± 0.05% queries per second
CFI disabled: +0.01% queries per second [not statistically significant]

The optimization is enabled by default at -O2 but may also be enabled
or disabled individually with --{,no-}branch-to-branch.

This optimization is implemented for AArch64 and X86_64 only.

lld's runtime performance (real execution time) after adding this
optimization was measured using firefox-x64 from lld-speed-test [1]
with ldflags "-O2 -S" on an Apple M2 Ultra. The results are as follows:

```
    N           Min           Max        Median           Avg        Stddev
x 512     1.2264546     1.3481076     1.2970261     1.2965788   0.018620888
+ 512     1.2561196     1.3839965     1.3214632     1.3209327   0.019443971
Difference at 95.0% confidence
        0.0243538 +/- 0.00233202
        1.87831% +/- 0.179859%
        (Student's t, pooled s = 0.0190369)
```

[1] https://discourse.llvm.org/t/improving-the-reproducibility-of-linker-benchmarking/86057
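The ministat summary above follows from the per-sample statistics via a pooled-variance Student's t comparison. This sketch assumes the x row is the baseline and the + row is the build with this change, and uses a critical value of 1.960, which is consistent with the printed half-width for 1022 degrees of freedom; `Sample`, `Comparison` and `compareSamples` are hypothetical names.

```cpp
#include <cmath>

// Summary statistics of one ministat data set.
struct Sample {
  double n;
  double mean;
  double stddev;
};

struct Comparison {
  double diff;         // mean(b) - mean(a)
  double halfWidth;    // half-width of the confidence interval for diff
  double pooledStddev; // pooled standard deviation
};

// Pooled-variance Student's t comparison of two samples. tCritical is the
// two-sided critical value for the chosen confidence level.
Comparison compareSamples(const Sample &a, const Sample &b, double tCritical) {
  double pooledVar = ((a.n - 1) * a.stddev * a.stddev +
                      (b.n - 1) * b.stddev * b.stddev) /
                     (a.n + b.n - 2);
  double s = std::sqrt(pooledVar);
  double halfWidth = tCritical * s * std::sqrt(1 / a.n + 1 / b.n);
  return {b.mean - a.mean, halfWidth, s};
}
```

With the two rows above and tCritical = 1.960 this yields a pooled s of about 0.019037, a difference of about 0.024354 (matching the printed value to within rounding of the averages) and a half-width of about 0.002332; dividing the difference by the baseline mean gives the 1.878% figure.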

Pull Request: https://github.com/llvm/llvm-project/pull/138366
---
 lld/ELF/Arch/AArch64.cpp                | 59 +++
 lld/ELF/Arch/TargetImpl.h               | 93 +++
 lld/ELF/Arch/X86_64.cpp                 | 69 +
 lld/ELF/Config.h                        |  1 +
 lld/ELF/Driver.cpp                      |  2 +
 lld/ELF/Options.td                      |  4 +
 lld/ELF/Relocations.cpp                 |  8 +-
 lld/ELF/Target.h                        |  1 +
 lld/docs/ld.lld.1                       |  8 +-
 lld/test/ELF/aarch64-branch-to-branch.s | 61 +++
 lld/test/ELF/x86-64-branch-to-branch.s  | 98 +
 11 files changed, 400 insertions(+), 4 deletions(-)
 create mode 100644 lld/ELF/Arch/TargetImpl.h
 create mode 100644 lld/test/ELF/aarch64-branch-to-branch.s
 create mode 100644 lld/test/ELF/x86-64-branch-to-branch.s

diff --git a/lld/ELF/Arch/AArch64.cpp b/lld/ELF/Arch/AArch64.cpp
index 9538dd4a70bae..5a3c86b209ea5 100644
--- a/lld/ELF/Arch/AArch64.cpp
+++ b/lld/ELF/Arch/AArch64.cpp
@@ -11,6 +11,7 @@
 #include "Symbols.h"
 #include "SyntheticSections.h"
 #include "Target.h"
+#include "TargetImpl.h"
 #include "lld/Common/ErrorHandler.h"
 #include "llvm/BinaryFormat/ELF.h"
 #include "llvm/Support/Endian.h"
@@ -83,6 +84,7 @@ class AArch64 : public TargetInfo {
 uint64_t val) const override;
   RelExpr adjustTlsExpr(RelType type, RelExpr expr) const override;
   void relocateAlloc(InputSectionBase &sec, uint8_t *buf) const override;
+  void applyBranchToBranchOpt() const override;
 
 private:
   void relaxTlsGdToLe(uint8_t *loc, const Relocation &rel, uint64_t val) const;
@@ -975,6 +977,63 @@ void AArch64::relocateAlloc(InputSectionBase &sec, uint8_t *buf) const {
   }
 }
 
+static std::optional getControlTransferAddend(InputSection &is,
+Relocation &r) {
+  // Identify a control transfer relocation for the branch-to-branch
+  // optimization. A "control transfer relocation" means a B or BL
+  // target but it also includes relative vtable relocations for example.
+  //
+  // We require the relocation type to be JUMP26, CALL26 or PLT32. With a
+  // relocation type of PLT32 the value may be assumed to be used for branching
+  // directly to the symbol and the addend is only used to produce the relocated
+  // value (hence the effective addend is always 0). This is because if a PLT is
+  // needed the addend will be added to the address of the PLT, and it doesn't
+  // make sense to branch into the middle of a PLT. For example, relative vtable
+  // relocations use PLT32 and 0 or a positive value as the addend but still are
+  // used to branch to the symbol.
+  //
+  // With JUMP26 or CALL26 the only reasonable interpreta

[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-22 Thread Peter Collingbourne via llvm-branch-commits


@@ -0,0 +1,92 @@
+//===- TargetImpl.h -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLD_ELF_ARCH_TARGETIMPL_H
+#define LLD_ELF_ARCH_TARGETIMPL_H
+
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "Relocations.h"
+#include "Symbols.h"
+#include "llvm/BinaryFormat/ELF.h"
+
+namespace lld {
+namespace elf {
+
+// getControlTransferAddend: If this relocation is used for control transfer
+// instructions (e.g. branch, branch-link or call) or code references (e.g.
+// virtual function pointers) and indicates an address-insignificant reference,
+// return the effective addend for the relocation, otherwise return
+// std::nullopt. The effective addend for a relocation is the addend that is
+// used to determine its branch destination.
+//
+// getBranchInfo: If a control transfer relocation referring to is+offset

pcc wrote:

Done

https://github.com/llvm/llvm-project/pull/138366
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-22 Thread Peter Collingbourne via llvm-branch-commits


@@ -0,0 +1,92 @@
+//===- TargetImpl.h -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLD_ELF_ARCH_TARGETIMPL_H
+#define LLD_ELF_ARCH_TARGETIMPL_H
+
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "Relocations.h"
+#include "Symbols.h"
+#include "llvm/BinaryFormat/ELF.h"
+
+namespace lld {
+namespace elf {
+
+// getControlTransferAddend: If this relocation is used for control transfer
+// instructions (e.g. branch, branch-link or call) or code references (e.g.
+// virtual function pointers) and indicates an address-insignificant reference,
+// return the effective addend for the relocation, otherwise return
+// std::nullopt. The effective addend for a relocation is the addend that is
+// used to determine its branch destination.
+//
+// getBranchInfo: If a control transfer relocation referring to is+offset
+// directly transfers control to a relocated branch instruction in the specified
+// section, return the relocation for the branch target as well as its effective
+// addend (see above). Otherwise return {nullptr, 0}.
+//
+// mergeControlTransferRelocations: Given r1, a relocation for which
+// getControlTransferAddend() returned a value, and r2, a relocation returned by
+// getBranchInfo(), modify r1 so that it branches directly to the target of r2.
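+template <typename GetBranchInfo, typename GetControlTransferAddend,
+          typename MergeControlTransferRelocations>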
+inline void applyBranchToBranchOptImpl(
+Ctx &ctx, GetBranchInfo getBranchInfo,
+GetControlTransferAddend getControlTransferAddend,
+MergeControlTransferRelocations mergeControlTransferRelocations) {
+  // Needs to run serially because it writes to the relocations array as well as
+  // reading relocations of other sections.
+  for (ELFFileBase *f : ctx.objectFiles) {
+auto getRelocBranchInfo =
+[&getBranchInfo](Relocation &r,
+ uint64_t addend) -> std::pair {
+  auto *target = dyn_cast_or_null(r.sym);
+  // We don't allow preemptible symbols or ifuncs (may go somewhere else),
+  // absolute symbols (runtime behavior unknown), non-executable memory
+  // (ditto) or non-regular sections (no section data).
+  if (!target || target->isPreemptible || target->isGnuIFunc() ||

pcc wrote:

Shouldn't SHT_REL just work already because we read the implicit addend when 
producing the Relocation object?

I wanted to add a test case for this but it looks like llvm-mc doesn't have an 
option to write SHT_REL and instead SHT_REL is tested with yaml2obj hacks, e.g. 
`lld/test/ELF/aarch64-reloc-implicit-addend.test`. I think that test is already 
providing enough coverage of the SHT_REL path (otherwise we would need
duplicate, difficult-to-maintain tests of every feature that processes
Relocations).
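To illustrate why SHT_REL should already work: with SHT_REL there is no r_addend field, so the addend is encoded in the relocated instruction and must be decoded before it can be stored in a Relocation. For a B or BL (R_AARCH64_JUMP26 / R_AARCH64_CALL26) it is the signed 26-bit word offset in bits [25:0]. This standalone sketch uses a hypothetical function name and is not lld's code.

```cpp
#include <cstdint>

// Hypothetical helper: decodes the implicit addend of an AArch64 B or BL
// instruction. imm26 (bits [25:0]) is a signed offset in 4-byte words, so the
// byte offset is a 28-bit two's complement value.
int64_t implicitBranchAddend(uint32_t insn) {
  uint32_t imm26 = insn & 0x03FFFFFFu;  // extract bits [25:0]
  int64_t offset = int64_t(imm26) << 2; // words to bytes: 28 bits wide
  if (offset & (int64_t(1) << 27))      // sign bit of the 28-bit value
    offset -= int64_t(1) << 28;         // sign-extend to 64 bits
  return offset;
}
```

For example, 0x14000001 (`b .+4`) decodes to 4 and 0x17FFFFFF (`b .-4`) decodes to -4. Once decoded, downstream code sees the same addend it would have read from an RELA entry.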

https://github.com/llvm/llvm-project/pull/138366


[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-22 Thread Peter Collingbourne via llvm-branch-commits


@@ -975,6 +977,62 @@ void AArch64::relocateAlloc(InputSectionBase &sec, uint8_t *buf) const {
   }
 }
 
+static std::optional getControlTransferAddend(InputSection &is,
+Relocation &r) {
+  // Identify a control transfer relocation for the branch-to-branch
+  // optimization. A "control transfer relocation" means a B or BL
+  // target but it also includes relative vtable relocations for example.
+  //
+  // We require the relocation type to be JUMP26, CALL26 or PLT32. With a
+  // relocation type of PLT32 the value may be assumed to be used for branching
+  // directly to the symbol and the addend is only used to produce the relocated
+  // value (hence the effective addend is always 0). This is because if a PLT is
+  // needed the addend will be added to the address of the PLT, and it doesn't
+  // make sense to branch into the middle of a PLT. For example, relative vtable
+  // relocations use PLT32 and 0 or a positive value as the addend but still are
+  // used to branch to the symbol.
+  //
+  // With JUMP26 or CALL26 the only reasonable interpretation of a non-zero
+  // addend is that we are branching to symbol+addend so that becomes the
+  // effective addend.
+  if (r.type == R_AARCH64_PLT32)
+return 0;
+  if (r.type == R_AARCH64_JUMP26 || r.type == R_AARCH64_CALL26)
+return r.addend;
+  return std::nullopt;
+}
+
+static std::pair getBranchInfo(InputSection &is,
+   uint64_t offset) {
+  auto *i = std::lower_bound(
+  is.relocations.begin(), is.relocations.end(), offset,
+  [](Relocation &r, uint64_t offset) { return r.offset < offset; });
+  if (i != is.relocations.end() && i->offset == offset &&
+  i->type == R_AARCH64_JUMP26) {
+return {i, i->addend};
+  }
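The two helpers quoted above can be modeled outside lld. In this sketch the types and names are hypothetical stand-ins for lld's Relocation and relocation types: the addend rule follows the quoted comment, and the lookup mirrors the std::lower_bound search, which assumes relocations are sorted by offset. Only a JUMP26 (B) at the destination qualifies, because after a BL control would return to the intermediate function, so that function cannot be skipped. Unlike the patch, the sketch returns a pointer instead of a pair.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for lld's relocation types and Relocation.
enum class RelType { Jump26, Call26, Plt32, Abs64 };

struct Reloc {
  uint64_t offset;
  RelType type;
  int64_t addend;
};

// Effective addend: PLT32 always branches to the symbol itself, B/BL branch
// to symbol+addend, and anything else is not a control transfer.
std::optional<int64_t> controlTransferAddend(const Reloc &r) {
  switch (r.type) {
  case RelType::Plt32:
    return 0;
  case RelType::Jump26:
  case RelType::Call26:
    return r.addend;
  default:
    return std::nullopt;
  }
}

// Binary search for a JUMP26 relocation at exactly `offset`; `relocs` must be
// sorted by offset. Returns nullptr if there is no relocated B at `offset`.
const Reloc *branchAt(const std::vector<Reloc> &relocs, uint64_t offset) {
  auto it = std::lower_bound(
      relocs.begin(), relocs.end(), offset,
      [](const Reloc &r, uint64_t off) { return r.offset < off; });
  if (it != relocs.end() && it->offset == offset &&
      it->type == RelType::Jump26)
    return &*it;
  return nullptr;
}
```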

pcc wrote:

Regarding BTI instructions, that should work, but let's do that in a follow-up.

In principle, a hot patch could overwrite an initial B instruction as well, so
in general users desiring hot patch compatibility would need to disable this
entirely by passing `--no-branch-to-branch`. Since hot patching is uncommon, I
think we probably shouldn't accommodate it by default. We generally expect the
program not to write to read-only sections (e.g. ICF and string tail merging
will merge read-only sections even though a program could bypass page
protections to write to those sections or strings, which would affect every
merged copy), and this optimization is consistent with that. I checked the
linker flags used by the Linux kernel (which I know hot patches itself at
startup) and it doesn't pass a `-O` flag, so it won't be broken by this change.

While thinking about hot patching I realized that we should have a check that 
the target section is not writable, so I added that.
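The restrictions on following a branch at the destination (the TargetImpl.h hunk rules out preemptible symbols, ifuncs, absolute symbols, non-executable memory and non-regular sections) together with the writability check added here can be summarized as one predicate. The struct and function below are a hypothetical flattened view; lld applies these checks to its own symbol and section objects.

```cpp
// Hypothetical flattened view of the destination of a control transfer.
struct BranchTargetFacts {
  bool isPreemptible;       // may resolve to another definition at run time
  bool isIFunc;             // destination chosen by a runtime resolver
  bool isAbsolute;          // not defined relative to any section
  bool sectionIsExecutable; // otherwise runtime behavior is unknown
  bool sectionIsWritable;   // bytes could be hot patched at run time
  bool sectionHasData;      // a regular section with contents to inspect
};

// A branch found at the destination may only be followed if the destination
// is fixed at link time and its instruction bytes cannot change.
bool canFollowBranchAt(const BranchTargetFacts &t) {
  return !t.isPreemptible && !t.isIFunc && !t.isAbsolute &&
         t.sectionIsExecutable && !t.sectionIsWritable && t.sectionHasData;
}
```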

https://github.com/llvm/llvm-project/pull/138366


[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-22 Thread Peter Collingbourne via llvm-branch-commits


@@ -0,0 +1,92 @@
+//===- TargetImpl.h -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLD_ELF_ARCH_TARGETIMPL_H
+#define LLD_ELF_ARCH_TARGETIMPL_H
+
+#include "InputFiles.h"
+#include "InputSection.h"
+#include "Relocations.h"
+#include "Symbols.h"
+#include "llvm/BinaryFormat/ELF.h"
+
+namespace lld {
+namespace elf {
+
+// getControlTransferAddend: If this relocation is used for control transfer
+// instructions (e.g. branch, branch-link or call) or code references (e.g.
+// virtual function pointers) and indicates an address-insignificant reference,
+// return the effective addend for the relocation, otherwise return
+// std::nullopt. The effective addend for a relocation is the addend that is
+// used to determine its branch destination.
+//
+// getBranchInfo: If a control transfer relocation referring to is+offset
+// directly transfers control to a relocated branch instruction in the specified
+// section, return the relocation for the branch target as well as its effective
+// addend (see above). Otherwise return {nullptr, 0}.
+//
+// mergeControlTransferRelocations: Given r1, a relocation for which

pcc wrote:

Done
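The three callbacks described in this comment compose roughly as below. This is a simplified sketch of the shape of applyBranchToBranchOptImpl under a toy model in which every relocation targets the start of a section plus its addend; `Reloc`, `Section` and `applyBranchToBranch` are hypothetical names, and the sketch performs only a single hop per relocation.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy relocation: targets section `targetSection` at offset `addend`.
struct Reloc {
  uint64_t offset;
  int type; // hypothetical codes: 0 = call, 1 = tail-call branch, 2 = other
  size_t targetSection;
  int64_t addend;
};

struct Section {
  std::vector<Reloc> relocs;
};

// For each control transfer relocation, ask the architecture callback whether
// the destination holds a relocated branch; if so, retarget the relocation to
// that branch's destination. Runs serially because it mutates relocations
// while reading those of other sections.
template <typename GetBranchInfo, typename GetControlTransferAddend,
          typename MergeControlTransferRelocations>
void applyBranchToBranch(std::vector<Section> &sections,
                         GetBranchInfo getBranchInfo,
                         GetControlTransferAddend getControlTransferAddend,
                         MergeControlTransferRelocations merge) {
  for (Section &sec : sections) {
    for (Reloc &r : sec.relocs) {
      std::optional<int64_t> addend = getControlTransferAddend(r);
      if (!addend)
        continue; // not a control transfer relocation
      Section &dest = sections[r.targetSection];
      auto [r2, addend2] = getBranchInfo(dest, uint64_t(*addend));
      if (r2)
        merge(r, *r2, addend2);
    }
  }
}
```

Keeping the driver generic and passing the architecture-specific pieces as callables lets AArch64 and X86_64 share the traversal while each defines its own relocation types.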

https://github.com/llvm/llvm-project/pull/138366


[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-22 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc edited https://github.com/llvm/llvm-project/pull/138366


[llvm-branch-commits] [lld] ELF: Add branch-to-branch optimization. (PR #138366)

2025-05-22 Thread Peter Collingbourne via llvm-branch-commits

https://github.com/pcc edited https://github.com/llvm/llvm-project/pull/138366