[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Vitaly Buka via cfe-commits


@@ -345,27 +345,25 @@ void llvm::filterDeadComdatFunctions(
 
 std::string llvm::getUniqueModuleId(Module *M) {
   MD5 Md5;
-  bool ExportsSymbols = false;
-  auto AddGlobal = [&](GlobalValue &GV) {
-if (GV.isDeclaration() || GV.getName().starts_with("llvm.") ||
-!GV.hasExternalLinkage() || GV.hasComdat())
-  return;
-ExportsSymbols = true;
-Md5.update(GV.getName());
-Md5.update(ArrayRef{0});
-  };
-
-  for (auto &F : *M)

vitalybuka wrote:

Thanks!

https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Peter Collingbourne via cfe-commits


@@ -345,27 +345,25 @@ void llvm::filterDeadComdatFunctions(
 
 std::string llvm::getUniqueModuleId(Module *M) {
   MD5 Md5;
-  bool ExportsSymbols = false;
-  auto AddGlobal = [&](GlobalValue &GV) {
-if (GV.isDeclaration() || GV.getName().starts_with("llvm.") ||
-!GV.hasExternalLinkage() || GV.hasComdat())
-  return;
-ExportsSymbols = true;
-Md5.update(GV.getName());
-Md5.update(ArrayRef{0});
-  };
-
-  for (auto &F : *M)

pcc wrote:

It got moved into the loop over `global_values` below.

https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Vitaly Buka via cfe-commits

vitalybuka wrote:

LGTM

https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Vitaly Buka via cfe-commits


@@ -345,27 +345,25 @@ void llvm::filterDeadComdatFunctions(
 
 std::string llvm::getUniqueModuleId(Module *M) {
   MD5 Md5;
-  bool ExportsSymbols = false;
-  auto AddGlobal = [&](GlobalValue &GV) {
-if (GV.isDeclaration() || GV.getName().starts_with("llvm.") ||
-!GV.hasExternalLinkage() || GV.hasComdat())
-  return;
-ExportsSymbols = true;
-Md5.update(GV.getName());
-Md5.update(ArrayRef{0});
-  };
-
-  for (auto &F : *M)

vitalybuka wrote:

Why this part is gone even without UniqueSourceFileNames?

https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Peter Collingbourne via cfe-commits

pcc wrote:

https://github.com/llvm/llvm-project/pull/135832


https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Peter Collingbourne via cfe-commits

pcc wrote:

> Is the "name" in this context the whole path? Or just the filename? I know 
> many projects have files with the same name in different folders (including 
> LLVM itsef).

It is the whole path. Let me clarify this in the documentation.

https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Eli Friedman via cfe-commits

https://github.com/efriedma-quic commented:

Is the "name" in this context the whole path?  Or just the filename?  I know 
many projects have files with the same name in different folders (including 
LLVM itsef).

https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Peter Collingbourne via cfe-commits

https://github.com/pcc closed https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Teresa Johnson via cfe-commits

https://github.com/teresajohnson approved this pull request.


https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

> > I was mostly thinking that detecting and erroring at LTO time would result 
> > in a clearer error (like at the start of LTO::addModule, where it is 
> > looking for partially split LTO units, which we do by adding the module 
> > flag to the index flags).
> 
> I don't think it would be detected at that point. The duplicate symbol error 
> would be detected by the linker when it adds the bitcode file to its symbol 
> table and likely cause it to error out before calling LTO::addModule which is 
> only called once the resolutions are known. So if we wanted a better error 
> message it would need to be added to each linker.

I didn't mean to catch the duplicate symbol, but rather to error if the flag 
was given but we have multiple LTO modules with the same source name, in which 
case we can give a specific error that this option was used when the source 
names were not in fact unique.

I'll lgtm this patch, but something to consider as a follow on.


https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Peter Collingbourne via cfe-commits

https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/135728

>From 4ddc4d6fcd938b66cce586c18a9e165c6d065121 Mon Sep 17 00:00:00 2001
From: Peter Collingbourne 
Date: Mon, 14 Apr 2025 19:04:00 -0700
Subject: [PATCH 1/2] =?UTF-8?q?[=F0=9D=98=80=F0=9D=97=BD=F0=9D=97=BF]=20in?=
 =?UTF-8?q?itial=20version?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.6-beta.1
---
 clang/docs/ControlFlowIntegrity.rst   | 16 +-
 clang/docs/UsersManual.rst|  7 ++
 clang/include/clang/Basic/CodeGenOptions.def  |  2 ++
 clang/include/clang/Driver/Options.td |  7 ++
 clang/lib/CodeGen/CodeGenModule.cpp   |  4 
 clang/lib/Driver/ToolChains/Clang.cpp |  3 +++
 clang/test/CodeGen/unique-source-file-names.c |  2 ++
 clang/test/Driver/unique-source-file-names.c  |  5 +
 llvm/lib/Transforms/Utils/ModuleUtils.cpp | 22 +--
 .../unique-source-file-names.ll   | 22 +++
 10 files changed, 74 insertions(+), 16 deletions(-)
 create mode 100644 clang/test/CodeGen/unique-source-file-names.c
 create mode 100644 clang/test/Driver/unique-source-file-names.c
 create mode 100644 
llvm/test/Transforms/ThinLTOBitcodeWriter/unique-source-file-names.ll

diff --git a/clang/docs/ControlFlowIntegrity.rst 
b/clang/docs/ControlFlowIntegrity.rst
index a88271faf6b42..baff9ab54ff26 100644
--- a/clang/docs/ControlFlowIntegrity.rst
+++ b/clang/docs/ControlFlowIntegrity.rst
@@ -19,11 +19,12 @@ of undefined behavior that can potentially allow attackers 
to subvert the
 program's control flow. These schemes have been optimized for performance,
 allowing developers to enable them in release builds.
 
-To enable Clang's available CFI schemes, use the flag ``-fsanitize=cfi``.
-You can also enable a subset of available :ref:`schemes `.
-As currently implemented, all schemes rely on link-time optimization (LTO);
-so it is required to specify ``-flto``, and the linker used must support LTO,
-for example via the `gold plugin`_.
+To enable Clang's available CFI schemes, use the flag
+``-fsanitize=cfi``. You can also enable a subset of available
+:ref:`schemes `. As currently implemented, all schemes
+except for ``kcfi`` rely on link-time optimization (LTO); so it is
+required to specify ``-flto`` or ``-flto=thin``, and the linker used
+must support LTO, for example via the `gold plugin`_.
 
 To allow the checks to be implemented efficiently, the program must
 be structured such that certain object files are compiled with CFI
@@ -41,6 +42,11 @@ default visibility setting is ``-fvisibility=default``, 
which would disable
 CFI checks for classes without visibility attributes. Most users will want
 to specify ``-fvisibility=hidden``, which enables CFI checks for such classes.
 
+When using ``-fsanitize=cfi*`` with ``-flto=thin``, it is recommended
+to reduce link times by passing `-funique-source-file-names
+`_, provided
+that your program is compatible with it.
+
 Experimental support for :ref:`cross-DSO control flow integrity
 ` exists that does not require classes to have hidden LTO
 visibility. This cross-DSO support has unstable ABI at this time.
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 2a93c2552d7dc..d473503931144 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -2297,6 +2297,13 @@ are listed below.
pure ThinLTO, as all split regular LTO modules are merged and LTO linked
with regular LTO.
 
+.. option:: -f[no-]unique-source-file-names
+
+   When enabled, allows the compiler to assume that each object file
+   passed to the linker has been compiled using a unique source file
+   name. This is useful for reducing link times when doing ThinLTO
+   in combination with whole-program devirtualization or CFI.
+
 .. option:: -fforce-emit-vtables
 
In order to improve devirtualization, forces emitting of vtables even in
diff --git a/clang/include/clang/Basic/CodeGenOptions.def 
b/clang/include/clang/Basic/CodeGenOptions.def
index a436c0ec98d5b..c5990fb248689 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -278,6 +278,8 @@ CODEGENOPT(SanitizeCfiICallNormalizeIntegers, 1, 0) ///< 
Normalize integer types
 ///< CFI icall function 
signatures
 CODEGENOPT(SanitizeCfiCanonicalJumpTables, 1, 0) ///< Make jump table symbols 
canonical
  ///< instead of creating a 
local jump table.
+CODEGENOPT(UniqueSourceFileNames, 1, 0) ///< Allow the compiler to assume that 
TUs
+///< have unique source file names at 
link time
 CODEGENOPT(SanitizeKcfiArity, 1, 0) ///< Embed arity in KCFI patchable 
function prefix
 CODEGENOPT(SanitizeCoverageType, 2, 0) ///< Type of sanitizer coverage

[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Peter Collingbourne via cfe-commits

pcc wrote:

> Could that be addressed by only allowing this flag under LTO?

`-fsanitize=address -flto=thin` would have the same issue (if there is indeed 
an issue, which I think there probably isn't).

> Probably at least add a note to the new documentation for the option that it 
> will result in linker errors if used inappropriately.

Added.

> I was mostly thinking that detecting and erroring at LTO time would result in 
> a clearer error (like at the start of LTO::addModule, where it is looking for 
> partially split LTO units, which we do by adding the module flag to the index 
> flags).

I don't think it would be detected at that point. The duplicate symbol error 
would be detected by the linker when it adds the bitcode file to its symbol 
table and likely cause it to error out before calling LTO::addModule which is 
only called once the resolutions are known. So if we wanted a better error 
message it would need to be added to each linker.

https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [llvm] Introduce -funique-source-file-names flag. (PR #135728)

2025-04-15 Thread Teresa Johnson via cfe-commits

teresajohnson wrote:

> > I assume this will give a duplicate symbol linker error if used 
> > inappropriately? Any chance of more subtle bugs?
> 
> As far as LTO is concerned I think it can only result in a duplicate symbol 
> error as the resulting symbols will have external strong linkage. There are 
> two non-LTO uses of `getModuleUniqueId` in the tree:
> 
> * The PowerPC AsmPrinter. It looks like this one is used to produce symbols 
> with external linkage, so the usage error would be detected at link time.
> * The ASan pass. It's unclear what would happen if used incorrectly (a 
> discarded section error I think?) but it doesn't look like we can do much 
> about it anyway because LTO isn't necessarily enabled with ASan.

Could that be addressed by only allowing this flag under LTO?

> 
> > Should there be some error detection at LTO link time? I.e. if the module 
> > flag is set on the LTO linked modules can and should we detect and error on 
> > duplicate source file names?
> 
> I don't think it is needed as the LTO-specific cases are already detectable.

Probably at least add a note to the new documentation for the option that it 
will result in linker errors if used inappropriately.

I was mostly thinking that detecting and erroring at LTO time would result in a 
clearer error (like at the start of LTO::addModule, where it is looking for 
partially split LTO units, which we do by adding the module flag to the index 
flags). E.g. I could see a case where someone adds this option to their 
makefiles, time passes and they forget about it, a new file is added that 
violates the constraints, and boom they get a linker error that they don't 
understand (and then the compiler gets an bug report).

https://github.com/llvm/llvm-project/pull/135728
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits