[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-17 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat closed 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-16 Thread Donát Nagy via cfe-commits

NagyDonat wrote:

> I'm somewhat unsatisfied, but it's not because of this PR. This is an 
> excellent way forward, I just wish we had a better abstraction.

I agree that the architecture that's being formalized by this commit is not 
perfect.

If I would be reimplementing a static analyzer from zero, I would probably 
separate:
- small "Reporter" classes which
  - have a user-facing name,
  - can be enabled/disabled by the user,
  - have `reportBug(...)` methods that produce a bug report if the Reporter is 
enabled;
  - implement unique logic (e.g. message formatting) that is only relevant for 
one sub-checker;
- big "Modeler" classes which
  - own one or several (or perhaps zero) Reporters,
  - are active if any of their Reporters is enabled (or if some other Modeler 
depends on them),
  - implement various callbacks (`check::XXX`, `eval::Call` etc.) and update 
the `State` and call `MyReporter::reportBug(...)` methods from these callbacks;
  - implement the shared logic of a group of related Reporters (i.e. what is 
currently represented by a single multipart checker class).

However, given the existing codebase I don't think that it's feasible to 
suddenly switch to an architecture like this. However, we could gradually 
approach this transition, e.g. if we gradually replace `CheckerNameRef` with a 
richer and more complex class that can become our `Reporter`.

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-16 Thread Balazs Benics via cfe-commits
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy ,
=?utf-8?q?Don=C3=A1t?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal approved this pull request.

I'm somewhat unsatisfied, but it's not because of this PR.
This is an excellent way forward, I just wish we had a better abstraction.
Thanks for cleaning this up.

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-15 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-15 Thread Donát Nagy via cfe-commits


@@ -190,23 +203,38 @@ class CheckerManager {
   // Checker registration.
   
//======//
 
-  /// Used to register checkers.
-  /// All arguments are automatically passed through to the checker
-  /// constructor.
+  /// Construct the singleton instance of a checker, register it for the
+  /// supported callbacks and record its name with `registerCheckerPart()`.
+  /// Arguments passed to this function are forwarded to the constructor of the
+  /// checker.
+  ///
+  /// If `CHECKER` has multiple parts, then the constructor call and the
+  /// callback registration only happen within the first `registerChecker()`
+  /// call; while the subsequent calls only enable additional parts of the
+  /// existing checker object (while registering their names).
   ///
   /// \returns a pointer to the checker object.
-  template 
-  CHECKER *registerChecker(AT &&... Args) {
+  template 
+  CHECKER *registerChecker(AT &&...Args) {
+// This assert could be removed but then need to make sure that calls that
+// register different parts of the same checker pass the same arguments.
+static_assert(
+Idx == DefaultPart || !sizeof...(AT),
+"Argument forwarding isn't supported with multi-part checkers!");
+
 CheckerTag Tag = getTag();
-std::unique_ptr &Ref = CheckerTags[Tag];
-assert(!Ref && "Checker already registered, use getChecker!");
 
-std::unique_ptr Checker =
-std::make_unique(std::forward(Args)...);
-Checker->Name = CurrentCheckerName;
-CHECKER::_register(Checker.get(), *this);
-Ref = std::move(Checker);
-return static_cast(Ref.get());
+std::unique_ptr &Ref = CheckerTags[Tag];
+if (!Ref) {
+  std::unique_ptr Checker =
+  std::make_unique(std::forward(Args)...);
+  CHECKER::_register(Checker.get(), *this);
+  Ref = std::move(Checker);

NagyDonat wrote:

Oops, there is a compilation failure and I'm pretty sure that it's caused by 
this.

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-15 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-15 Thread Donát Nagy via cfe-commits

NagyDonat wrote:

The reason why I the support for multiple `RegisteredNames` instead of a single 
`Name` is directly introduced within `CheckerBase` is that this significantly 
simplifies the implementation. [1]

However if you wish so I could easily introduce a `static_assert` which ensures 
that multipart checkers must be declared as
```c++
class FancyChecker: public Checker {
  ...
}
```
or even as 
```c++
class FancyChecker: public Checker> {
  ...
}
```

[1]: Code that can handle an arbitrary number of names is trivially capable of 
handling a single name; so if every checker has an array of `RegisteredNames` 
under the hood (which may be an 1-element array), then the same code can handle 
both the simple and the complex checkers. On the other hand, if some checkers 
have a single `Name` while others have an array of multiple names, then the two 
cases must be handled separately by ugly `if (HasMultipleNames) {...} else 
{...}` conditionals.

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-15 Thread Donát Nagy via cfe-commits


@@ -35,9 +35,10 @@ class DivZeroChecker : public 
Checker> {
 public:
   /// This checker class implements several user facing checkers
   enum CheckKind { CK_DivideZero, CK_TaintedDivChecker, CK_NumCheckKinds };
-  bool ChecksEnabled[CK_NumCheckKinds] = {false};
-  CheckerNameRef CheckNames[CK_NumCheckKinds];
-  mutable std::unique_ptr BugTypes[CK_NumCheckKinds];
+  BugType BugTypes[CK_NumCheckKinds] = {
+  {this, CK_DivideZero, "Division by zero"},
+  {this, CK_TaintedDivChecker, "Division by zero",
+   categories::TaintedData}};

NagyDonat wrote:

OK, I will remove the prefix -- and standardize the use of the `Checker` prefix 
(I don't know whether we should use it for both constants or neither, but it 
should be the same in both cases).

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits


@@ -485,16 +485,60 @@ class Call {
 } // end eval namespace
 
 class CheckerBase : public ProgramPointTag {
-  CheckerNameRef Name;
+  /// A single checker class (i.e. a subclass of `CheckerBase`) can implement
+  /// multiple user-facing checkers that have separate names and can be enabled
+  /// separately, but are backed by the same singleton checker object.
+  SmallVector, 1> RegisteredNames;
+
   friend class ::clang::ento::CheckerManager;
 
 public:
-  StringRef getTagDescription() const override;
-  CheckerNameRef getName() const;
+  CheckerNameRef getName(CheckerPartIdx Idx = DefaultPart) const {
+assert(Idx < RegisteredNames.size() && "Checker part index is too large!");
+std::optional Name = RegisteredNames[Idx];
+assert(Name && "Requested checker part is not registered!");
+return *Name;
+  }
+
+  bool isPartEnabled(CheckerPartIdx Idx) const {
+return Idx < RegisteredNames.size() && RegisteredNames[Idx].has_value();
+  }
+
+  void registerCheckerPart(CheckerPartIdx Idx, CheckerNameRef Name) {
+// Paranoia: notice if e.g. UINT_MAX is passed as a checker part index.
+assert(Idx < 256 && "Checker part identifiers should be small integers.");
+
+if (Idx >= RegisteredNames.size())
+  RegisteredNames.resize(Idx + 1, std::nullopt);
+
+assert(!RegisteredNames[Idx] && "Repeated registration of checker a 
part!");
+RegisteredNames[Idx] = Name;
+  }
+
+  StringRef getTagDescription() const override {
+// This method inherited from `ProgramPointTag` has two unrelated roles:
+// (1) The analyzer option handling logic uses this method to query the
+// name of a checker.
+// (2) When the `ExplodedGraph` is dumped in DOT format for debugging,
+// this is called to attach a description to the nodes. (This happens
+// for all subclasses of `ProgramPointTag`, not just checkers.)
+// FIXME: Application (1) should be aware of multiple parts within the same
+// checker class instance, so it should directly use `getName` instead of
+// this inherited interface which cannot support a `CheckerPartIdx`.
+// FIXME: Ideally application (2) should return a string that describes the
+// whole checker class, not just one of it parts. However, this is only for
+// debugging, so returning the name of one part is probably good enough.

NagyDonat wrote:

I have concrete plans for cleaning up the issues for this method, but I would 
like to implement them as a separate follow-up commit when this PR is merged 
(or at least stable enough).

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits


@@ -35,9 +35,10 @@ class DivZeroChecker : public 
Checker> {
 public:
   /// This checker class implements several user facing checkers
   enum CheckKind { CK_DivideZero, CK_TaintedDivChecker, CK_NumCheckKinds };
-  bool ChecksEnabled[CK_NumCheckKinds] = {false};
-  CheckerNameRef CheckNames[CK_NumCheckKinds];
-  mutable std::unique_ptr BugTypes[CK_NumCheckKinds];
+  BugType BugTypes[CK_NumCheckKinds] = {
+  {this, CK_DivideZero, "Division by zero"},
+  {this, CK_TaintedDivChecker, "Division by zero",
+   categories::TaintedData}};

NagyDonat wrote:

I renamed the enum constants in 
https://github.com/llvm/llvm-project/pull/130985/commits/0caca9644e943ffe13dda5f4b9c670de30926d1f
 .

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat updated 
https://github.com/llvm/llvm-project/pull/130985

From 895b6947690ec51d8e8bccfa09420afae4449343 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 3 Mar 2025 15:33:44 +0100
Subject: [PATCH 01/10] [NFC][analyzer] Framework for multipart checkers

In the static analyzer codebase we have a traditional pattern where a
single checker class (and its singleton instance) acts as the
implementation of several (user-facing or modeling) checkers that have
shared state and logic, but have their own names and can be enabled or
disabled separately.

Currently these multipart checker classes all reimplement the same
boilerplate logic to store the enabled/disabled state, the name and the
bug types associated with the checker parts. This commit extends
`CheckerBase`, `BugType` and the checker registration process to offer
an easy-to-use alternative to that boilerplate (which includes the ugly
lazy initialization of `std::unique_ptr`s).

In this new framework the single-part checkers are internally
represented as "multipart checkers with just one part" (because this way
I don't need to reimplement the same logic twice) but this does not
require any changes in the code of simple single-part checkers.

I do not claim that these multi-part checkers are perfect from an
architectural point of view; but they won't suddenly disappear after
many years of existence, so we might as well introduce a clear framework
for them. (Switching to e.g. 1:1 correspondence between checker classes
and checker names would be a prohibitively complex change.)
---
 .../StaticAnalyzer/Core/BugReporter/BugType.h | 44 ++-
 .../clang/StaticAnalyzer/Core/Checker.h   | 39 --
 .../StaticAnalyzer/Core/CheckerManager.h  | 54 ++-
 clang/lib/StaticAnalyzer/Core/Checker.cpp |  5 +-
 4 files changed, 108 insertions(+), 34 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h 
b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
index 97cf7eda0ae2f..3a635e0d0125a 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
@@ -17,6 +17,7 @@
 #include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include 
+#include 
 
 namespace clang {
 
@@ -26,36 +27,53 @@ class BugReporter;
 
 class BugType {
 private:
-  const CheckerNameRef CheckerName;
+  struct CheckerPartRef {
+const CheckerBase *Checker;
+CheckerPartIdx Idx;
+  };
+  using CheckerNameInfo = std::variant;
+
+  const CheckerNameInfo CheckerName;
   const std::string Description;
   const std::string Category;
-  const CheckerBase *Checker;
   bool SuppressOnSink;
 
   virtual void anchor();
 
 public:
+  // Straightforward constructor where the checker name is specified directly.
+  // TODO: As far as I know all applications of this constructor involve ugly
+  // hacks that could be avoided by switching to a different constructor.
+  // When those are all eliminated, this constructor should be removed to
+  // eliminate the `variant` and simplify this class.
   BugType(CheckerNameRef CheckerName, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
   : CheckerName(CheckerName), Description(Desc), Category(Cat),
-Checker(nullptr), SuppressOnSink(SuppressOnSink) {}
+SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object.
+  // At that point the checker name is not yet available, but we can save a
+  // pointer to the checker and later use that to query the name.
   BugType(const CheckerBase *Checker, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
-  : CheckerName(), Description(Desc), Category(Cat), Checker(Checker),
-SuppressOnSink(SuppressOnSink) {}
+  : CheckerName(CheckerPartRef{Checker, DefaultPart}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object
+  // when it has multiple parts with separate names. We save the name and the
+  // part index to be able to query the name of that part later.
+  BugType(const CheckerBase *Checker, CheckerPartIdx Idx, StringRef Desc,
+  StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
+  : CheckerName(CheckerPartRef{Checker, Idx}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
   virtual ~BugType() = default;
 
   StringRef getDescription() const { return Description; }
   StringRef getCategory() const { return Category; }
   StringRef getCheckerName() const {
-// FIXME: This is a workaround to ensure that the correct checker name is
-// used. The checker names are set after the constructors are run.
-// 

[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits


@@ -190,23 +203,38 @@ class CheckerManager {
   // Checker registration.
   
//======//
 
-  /// Used to register checkers.
-  /// All arguments are automatically passed through to the checker
-  /// constructor.
+  /// Construct the singleton instance of a checker, register it for the
+  /// supported callbacks and record its name with `registerCheckerPart()`.
+  /// Arguments passed to this function are forwarded to the constructor of the
+  /// checker.
+  ///
+  /// If `CHECKER` has multiple parts, then the constructor call and the
+  /// callback registration only happen within the first `registerChecker()`
+  /// call; while the subsequent calls only enable additional parts of the
+  /// existing checker object (while registering their names).
   ///
   /// \returns a pointer to the checker object.
-  template 
-  CHECKER *registerChecker(AT &&... Args) {
+  template 
+  CHECKER *registerChecker(AT &&...Args) {
+// This assert could be removed but then need to make sure that calls that
+// register different parts of the same checker pass the same arguments.

NagyDonat wrote:

```suggestion
// This assert could be removed but then we need to make sure that calls
// registering different parts of the same checker pass the same arguments.
```

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat updated 
https://github.com/llvm/llvm-project/pull/130985

From 895b6947690ec51d8e8bccfa09420afae4449343 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 3 Mar 2025 15:33:44 +0100
Subject: [PATCH 1/9] [NFC][analyzer] Framework for multipart checkers

In the static analyzer codebase we have a traditional pattern where a
single checker class (and its singleton instance) acts as the
implementation of several (user-facing or modeling) checkers that have
shared state and logic, but have their own names and can be enabled or
disabled separately.

Currently these multipart checker classes all reimplement the same
boilerplate logic to store the enabled/disabled state, the name and the
bug types associated with the checker parts. This commit extends
`CheckerBase`, `BugType` and the checker registration process to offer
an easy-to-use alternative to that boilerplate (which includes the ugly
lazy initialization of `std::unique_ptr`s).

In this new framework the single-part checkers are internally
represented as "multipart checkers with just one part" (because this way
I don't need to reimplement the same logic twice) but this does not
require any changes in the code of simple single-part checkers.

I do not claim that these multi-part checkers are perfect from an
architectural point of view; but they won't suddenly disappear after
many years of existence, so we might as well introduce a clear framework
for them. (Switching to e.g. 1:1 correspondence between checker classes
and checker names would be a prohibitively complex change.)
---
 .../StaticAnalyzer/Core/BugReporter/BugType.h | 44 ++-
 .../clang/StaticAnalyzer/Core/Checker.h   | 39 --
 .../StaticAnalyzer/Core/CheckerManager.h  | 54 ++-
 clang/lib/StaticAnalyzer/Core/Checker.cpp |  5 +-
 4 files changed, 108 insertions(+), 34 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h 
b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
index 97cf7eda0ae2f..3a635e0d0125a 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
@@ -17,6 +17,7 @@
 #include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include 
+#include 
 
 namespace clang {
 
@@ -26,36 +27,53 @@ class BugReporter;
 
 class BugType {
 private:
-  const CheckerNameRef CheckerName;
+  struct CheckerPartRef {
+const CheckerBase *Checker;
+CheckerPartIdx Idx;
+  };
+  using CheckerNameInfo = std::variant;
+
+  const CheckerNameInfo CheckerName;
   const std::string Description;
   const std::string Category;
-  const CheckerBase *Checker;
   bool SuppressOnSink;
 
   virtual void anchor();
 
 public:
+  // Straightforward constructor where the checker name is specified directly.
+  // TODO: As far as I know all applications of this constructor involve ugly
+  // hacks that could be avoided by switching to a different constructor.
+  // When those are all eliminated, this constructor should be removed to
+  // eliminate the `variant` and simplify this class.
   BugType(CheckerNameRef CheckerName, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
   : CheckerName(CheckerName), Description(Desc), Category(Cat),
-Checker(nullptr), SuppressOnSink(SuppressOnSink) {}
+SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object.
+  // At that point the checker name is not yet available, but we can save a
+  // pointer to the checker and later use that to query the name.
   BugType(const CheckerBase *Checker, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
-  : CheckerName(), Description(Desc), Category(Cat), Checker(Checker),
-SuppressOnSink(SuppressOnSink) {}
+  : CheckerName(CheckerPartRef{Checker, DefaultPart}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object
+  // when it has multiple parts with separate names. We save the name and the
+  // part index to be able to query the name of that part later.
+  BugType(const CheckerBase *Checker, CheckerPartIdx Idx, StringRef Desc,
+  StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
+  : CheckerName(CheckerPartRef{Checker, Idx}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
   virtual ~BugType() = default;
 
   StringRef getDescription() const { return Description; }
   StringRef getCategory() const { return Category; }
   StringRef getCheckerName() const {
-// FIXME: This is a workaround to ensure that the correct checker name is
-// used. The checker names are set after the constructors are run.
-// In

[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits


@@ -116,6 +116,19 @@ class CheckerNameRef {
   operator StringRef() const { return Name; }
 };
 
+/// A single checker class (and its singleton instance) can act as the
+/// implementation of several (user-facing or modeling) checker parts that
+/// have shared state and logic, but have their own names and can be enabled or
+/// disabled separately.
+/// Each checker class that implement multiple parts introduces its own enum
+/// type to assign small numerical indices (0, 1, 2 ...) to their parts. The
+/// type alias 'CheckerPartIdx' is conceptually the union of these enum types.
+using CheckerPartIdx = unsigned;
+
+/// If a checker doesn't have multiple parts, then its single part is
+/// represented by this index.
+constexpr CheckerPartIdx DefaultPart = 0;

NagyDonat wrote:

Yes, I wasn't familiar enough with the rules for "`constexpr`" (now I looked it 
up: it does imply `inline` for functions and `static` data members, but doesn't 
imply it for other variables).

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat updated 
https://github.com/llvm/llvm-project/pull/130985

From 895b6947690ec51d8e8bccfa09420afae4449343 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 3 Mar 2025 15:33:44 +0100
Subject: [PATCH 1/7] [NFC][analyzer] Framework for multipart checkers

In the static analyzer codebase we have a traditional pattern where a
single checker class (and its singleton instance) acts as the
implementation of several (user-facing or modeling) checkers that have
shared state and logic, but have their own names and can be enabled or
disabled separately.

Currently these multipart checker classes all reimplement the same
boilerplate logic to store the enabled/disabled state, the name and the
bug types associated with the checker parts. This commit extends
`CheckerBase`, `BugType` and the checker registration process to offer
an easy-to-use alternative to that boilerplate (which includes the ugly
lazy initialization of `std::unique_ptr`s).

In this new framework the single-part checkers are internally
represented as "multipart checkers with just one part" (because this way
I don't need to reimplement the same logic twice) but this does not
require any changes in the code of simple single-part checkers.

I do not claim that these multi-part checkers are perfect from an
architectural point of view; but they won't suddenly disappear after
many years of existence, so we might as well introduce a clear framework
for them. (Switching to e.g. 1:1 correspondence between checker classes
and checker names would be a prohibitively complex change.)
---
 .../StaticAnalyzer/Core/BugReporter/BugType.h | 44 ++-
 .../clang/StaticAnalyzer/Core/Checker.h   | 39 --
 .../StaticAnalyzer/Core/CheckerManager.h  | 54 ++-
 clang/lib/StaticAnalyzer/Core/Checker.cpp |  5 +-
 4 files changed, 108 insertions(+), 34 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h 
b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
index 97cf7eda0ae2f..3a635e0d0125a 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
@@ -17,6 +17,7 @@
 #include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include 
+#include 
 
 namespace clang {
 
@@ -26,36 +27,53 @@ class BugReporter;
 
 class BugType {
 private:
-  const CheckerNameRef CheckerName;
+  struct CheckerPartRef {
+const CheckerBase *Checker;
+CheckerPartIdx Idx;
+  };
+  using CheckerNameInfo = std::variant;
+
+  const CheckerNameInfo CheckerName;
   const std::string Description;
   const std::string Category;
-  const CheckerBase *Checker;
   bool SuppressOnSink;
 
   virtual void anchor();
 
 public:
+  // Straightforward constructor where the checker name is specified directly.
+  // TODO: As far as I know all applications of this constructor involve ugly
+  // hacks that could be avoided by switching to a different constructor.
+  // When those are all eliminated, this constructor should be removed to
+  // eliminate the `variant` and simplify this class.
   BugType(CheckerNameRef CheckerName, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
   : CheckerName(CheckerName), Description(Desc), Category(Cat),
-Checker(nullptr), SuppressOnSink(SuppressOnSink) {}
+SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object.
+  // At that point the checker name is not yet available, but we can save a
+  // pointer to the checker and later use that to query the name.
   BugType(const CheckerBase *Checker, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
-  : CheckerName(), Description(Desc), Category(Cat), Checker(Checker),
-SuppressOnSink(SuppressOnSink) {}
+  : CheckerName(CheckerPartRef{Checker, DefaultPart}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object
+  // when it has multiple parts with separate names. We save the name and the
+  // part index to be able to query the name of that part later.
+  BugType(const CheckerBase *Checker, CheckerPartIdx Idx, StringRef Desc,
+  StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
+  : CheckerName(CheckerPartRef{Checker, Idx}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
   virtual ~BugType() = default;
 
   StringRef getDescription() const { return Description; }
   StringRef getCategory() const { return Category; }
   StringRef getCheckerName() const {
-// FIXME: This is a workaround to ensure that the correct checker name is
-// used. The checker names are set after the constructors are run.
-// In

[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits


@@ -190,23 +203,38 @@ class CheckerManager {
   // Checker registration.
   
//======//
 
-  /// Used to register checkers.
-  /// All arguments are automatically passed through to the checker
-  /// constructor.
+  /// Construct the singleton instance of a checker, register it for the
+  /// supported callbacks and record its name with `registerCheckerPart()`.
+  /// Arguments passed to this function are forwarded to the constructor of the
+  /// checker.
+  ///
+  /// If `CHECKER` has multiple parts, then the constructor call and the
+  /// callback registration only happen within the first `registerChecker()`
+  /// call; while the subsequent calls only enable additional parts of the
+  /// existing checker object (while registering their names).
   ///
   /// \returns a pointer to the checker object.
-  template 
-  CHECKER *registerChecker(AT &&... Args) {
+  template 
+  CHECKER *registerChecker(AT &&...Args) {
+// This assert could be removed but then need to make sure that calls that
+// register different parts of the same checker pass the same arguments.
+static_assert(
+Idx == DefaultPart || !sizeof...(AT),
+"Argument forwarding isn't supported with multi-part checkers!");
+
 CheckerTag Tag = getTag();
-std::unique_ptr &Ref = CheckerTags[Tag];
-assert(!Ref && "Checker already registered, use getChecker!");
 
-std::unique_ptr Checker =
-std::make_unique(std::forward(Args)...);
-Checker->Name = CurrentCheckerName;
-CHECKER::_register(Checker.get(), *this);
-Ref = std::move(Checker);
-return static_cast(Ref.get());
+std::unique_ptr &Ref = CheckerTags[Tag];
+if (!Ref) {
+  std::unique_ptr Checker =
+  std::make_unique(std::forward(Args)...);
+  CHECKER::_register(Checker.get(), *this);
+  Ref = std::move(Checker);

NagyDonat wrote:

Note that `CHECKER::template _register(Ref.get(), *this);` does not 
work either, but I don't know why.

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -35,9 +35,10 @@ class DivZeroChecker : public 
Checker> {
 public:
   /// This checker class implements several user facing checkers
   enum CheckKind { CK_DivideZero, CK_TaintedDivChecker, CK_NumCheckKinds };
-  bool ChecksEnabled[CK_NumCheckKinds] = {false};
-  CheckerNameRef CheckNames[CK_NumCheckKinds];
-  mutable std::unique_ptr BugTypes[CK_NumCheckKinds];
+  BugType BugTypes[CK_NumCheckKinds] = {
+  {this, CK_DivideZero, "Division by zero"},
+  {this, CK_TaintedDivChecker, "Division by zero",
+   categories::TaintedData}};

steakhal wrote:

If we are already here, can we drop the `CK_` prefixes? The llvm policy about 
these prefixes apply only if the names would be exposed, aka. in a header in a 
namespace.

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -190,23 +203,38 @@ class CheckerManager {
   // Checker registration.
   
//======//
 
-  /// Used to register checkers.
-  /// All arguments are automatically passed through to the checker
-  /// constructor.
+  /// Construct the singleton instance of a checker, register it for the
+  /// supported callbacks and record its name with `registerCheckerPart()`.
+  /// Arguments passed to this function are forwarded to the constructor of the
+  /// checker.
+  ///
+  /// If `CHECKER` has multiple parts, then the constructor call and the
+  /// callback registration only happen within the first `registerChecker()`
+  /// call; while the subsequent calls only enable additional parts of the
+  /// existing checker object (while registering their names).
   ///
   /// \returns a pointer to the checker object.
-  template 
-  CHECKER *registerChecker(AT &&... Args) {
+  template 
+  CHECKER *registerChecker(AT &&...Args) {
+// This assert could be removed but then need to make sure that calls that
+// register different parts of the same checker pass the same arguments.
+static_assert(
+Idx == DefaultPart || !sizeof...(AT),
+"Argument forwarding isn't supported with multi-part checkers!");
+
 CheckerTag Tag = getTag();
-std::unique_ptr &Ref = CheckerTags[Tag];
-assert(!Ref && "Checker already registered, use getChecker!");
 
-std::unique_ptr Checker =
-std::make_unique(std::forward(Args)...);
-Checker->Name = CurrentCheckerName;
-CHECKER::_register(Checker.get(), *this);
-Ref = std::move(Checker);
-return static_cast(Ref.get());
+std::unique_ptr &Ref = CheckerTags[Tag];
+if (!Ref) {
+  std::unique_ptr Checker =
+  std::make_unique(std::forward(Args)...);
+  CHECKER::_register(Checker.get(), *this);
+  Ref = std::move(Checker);

steakhal wrote:

Can we simplify this into:
```suggestion
  Ref = std::make_unique(std::forward(Args)...);
  CHECKER::_register(Ref.get(), *this);
```

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -190,23 +203,38 @@ class CheckerManager {
   // Checker registration.
   
//======//
 
-  /// Used to register checkers.
-  /// All arguments are automatically passed through to the checker
-  /// constructor.
+  /// Construct the singleton instance of a checker, register it for the
+  /// supported callbacks and record its name with `registerCheckerPart()`.
+  /// Arguments passed to this function are forwarded to the constructor of the
+  /// checker.
+  ///
+  /// If `CHECKER` has multiple parts, then the constructor call and the
+  /// callback registration only happen within the first `registerChecker()`
+  /// call; while the subsequent calls only enable additional parts of the
+  /// existing checker object (while registering their names).
   ///
   /// \returns a pointer to the checker object.
-  template 
-  CHECKER *registerChecker(AT &&... Args) {
+  template 
+  CHECKER *registerChecker(AT &&...Args) {
+// This assert could be removed but then need to make sure that calls that
+// register different parts of the same checker pass the same arguments.
+static_assert(
+Idx == DefaultPart || !sizeof...(AT),
+"Argument forwarding isn't supported with multi-part checkers!");
+
 CheckerTag Tag = getTag();
-std::unique_ptr &Ref = CheckerTags[Tag];
-assert(!Ref && "Checker already registered, use getChecker!");
 
-std::unique_ptr Checker =
-std::make_unique(std::forward(Args)...);
-Checker->Name = CurrentCheckerName;
-CHECKER::_register(Checker.get(), *this);
-Ref = std::move(Checker);
-return static_cast(Ref.get());
+std::unique_ptr &Ref = CheckerTags[Tag];
+if (!Ref) {
+  std::unique_ptr Checker =
+  std::make_unique(std::forward(Args)...);
+  CHECKER::_register(Checker.get(), *this);
+  Ref = std::move(Checker);
+}
+
+CHECKER *Result = static_cast(Ref.get());
+Result->registerCheckerPart(Idx, CurrentCheckerName);
+return Result;

steakhal wrote:

I couldn't see why we would need the extra variable here too. How about this?
```suggestion
Ref->registerCheckerPart(Idx, CurrentCheckerName);
return static_cast(Ref.get());
```

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -116,6 +116,19 @@ class CheckerNameRef {
   operator StringRef() const { return Name; }
 };
 
+/// A single checker class (and its singleton instance) can act as the
+/// implementation of several (user-facing or modeling) checker parts that
+/// have shared state and logic, but have their own names and can be enabled or
+/// disabled separately.
+/// Each checker class that implement multiple parts introduces its own enum
+/// type to assign small numerical indices (0, 1, 2 ...) to their parts. The
+/// type alias 'CheckerPartIdx' is conceptually the union of these enum types.
+using CheckerPartIdx = unsigned;
+
+/// If a checker doesn't have multiple parts, then its single part is
+/// represented by this index.
+constexpr CheckerPartIdx DefaultPart = 0;

steakhal wrote:

```suggestion
/// If a checker doesn't have multiple parts, then its single part is
/// represented by this index.
constexpr inline CheckerPartIdx DefaultPart = 0;
```

This is in a header, so shouldn't this have "inline" as well?

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-14 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -485,16 +485,60 @@ class Call {
 } // end eval namespace
 
 class CheckerBase : public ProgramPointTag {
-  CheckerNameRef Name;
+  /// A single checker class (i.e. a subclass of `CheckerBase`) can implement
+  /// multiple user-facing checkers that have separate names and can be enabled
+  /// separately, but are backed by the same singleton checker object.
+  SmallVector, 1> RegisteredNames;
+
   friend class ::clang::ento::CheckerManager;
 
 public:
-  StringRef getTagDescription() const override;
-  CheckerNameRef getName() const;
+  CheckerNameRef getName(CheckerPartIdx Idx = DefaultPart) const {
+assert(Idx < RegisteredNames.size() && "Checker part index is too large!");
+std::optional Name = RegisteredNames[Idx];
+assert(Name && "Requested checker part is not registered!");
+return *Name;
+  }
+
+  bool isPartEnabled(CheckerPartIdx Idx) const {
+return Idx < RegisteredNames.size() && RegisteredNames[Idx].has_value();
+  }
+
+  void registerCheckerPart(CheckerPartIdx Idx, CheckerNameRef Name) {
+// Paranoia: notice if e.g. UINT_MAX is passed as a checker part index.
+assert(Idx < 256 && "Checker part identifiers should be small integers.");
+
+if (Idx >= RegisteredNames.size())
+  RegisteredNames.resize(Idx + 1, std::nullopt);
+
+assert(!RegisteredNames[Idx] && "Repeated registration of checker a 
part!");
+RegisteredNames[Idx] = Name;
+  }
+
+  StringRef getTagDescription() const override {
+// This method inherited from `ProgramPointTag` has two unrelated roles:
+// (1) The analyzer option handling logic uses this method to query the
+// name of a checker.
+// (2) When the `ExplodedGraph` is dumped in DOT format for debugging,
+// this is called to attach a description to the nodes. (This happens
+// for all subclasses of `ProgramPointTag`, not just checkers.)
+// FIXME: Application (1) should be aware of multiple parts within the same
+// checker class instance, so it should directly use `getName` instead of
+// this inherited interface which cannot support a `CheckerPartIdx`.
+// FIXME: Ideally application (2) should return a string that describes the
+// whole checker class, not just one of it parts. However, this is only for
+// debugging, so returning the name of one part is probably good enough.
+for (const auto &OptName : RegisteredNames)
+  if (OptName)
+return *OptName;
+
+return "Unregistered checker";
+  }
 
-  /// See CheckerManager::runCheckersForPrintState.
+  /// Debug state dump callback, see CheckerManager::runCheckersForPrintState.
+  /// Default implementation does nothing.
   virtual void printState(raw_ostream &Out, ProgramStateRef State,
-  const char *NL, const char *Sep) const { }
+  const char *NL, const char *Sep) const;
 };

steakhal wrote:

Why did you change this hunk?

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-13 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-12 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Donát Nagy (NagyDonat)


Changes

In the static analyzer codebase we have a traditional pattern where a single 
checker class (and its singleton instance) acts as the implementation of 
several (user-facing or modeling) checkers that have shared state and logic, 
but have their own names and can be enabled or disabled separately. 
 
Currently these multipart checker classes all reimplement the same boilerplate 
logic to store the enabled/disabled state, the name and the bug types 
associated with the checker parts. This commit extends `CheckerBase`, `BugType` 
and the checker registration process to offer an easy-to-use alternative to 
that boilerplate (which includes the ugly lazy initialization of 
`std::unique_ptr`s).
 
In this new framework the single-part checkers are internally represented as 
"multipart checkers with just one part" (because this way I don't need to 
reimplement the same logic twice) but this does not require any changes in the 
code of simple single-part checkers.
 
I do not claim that these multi-part checkers are perfect from an architectural 
point of view; but they won't suddenly disappear after many years of existence, 
so we might as well introduce a clear framework for them. (Switching to e.g. 
1:1 correspondence between checker classes and checker names would be a 
prohibitively complex change.)

This PR ports `DivZeroChecker` to the new framework as a proof of concept. I'm 
planning to do a series of follow-up commits to port the rest of the multi-part 
checker.

---
Full diff: https://github.com/llvm/llvm-project/pull/130985.diff


5 Files Affected:

- (modified) clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h 
(+31-13) 
- (modified) clang/include/clang/StaticAnalyzer/Core/Checker.h (+34-5) 
- (modified) clang/include/clang/StaticAnalyzer/Core/CheckerManager.h (+41-13) 
- (modified) clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp (+14-30) 
- (modified) clang/lib/StaticAnalyzer/Core/Checker.cpp (+2-3) 


``diff
diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h 
b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
index 97cf7eda0ae2f..3a635e0d0125a 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
@@ -17,6 +17,7 @@
 #include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include 
+#include 
 
 namespace clang {
 
@@ -26,36 +27,53 @@ class BugReporter;
 
 class BugType {
 private:
-  const CheckerNameRef CheckerName;
+  struct CheckerPartRef {
+const CheckerBase *Checker;
+CheckerPartIdx Idx;
+  };
+  using CheckerNameInfo = std::variant;
+
+  const CheckerNameInfo CheckerName;
   const std::string Description;
   const std::string Category;
-  const CheckerBase *Checker;
   bool SuppressOnSink;
 
   virtual void anchor();
 
 public:
+  // Straightforward constructor where the checker name is specified directly.
+  // TODO: As far as I know all applications of this constructor involve ugly
+  // hacks that could be avoided by switching to a different constructor.
+  // When those are all eliminated, this constructor should be removed to
+  // eliminate the `variant` and simplify this class.
   BugType(CheckerNameRef CheckerName, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
   : CheckerName(CheckerName), Description(Desc), Category(Cat),
-Checker(nullptr), SuppressOnSink(SuppressOnSink) {}
+SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object.
+  // At that point the checker name is not yet available, but we can save a
+  // pointer to the checker and later use that to query the name.
   BugType(const CheckerBase *Checker, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
-  : CheckerName(), Description(Desc), Category(Cat), Checker(Checker),
-SuppressOnSink(SuppressOnSink) {}
+  : CheckerName(CheckerPartRef{Checker, DefaultPart}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object
+  // when it has multiple parts with separate names. We save the name and the
+  // part index to be able to query the name of that part later.
+  BugType(const CheckerBase *Checker, CheckerPartIdx Idx, StringRef Desc,
+  StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
+  : CheckerName(CheckerPartRef{Checker, Idx}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
   virtual ~BugType() = default;
 
   StringRef getDescription() const { return Description; }
   StringRef getCategory() const { return Category; }

[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-12 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat edited 
https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-12 Thread Donát Nagy via cfe-commits


@@ -485,16 +485,60 @@ class Call {
 } // end eval namespace
 
 class CheckerBase : public ProgramPointTag {
-  CheckerNameRef Name;
+  /// A single checker class (i.e. a subclass of `CheckerBase`) can implement
+  /// multiple user-facing checkers that have separate names and can be enabled
+  /// separately, but are backed by the same singleton checker object.
+  SmallVector, 1> RegisteredNames;
+
   friend class ::clang::ento::CheckerManager;
 
 public:
-  StringRef getTagDescription() const override;
-  CheckerNameRef getName() const;
+  CheckerNameRef getName(CheckerPartIdx Idx = DefaultPart) const {
+assert(Idx < RegisteredNames.size() && "Checker part index is too large!");
+std::optional Name = RegisteredNames[Idx];
+assert(Name && "Requested checker part is not registered!");
+return *Name;
+  }
+
+  bool isPartEnabled(CheckerPartIdx Idx) const {
+return Idx < RegisteredNames.size() && RegisteredNames[Idx].has_value();
+  }
+
+  void registerCheckerPart(CheckerPartIdx Idx, CheckerNameRef Name) {
+// Paranoia: notice if e.g. UINT_MAX is passed as a checker part index.
+assert(Idx < 256 && "Checker part identifiers should be small integers.");
+
+if (Idx >= RegisteredNames.size())
+  RegisteredNames.resize(Idx + 1, std::nullopt);
+
+assert(!RegisteredNames[Idx] && "Repeated registration of checker a 
part!");
+RegisteredNames[Idx] = Name;
+  }
+
+  StringRef getTagDescription() const override {
+// This method inherited from `ProgramPointTag` has two unrelated roles:
+// (1) The analyzer option handling logic uses this method to query the
+// name of a checker.
+// (2) When the `ExplodedGraph` is dumped in DOT format for debugging,
+// this is called to attach a description to the nodes. (This happens
+// for all subclasses of `ProgramPointTag`, not just checkers.)
+// FIXME: Application (1) should be aware of multiple parts within the same
+// checker class instance, so it should directly use `getName` instead of
+// this inherited interface which cannot support a `CheckerPartIdx`.
+// FIXME: Ideally application (2) should return a string that describes the
+// whole checker class, not just one of it parts. However, this is only for
+// debugging, so returning the name of one part is probably good enough.

NagyDonat wrote:

I will clean up this situation soon -- perhaps within this PR, perhaps in a 
follow-up commit that can be merged shortly after this one.

I tried to refactor the checker option handling code to use `getName` instead 
of this method, but I ran into awkward issues where a unittest introduces 
mocked checker classes and relies on the fact that this method is `virtual` 
(while `getName` is not virtual and has no valid reason to be virtual).

https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-12 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat updated 
https://github.com/llvm/llvm-project/pull/130985

From 895b6947690ec51d8e8bccfa09420afae4449343 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 3 Mar 2025 15:33:44 +0100
Subject: [PATCH 1/4] [NFC][analyzer] Framework for multipart checkers

In the static analyzer codebase we have a traditional pattern where a
single checker class (and its singleton instance) acts as the
implementation of several (user-facing or modeling) checkers that have
shared state and logic, but have their own names and can be enabled or
disabled separately.

Currently these multipart checker classes all reimplement the same
boilerplate logic to store the enabled/disabled state, the name and the
bug types associated with the checker parts. This commit extends
`CheckerBase`, `BugType` and the checker registration process to offer
an easy-to-use alternative to that boilerplate (which includes the ugly
lazy initialization of `std::unique_ptr`s).

In this new framework the single-part checkers are internally
represented as "multipart checkers with just one part" (because this way
I don't need to reimplement the same logic twice) but this does not
require any changes in the code of simple single-part checkers.

I do not claim that these multi-part checkers are perfect from an
architectural point of view; but they won't suddenly disappear after
many years of existence, so we might as well introduce a clear framework
for them. (Switching to e.g. 1:1 correspondence between checker classes
and checker names would be a prohibitively complex change.)
---
 .../StaticAnalyzer/Core/BugReporter/BugType.h | 44 ++-
 .../clang/StaticAnalyzer/Core/Checker.h   | 39 --
 .../StaticAnalyzer/Core/CheckerManager.h  | 54 ++-
 clang/lib/StaticAnalyzer/Core/Checker.cpp |  5 +-
 4 files changed, 108 insertions(+), 34 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h 
b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
index 97cf7eda0ae2f..3a635e0d0125a 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
@@ -17,6 +17,7 @@
 #include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include 
+#include 
 
 namespace clang {
 
@@ -26,36 +27,53 @@ class BugReporter;
 
 class BugType {
 private:
-  const CheckerNameRef CheckerName;
+  struct CheckerPartRef {
+const CheckerBase *Checker;
+CheckerPartIdx Idx;
+  };
+  using CheckerNameInfo = std::variant;
+
+  const CheckerNameInfo CheckerName;
   const std::string Description;
   const std::string Category;
-  const CheckerBase *Checker;
   bool SuppressOnSink;
 
   virtual void anchor();
 
 public:
+  // Straightforward constructor where the checker name is specified directly.
+  // TODO: As far as I know all applications of this constructor involve ugly
+  // hacks that could be avoided by switching to a different constructor.
+  // When those are all eliminated, this constructor should be removed to
+  // eliminate the `variant` and simplify this class.
   BugType(CheckerNameRef CheckerName, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
   : CheckerName(CheckerName), Description(Desc), Category(Cat),
-Checker(nullptr), SuppressOnSink(SuppressOnSink) {}
+SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object.
+  // At that point the checker name is not yet available, but we can save a
+  // pointer to the checker and later use that to query the name.
   BugType(const CheckerBase *Checker, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
-  : CheckerName(), Description(Desc), Category(Cat), Checker(Checker),
-SuppressOnSink(SuppressOnSink) {}
+  : CheckerName(CheckerPartRef{Checker, DefaultPart}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object
+  // when it has multiple parts with separate names. We save the name and the
+  // part index to be able to query the name of that part later.
+  BugType(const CheckerBase *Checker, CheckerPartIdx Idx, StringRef Desc,
+  StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
+  : CheckerName(CheckerPartRef{Checker, Idx}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
   virtual ~BugType() = default;
 
   StringRef getDescription() const { return Description; }
   StringRef getCategory() const { return Category; }
   StringRef getCheckerName() const {
-// FIXME: This is a workaround to ensure that the correct checker name is
-// used. The checker names are set after the constructors are run.
-// In

[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-12 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat updated 
https://github.com/llvm/llvm-project/pull/130985

From 895b6947690ec51d8e8bccfa09420afae4449343 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 3 Mar 2025 15:33:44 +0100
Subject: [PATCH 1/3] [NFC][analyzer] Framework for multipart checkers

In the static analyzer codebase we have a traditional pattern where a
single checker class (and its singleton instance) acts as the
implementation of several (user-facing or modeling) checkers that have
shared state and logic, but have their own names and can be enabled or
disabled separately.

Currently these multipart checker classes all reimplement the same
boilerplate logic to store the enabled/disabled state, the name and the
bug types associated with the checker parts. This commit extends
`CheckerBase`, `BugType` and the checker registration process to offer
an easy-to-use alternative to that boilerplate (which includes the ugly
lazy initialization of `std::unique_ptr`s).

In this new framework the single-part checkers are internally
represented as "multipart checkers with just one part" (because this way
I don't need to reimplement the same logic twice) but this does not
require any changes in the code of simple single-part checkers.

I do not claim that these multi-part checkers are perfect from an
architectural point of view; but they won't suddenly disappear after
many years of existence, so we might as well introduce a clear framework
for them. (Switching to e.g. 1:1 correspondence between checker classes
and checker names would be a prohibitively complex change.)
---
 .../StaticAnalyzer/Core/BugReporter/BugType.h | 44 ++-
 .../clang/StaticAnalyzer/Core/Checker.h   | 39 --
 .../StaticAnalyzer/Core/CheckerManager.h  | 54 ++-
 clang/lib/StaticAnalyzer/Core/Checker.cpp |  5 +-
 4 files changed, 108 insertions(+), 34 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h 
b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
index 97cf7eda0ae2f..3a635e0d0125a 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
@@ -17,6 +17,7 @@
 #include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include 
+#include 
 
 namespace clang {
 
@@ -26,36 +27,53 @@ class BugReporter;
 
 class BugType {
 private:
-  const CheckerNameRef CheckerName;
+  struct CheckerPartRef {
+const CheckerBase *Checker;
+CheckerPartIdx Idx;
+  };
+  using CheckerNameInfo = std::variant;
+
+  const CheckerNameInfo CheckerName;
   const std::string Description;
   const std::string Category;
-  const CheckerBase *Checker;
   bool SuppressOnSink;
 
   virtual void anchor();
 
 public:
+  // Straightforward constructor where the checker name is specified directly.
+  // TODO: As far as I know all applications of this constructor involve ugly
+  // hacks that could be avoided by switching to a different constructor.
+  // When those are all eliminated, this constructor should be removed to
+  // eliminate the `variant` and simplify this class.
   BugType(CheckerNameRef CheckerName, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
   : CheckerName(CheckerName), Description(Desc), Category(Cat),
-Checker(nullptr), SuppressOnSink(SuppressOnSink) {}
+SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object.
+  // At that point the checker name is not yet available, but we can save a
+  // pointer to the checker and later use that to query the name.
   BugType(const CheckerBase *Checker, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
-  : CheckerName(), Description(Desc), Category(Cat), Checker(Checker),
-SuppressOnSink(SuppressOnSink) {}
+  : CheckerName(CheckerPartRef{Checker, DefaultPart}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
+  // Constructor that can be called from the constructor of a checker object
+  // when it has multiple parts with separate names. We save the name and the
+  // part index to be able to query the name of that part later.
+  BugType(const CheckerBase *Checker, CheckerPartIdx Idx, StringRef Desc,
+  StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
+  : CheckerName(CheckerPartRef{Checker, Idx}), Description(Desc),
+Category(Cat), SuppressOnSink(SuppressOnSink) {}
   virtual ~BugType() = default;
 
   StringRef getDescription() const { return Description; }
   StringRef getCategory() const { return Category; }
   StringRef getCheckerName() const {
-// FIXME: This is a workaround to ensure that the correct checker name is
-// used. The checker names are set after the constructors are run.
-// In

[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-12 Thread Donát Nagy via cfe-commits

https://github.com/NagyDonat created 
https://github.com/llvm/llvm-project/pull/130985

In the static analyzer codebase we have a traditional pattern where a single 
checker class (and its singleton instance) acts as the implementation of 
several (user-facing or modeling) checkers that have shared state and logic, 
but have their own names and can be enabled or disabled separately. 
 
Currently these multipart checker classes all reimplement the same boilerplate 
logic to store the enabled/disabled state, the name and the bug types 
associated with the checker parts. This commit extends `CheckerBase`, `BugType` 
and the checker registration process to offer an easy-to-use alternative to 
that boilerplate (which includes the ugly lazy initialization of 
`std::unique_ptr`s).
 
In this new framework the single-part checkers are internally represented as 
"multipart checkers with just one part" (because this way I don't need to 
reimplement the same logic twice) but this does not require any changes in the 
code of simple single-part checkers.
 
I do not claim that these multi-part checkers are perfect from an architectural 
point of view; but they won't suddenly disappear after many years of existence, 
so we might as well introduce a clear framework for them. (Switching to e.g. 
1:1 correspondence between checker classes and checker names would be a 
prohibitively complex change.)

This PR ports `DivZeroChecker` to the new framework as a proof of concept. I'm 
planning to do a series of follow-up commits to port the rest of the multi-part 
checker.

From 895b6947690ec51d8e8bccfa09420afae4449343 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= 
Date: Mon, 3 Mar 2025 15:33:44 +0100
Subject: [PATCH 1/2] [NFC][analyzer] Framework for multipart checkers

In the static analyzer codebase we have a traditional pattern where a
single checker class (and its singleton instance) acts as the
implementation of several (user-facing or modeling) checkers that have
shared state and logic, but have their own names and can be enabled or
disabled separately.

Currently these multipart checker classes all reimplement the same
boilerplate logic to store the enabled/disabled state, the name and the
bug types associated with the checker parts. This commit extends
`CheckerBase`, `BugType` and the checker registration process to offer
an easy-to-use alternative to that boilerplate (which includes the ugly
lazy initialization of `std::unique_ptr`s).

In this new framework the single-part checkers are internally
represented as "multipart checkers with just one part" (because this way
I don't need to reimplement the same logic twice) but this does not
require any changes in the code of simple single-part checkers.

I do not claim that these multi-part checkers are perfect from an
architectural point of view; but they won't suddenly disappear after
many years of existence, so we might as well introduce a clear framework
for them. (Switching to e.g. 1:1 correspondence between checker classes
and checker names would be a prohibitively complex change.)
---
 .../StaticAnalyzer/Core/BugReporter/BugType.h | 44 ++-
 .../clang/StaticAnalyzer/Core/Checker.h   | 39 --
 .../StaticAnalyzer/Core/CheckerManager.h  | 54 ++-
 clang/lib/StaticAnalyzer/Core/Checker.cpp |  5 +-
 4 files changed, 108 insertions(+), 34 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h 
b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
index 97cf7eda0ae2f..3a635e0d0125a 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h
@@ -17,6 +17,7 @@
 #include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include 
+#include 
 
 namespace clang {
 
@@ -26,36 +27,53 @@ class BugReporter;
 
 class BugType {
 private:
-  const CheckerNameRef CheckerName;
+  struct CheckerPartRef {
+const CheckerBase *Checker;
+CheckerPartIdx Idx;
+  };
+  using CheckerNameInfo = std::variant;
+
+  const CheckerNameInfo CheckerName;
   const std::string Description;
   const std::string Category;
-  const CheckerBase *Checker;
   bool SuppressOnSink;
 
   virtual void anchor();
 
 public:
+  // Straightforward constructor where the checker name is specified directly.
+  // TODO: As far as I know all applications of this constructor involve ugly
+  // hacks that could be avoided by switching to a different constructor.
+  // When those are all eliminated, this constructor should be removed to
+  // eliminate the `variant` and simplify this class.
   BugType(CheckerNameRef CheckerName, StringRef Desc,
   StringRef Cat = categories::LogicError, bool SuppressOnSink = false)
   : CheckerName(CheckerName), Description(Desc), Category(Cat),
-Checker(nullptr), SuppressOnSink(SuppressOnSink) {}
+SuppressOnSink(S

[clang] [NFC][analyzer] Framework for multipart checkers (PR #130985)

2025-03-12 Thread via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


``bash
git-clang-format --diff 72ee9b82151f17d6bd37abc501124d46c1cb63c3 
26ecff25d303f5581dc963e53a387274746a0a09 --extensions cpp,h -- 
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugType.h 
clang/include/clang/StaticAnalyzer/Core/Checker.h 
clang/include/clang/StaticAnalyzer/Core/CheckerManager.h 
clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp 
clang/lib/StaticAnalyzer/Core/Checker.cpp
``





View the diff from clang-format here.


``diff
diff --git a/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp 
b/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
index 4378d3b5aa..19de36800a 100644
--- a/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/DivZeroChecker.cpp
@@ -36,9 +36,9 @@ public:
   /// This checker class implements several user facing checkers
   enum CheckKind { CK_DivideZero, CK_TaintedDivChecker, CK_NumCheckKinds };
   BugType BugTypes[CK_NumCheckKinds] = {
-{this, CK_DivideZero, "Division by zero"},
-{this, CK_TaintedDivChecker, "Division by zero", categories::TaintedData}
-  };
+  {this, CK_DivideZero, "Division by zero"},
+  {this, CK_TaintedDivChecker, "Division by zero",
+   categories::TaintedData}};
 
   void checkPreStmt(const BinaryOperator *B, CheckerContext &C) const;
 };
@@ -127,9 +127,7 @@ void ento::registerDivZeroChecker(CheckerManager &Mgr) {
   Mgr.registerChecker();
 }
 
-bool ento::shouldRegisterDivZeroChecker(const CheckerManager &) {
-  return true;
-}
+bool ento::shouldRegisterDivZeroChecker(const CheckerManager &) { return true; 
}
 
 void ento::registerTaintedDivChecker(CheckerManager &Mgr) {
   Mgr.registerChecker();

``




https://github.com/llvm/llvm-project/pull/130985
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits