[clang] [analyzer] Never create Regions wrapping reference TypedValueRegions (NFCI) (PR #118096)

2024-11-29 Thread Balazs Benics via cfe-commits

https://github.com/steakhal closed 
https://github.com/llvm/llvm-project/pull/118096
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Never create Regions wrapping reference TypedValueRegions (NFCI) (PR #118096)

2024-11-29 Thread Donát Nagy via cfe-commits

NagyDonat wrote:

> The second is, how can we end up with such subregions. I think the only way 
> is by field regions. If we had something that I missed, the assert would 
> catch it.

Quick guesses: can we form a base or derived object subregion from them? Is 
there some hacky pathway where we create an element region to represent 
something that's not a real element access but something similar? 
(`ElementRegion` may mean 53 different things, I wouldn't be surprised if it 
can appear somehow...)

https://github.com/llvm/llvm-project/pull/118096
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Never create Regions wrapping reference TypedValueRegions (NFCI) (PR #118096)

2024-11-29 Thread Gábor Horváth via cfe-commits

Xazax-hun wrote:

Sorry, I might have been a bit unclear.

> We should never have reference typed value regions wrapped by subregions.

I think my question is what makes the "wrapped by subregions" part special? Why 
not just "never have reference typed value regions"?

https://github.com/llvm/llvm-project/pull/118096
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Never create Regions wrapping reference TypedValueRegions (NFCI) (PR #118096)

2024-11-29 Thread Balazs Benics via cfe-commits

steakhal wrote:

> The change itself looks good to me, but I wonder what is the reason that we 
> only want to do this canonicalization when we access fields?

It took me a while to unwrap your question. Actually, think of it as an 
invariant. We should never have reference typed value regions wrapped by 
subregions. This is checked/enforced by how we create subregions (see the 
asserts).

The second is, how can we end up with such subregions. I think the only way is 
by field regions. If we had something that I missed, the assert would catch it.

Do you have something in mind?

https://github.com/llvm/llvm-project/pull/118096
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Never create Regions wrapping reference TypedValueRegions (NFCI) (PR #118096)

2024-11-29 Thread Gábor Horváth via cfe-commits

https://github.com/Xazax-hun approved this pull request.

The change itself looks good to me, but I wonder what is the reason that we 
only want to do this canonicalization when we access fields?

https://github.com/llvm/llvm-project/pull/118096
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Never create Regions wrapping reference TypedValueRegions (NFCI) (PR #118096)

2024-11-29 Thread via cfe-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)


Changes

Like in the test case:
```c++
struct String {
  String(const String &) {}
};

struct MatchComponent {
  unsigned numbers[2];
  String prerelease;
  MatchComponent(MatchComponent const &) = default;
};

MatchComponent get();
void consume(MatchComponent const &);

MatchComponent parseMatchComponent() {
  MatchComponent component = get();
  component.numbers[0] = 10;
  component.numbers[1] = 20;
  return component; // We should have no stack addr escape warning here.
}

void top() {
  consume(parseMatchComponent());
}
```

When calling `consume(parseMatchComponent())` the `parseMatchComponent()` would 
return a copy of a temporary of `component`. That copy would invoke the 
`MatchComponent::MatchComponent(const MatchComponent &)` ctor.

That ctor would have a (reference typed) ParamVarRegion, holding the location 
(lvalue) of the object we are about to copy (&component). So far so good, 
but just before evaluating the binding operation for initializing the `numbers` 
field of the temporary, we evaluate the ArrayInitLoopExpr representing the 
by-value elementwise copy of the array `component.numbers`. This is represented 
by a LazyCompoundVal, because we (usually) don't just copy large arrays and 
bind many individual direct bindings. Rather, we take a snapshot by using a LCV.

However, notice that the LCV representing this copy would look like this:
  lazyCompoundVal{ParamVarRegion{"reference param of cctor"}.numbers}

Notice that it refers to the numbers field of a reference. It would be much 
better to desugar the reference to the actual object, thus it should be: 
`lazyCompoundVal{component.numbers}`

Actually, when binding the result of the ArrayInitLoopExpr to the 
`temp_object.numbers` in the compiler-generated member initializer of the 
cctor, we should have two individual direct bindings because this is a "small 
array":
```
  binding &Element{temp_object.numbers, 0} <- 
loadFrom(&Element{component.numbers, 0})
  binding &Element{temp_object.numbers, 1} <- 
loadFrom(&Element{component.numbers, 1})
```
Where `loadFrom(...)` would be:
```
  loadFrom(&Element{component.numbers, 0}): 10 U32b
  loadFrom(&Element{component.numbers, 1}): 20 U32b
```
So the store should look like this, after PostInitializer of 
`temp_object.numbers`:
```
  temp_object at offset  0: 10 U32b
  temp_object at offset 32: 20 U32b
```

The lesson is that it's okay to have TypedValueRegions of references as long as 
we don't form subregions of those. If we ever want to refer to a subregion of a 
"reference" we actually meant to "desugar" the reference and slice a subregion 
of the pointee of the reference instead.

Once this canonicalization is in place, we can also drop the special handling 
of references in `ProcessInitializer`, because now reference TypedValueRegions 
are eagerly desugared into their referee region when forming a subregion of it.

There should be no practical differences, but there are of course bugs that 
this patch may surface.

---
Full diff: https://github.com/llvm/llvm-project/pull/118096.diff


5 Files Affected:

- (modified) 
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h (+1) 
- (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+1-9) 
- (modified) clang/lib/StaticAnalyzer/Core/MemRegion.cpp (+8) 
- (modified) clang/lib/StaticAnalyzer/Core/ProgramState.cpp (+12) 
- (modified) clang/test/Analysis/initializer.cpp (+26) 


``diff
diff --git 
a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h 
b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
index eef7a54f03bf11..29f534eba2a265 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
@@ -487,6 +487,7 @@ class ProgramState : public llvm::FoldingSetNode {
   friend void ProgramStateRetain(const ProgramState *state);
   friend void ProgramStateRelease(const ProgramState *state);
 
+  SVal desugarReference(SVal Val) const;
   SVal wrapSymbolicRegion(SVal Base) const;
 };
 
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp 
b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 22eab9f66418d4..648c7daf823ed3 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -1206,15 +1206,7 @@ void ExprEngine::ProcessInitializer(const CFGInitializer 
CFGInit,
 while ((ASE = dyn_cast(Init)))
   Init = ASE->getBase()->IgnoreImplicit();
 
-SVal LValue = State->getSVal(Init, stackFrame);
-if (!Field->getType()->isReferenceType()) {
-  if (std::optional LValueLoc = LValue.getAs()) {
-InitVal = State->getSVal(*LValueLoc);
-  } else if (auto CV = LValue.getAs()) {
-// Initializer list for an array.
-InitVal = *CV;
-  }
-}
+InitVal =

[clang] [analyzer] Never create Regions wrapping reference TypedValueRegions (NFCI) (PR #118096)

2024-11-29 Thread Balazs Benics via cfe-commits

https://github.com/steakhal created 
https://github.com/llvm/llvm-project/pull/118096

Like in the test case:
```c++
struct String {
  String(const String &) {}
};

struct MatchComponent {
  unsigned numbers[2];
  String prerelease;
  MatchComponent(MatchComponent const &) = default;
};

MatchComponent get();
void consume(MatchComponent const &);

MatchComponent parseMatchComponent() {
  MatchComponent component = get();
  component.numbers[0] = 10;
  component.numbers[1] = 20;
  return component; // We should have no stack addr escape warning here.
}

void top() {
  consume(parseMatchComponent());
}
```

When calling `consume(parseMatchComponent())` the `parseMatchComponent()` would 
return a copy of a temporary of `component`. That copy would invoke the 
`MatchComponent::MatchComponent(const MatchComponent &)` ctor.

That ctor would have a (reference typed) ParamVarRegion, holding the location 
(lvalue) of the object we are about to copy (&component). So far so good, but 
just before evaluating the binding operation for initializing the `numbers` 
field of the temporary, we evaluate the ArrayInitLoopExpr representing the 
by-value elementwise copy of the array `component.numbers`. This is represented 
by a LazyCompoundVal, because we (usually) don't just copy large arrays and 
bind many individual direct bindings. Rather, we take a snapshot by using a LCV.

However, notice that the LCV representing this copy would look like this:
  lazyCompoundVal{ParamVarRegion{"reference param of cctor"}.numbers}

Notice that it refers to the numbers field of a reference. It would be much 
better to desugar the reference to the actual object, thus it should be: 
`lazyCompoundVal{component.numbers}`

Actually, when binding the result of the ArrayInitLoopExpr to the 
`temp_object.numbers` in the compiler-generated member initializer of the 
cctor, we should have two individual direct bindings because this is a "small 
array":
```
  binding &Element{temp_object.numbers, 0} <- 
loadFrom(&Element{component.numbers, 0})
  binding &Element{temp_object.numbers, 1} <- 
loadFrom(&Element{component.numbers, 1})
```
Where `loadFrom(...)` would be:
```
  loadFrom(&Element{component.numbers, 0}): 10 U32b
  loadFrom(&Element{component.numbers, 1}): 20 U32b
```
So the store should look like this, after PostInitializer of 
`temp_object.numbers`:
```
  temp_object at offset  0: 10 U32b
  temp_object at offset 32: 20 U32b
```

The lesson is that it's okay to have TypedValueRegions of references as long as 
we don't form subregions of those. If we ever want to refer to a subregion of a 
"reference" we actually meant to "desugar" the reference and slice a subregion 
of the pointee of the reference instead.

Once this canonicalization is in place, we can also drop the special handling 
of references in `ProcessInitializer`, because now reference TypedValueRegions 
are eagerly desugared into their referee region when forming a subregion of it.

There should be no practical differences, but there are of course bugs that 
this patch may surface.

>From 079fd758f8533cb66b026ba780d06d8a34f15b72 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Fri, 29 Nov 2024 13:57:49 +0100
Subject: [PATCH] [analyzer] Never create Regions wrapping reference
 TypedValueRegions (NFCI)

Like in the test case:
```c++
struct String {
  String(const String &) {}
};

struct MatchComponent {
  unsigned numbers[2];
  String prerelease;
  MatchComponent(MatchComponent const &) = default;
};

MatchComponent get();
void consume(MatchComponent const &);

MatchComponent parseMatchComponent() {
  MatchComponent component = get();
  component.numbers[0] = 10;
  component.numbers[1] = 20;
  return component; // We should have no stack addr escape warning here.
}

void top() {
  consume(parseMatchComponent());
}
```

When calling `consume(parseMatchComponent())` the `parseMatchComponent()`
would return a copy of a temporary of `component`. That copy would
invoke the `MatchComponent::MatchComponent(const MatchComponent &)` ctor.

That ctor would have a (reference typed) ParamVarRegion, holding the
location (lvalue) of the object we are about to copy (&component).
So far so good, but just before evaluating the binding operation for
initializing the `numbers` field of the temporary, we evaluate the
ArrayInitLoopExpr representing the by-value elementwise copy of the
array `component.numbers`. This is represented by a LazyCompoundVal,
because we (usually) don't just copy large arrays and bind many
individual direct bindings. Rather, we take a snapshot by using a LCV.

However, notice that the LCV representing this copy would look like this:
  lazyCompoundVal{ParamVarRegion{"reference param of cctor"}.numbers}

Notice that it refers to the numbers field of a reference.
It would be much better to desugar the reference to the actual object,
thus it should be: `lazyCompoundVal{component.numbers}`

Actually, when binding the result of the ArrayInitLoopExpr to the
`temp_object.numbers`