[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-25 Thread Chuanqi Xu via cfe-commits


@@ -401,36 +432,85 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema 
&S, VarDecl *CoroPromise,
 return Calls;
   }
   Expr *CoroHandle = CoroHandleRes.get();
+  Calls.UseAwaitSuspendDestroy = false;
   CallExpr *AwaitSuspend = cast_or_null(
   BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   if (!AwaitSuspend)
 return Calls;
+
+  // When this `await_suspend()` overload is annotated with
+  // `[[clang::coro_await_suspend_destroy]]`, do NOT call `await_suspend()` --
+  // instead call `await_suspend_destroy(Promise&)`.  This assumes that the
+  // `await_suspend()` is just a compatibility stub consisting of:
+  // await_suspend_destroy(handle.promise());
+  // handle.destroy();
+  // Users of the attribute must follow this contract.  Then, diagnostics from
+  // both `await_suspend` and `await_suspend_destroy` will get exposed.
+  CallExpr *PlainAwaitSuspend = nullptr;
+  if (FunctionDecl *AwaitSuspendCallee = AwaitSuspend->getDirectCallee()) {
+if (AwaitSuspendCallee->hasAttr()) {
+  Calls.UseAwaitSuspendDestroy = true;
+  ExprResult PromiseRefRes =
+  buildPromiseRef(S, CoroPromise->getType(), Loc);
+  if (PromiseRefRes.isInvalid()) {
+Calls.IsInvalid = true;
+return Calls;
+  }
+  Expr *PromiseRef = PromiseRefRes.get();
+  PlainAwaitSuspend = AwaitSuspend;
+  AwaitSuspend = cast_or_null(
+  BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef));
+  if (!AwaitSuspend)
+return Calls;
+}
+  }

ChuanqiXu9 wrote:

I think you get my points. 

To make this more clear: the reason why your patch can optimize some cases is, 
your patch skips `coro.await.suspend` intrinsics if the await_suspend function 
is marked as the attribute your proposed.

We can generalize this to a part:
- A semantical part. For example, the `coro_await_suspend_destroy` attribute 
and its semantics you proposed.
- Skip emitting `coro.await.suspend`. This is why we can optimize it. 

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-24 Thread via cfe-commits


@@ -401,36 +432,85 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema 
&S, VarDecl *CoroPromise,
 return Calls;
   }
   Expr *CoroHandle = CoroHandleRes.get();
+  Calls.UseAwaitSuspendDestroy = false;
   CallExpr *AwaitSuspend = cast_or_null(
   BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   if (!AwaitSuspend)
 return Calls;
+
+  // When this `await_suspend()` overload is annotated with
+  // `[[clang::coro_await_suspend_destroy]]`, do NOT call `await_suspend()` --
+  // instead call `await_suspend_destroy(Promise&)`.  This assumes that the
+  // `await_suspend()` is just a compatibility stub consisting of:
+  // await_suspend_destroy(handle.promise());
+  // handle.destroy();
+  // Users of the attribute must follow this contract.  Then, diagnostics from
+  // both `await_suspend` and `await_suspend_destroy` will get exposed.
+  CallExpr *PlainAwaitSuspend = nullptr;
+  if (FunctionDecl *AwaitSuspendCallee = AwaitSuspend->getDirectCallee()) {
+if (AwaitSuspendCallee->hasAttr()) {
+  Calls.UseAwaitSuspendDestroy = true;
+  ExprResult PromiseRefRes =
+  buildPromiseRef(S, CoroPromise->getType(), Loc);
+  if (PromiseRefRes.isInvalid()) {
+Calls.IsInvalid = true;
+return Calls;
+  }
+  Expr *PromiseRef = PromiseRefRes.get();
+  PlainAwaitSuspend = AwaitSuspend;
+  AwaitSuspend = cast_or_null(
+  BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef));
+  if (!AwaitSuspend)
+return Calls;
+}
+  }

snarkmaster wrote:

> emit await_suspend directly, instead of via `coro.await_suspend` intrinsics

I agree it's a nice, clean idea. This is sort of what I tried in the very first 
version of this (https://github.com/llvm/llvm-project/pull/152029), which I 
quickly put into "draft" mode because I realized my implementation was wrong.

Here's my old thinking, which led to wrong / broken code:

 - `await_suspend()` must contain an `h.destroy()` call since we want 
portability to compilers without the attribute.

 - I wanted to make the `await_suspend` wrapper omit `coro.suspend` -- my goal 
here is to make it look like a non-suspending coroutine so it optimizes like a 
plain function.

 - The `h.destroy()` ultimately invokes a `coro.destroy` intrinsic. The 
implementation of that intrinsic does an indirect call through a `destroy` 
pointer that is written to the coro frame via one of the intrinsics that I 
elided. That frame setup would also write `resume` pointer which is required by 
`h.done()`.
 
 - So, now the `h.destroy()` call will fail with a null pointer dereference.

I tried a few permutations, but at the time I didn't understand the code well 
enough to try the right one, perhaps.

Are you saying that specifically **the following** version will both maintain 
the internal invariants, and optimize well?

- Keep `coro.save` and `coro.suspend`
- Replace `coro.awaits.suspend.XXX` by a direct `await_suspend`-wrapper call

I am happy to try this out, though I'll be pleasantly surprised all the 
indirection optimizes away and it emits good code.

@ChuanqiXu9, can you confirm I have the correct idea now? Are there any gotchas?

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-24 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> Hey folks, Thanks for the work.
> 
> Please note that adding a new attribute should go through an RFC. I know 
> @yuxuanchen1997 proposed something in the same design space a while ago - but 
> we never seem to have reached consensus 
> https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
> 
> Given that there have been multiple attempts at similar direction, we should 
> make sure everyone interested in this work has a chance to see it! Thanks

Just to note that I don't feel the PR and @yuxuanchen1997 's PR are in the same 
design space. I think they are for different problems.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-24 Thread Chuanqi Xu via cfe-commits


@@ -401,36 +432,85 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema 
&S, VarDecl *CoroPromise,
 return Calls;
   }
   Expr *CoroHandle = CoroHandleRes.get();
+  Calls.UseAwaitSuspendDestroy = false;
   CallExpr *AwaitSuspend = cast_or_null(
   BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   if (!AwaitSuspend)
 return Calls;
+
+  // When this `await_suspend()` overload is annotated with
+  // `[[clang::coro_await_suspend_destroy]]`, do NOT call `await_suspend()` --
+  // instead call `await_suspend_destroy(Promise&)`.  This assumes that the
+  // `await_suspend()` is just a compatibility stub consisting of:
+  // await_suspend_destroy(handle.promise());
+  // handle.destroy();
+  // Users of the attribute must follow this contract.  Then, diagnostics from
+  // both `await_suspend` and `await_suspend_destroy` will get exposed.
+  CallExpr *PlainAwaitSuspend = nullptr;
+  if (FunctionDecl *AwaitSuspendCallee = AwaitSuspend->getDirectCallee()) {
+if (AwaitSuspendCallee->hasAttr()) {
+  Calls.UseAwaitSuspendDestroy = true;
+  ExprResult PromiseRefRes =
+  buildPromiseRef(S, CoroPromise->getType(), Loc);
+  if (PromiseRefRes.isInvalid()) {
+Calls.IsInvalid = true;
+return Calls;
+  }
+  Expr *PromiseRef = PromiseRefRes.get();
+  PlainAwaitSuspend = AwaitSuspend;
+  AwaitSuspend = cast_or_null(
+  BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef));
+  if (!AwaitSuspend)
+return Calls;
+}
+  }

ChuanqiXu9 wrote:

On the one hand, I don't feel it bad. On the other hand, I don't want to touch 
overloading logics... 

I had an idea that, in CGCoroutine, when we find the await_suspend is marked 
with the attribute (maybe with another name), we can emit it directly instead 
via coro.await_suspend intrinsics. It should work. As it is just what you did 
right now. This is what we said as, attributes to not emitting 
coro.await_suspend intrinsics.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-23 Thread via cfe-commits

snarkmaster wrote:

Thanks! I'm happy to distill the conversation above into an RFC, but I first 
want to ask @ChuanqiXu9 if the `lookupReturnTypeIfAwaitSuspendDestroy()` 
implementation (and corresponding [discussion of overload 
resolution](https://github.com/llvm/llvm-project/pull/152623#discussion_r2288794474))
 is satisfactory.

The rationale for settling this is that we might as well propose the most 
implementation-friendly attribute placement.

@ChuanqiXu9, let me know your reaction to the linked comment. If you need to 
see the full PR first, LMK, and I'll spend the hour or two to clean it up.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-23 Thread Corentin Jabot via cfe-commits

cor3ntin wrote:

Hey folks,
Thanks for the work.

Please note that adding a new attribute should go through an RFC.
I know @yuxuanchen1997 proposed something in the same design space a while ago 
- but we never seem to have reached consensus 
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
 

Given that there have been multiple attempts at similar direction, we should 
make sure everyone interested in this work has a chance to see it! Thanks 


https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-23 Thread Corentin Jabot via cfe-commits

https://github.com/cor3ntin requested changes to this pull request.

Let's make sure we don't merge that accidentally 

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-20 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-20 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-20 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-20 Thread via cfe-commits


@@ -401,36 +432,85 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema 
&S, VarDecl *CoroPromise,
 return Calls;
   }
   Expr *CoroHandle = CoroHandleRes.get();
+  Calls.UseAwaitSuspendDestroy = false;
   CallExpr *AwaitSuspend = cast_or_null(
   BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   if (!AwaitSuspend)
 return Calls;
+
+  // When this `await_suspend()` overload is annotated with
+  // `[[clang::coro_await_suspend_destroy]]`, do NOT call `await_suspend()` --
+  // instead call `await_suspend_destroy(Promise&)`.  This assumes that the
+  // `await_suspend()` is just a compatibility stub consisting of:
+  // await_suspend_destroy(handle.promise());
+  // handle.destroy();
+  // Users of the attribute must follow this contract.  Then, diagnostics from
+  // both `await_suspend` and `await_suspend_destroy` will get exposed.
+  CallExpr *PlainAwaitSuspend = nullptr;
+  if (FunctionDecl *AwaitSuspendCallee = AwaitSuspend->getDirectCallee()) {
+if (AwaitSuspendCallee->hasAttr()) {
+  Calls.UseAwaitSuspendDestroy = true;
+  ExprResult PromiseRefRes =
+  buildPromiseRef(S, CoroPromise->getType(), Loc);
+  if (PromiseRefRes.isInvalid()) {
+Calls.IsInvalid = true;
+return Calls;
+  }
+  Expr *PromiseRef = PromiseRefRes.get();
+  PlainAwaitSuspend = AwaitSuspend;
+  AwaitSuspend = cast_or_null(
+  BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef));
+  if (!AwaitSuspend)
+return Calls;
+}
+  }

snarkmaster wrote:

> why do overload resolution

At a high level, because the behavior of the awaiter can depend on the 
coroutine type it's suspending. Very concretely, with `folly::result`, I would 
like `co_await or_unwind(someResultFn())` to unwrap the result , or propagate 
the error, in **both** `folly::result` coroutines (short-circuiting / 
non-suspending) and `folly::coro::Task` coroutines (asynchronous, 
sometimes-susspending).

So, the above `or_unwind` awaiter can only apply 
`[[clang::coro_await_suspend_destroy]]` to 
`await_suspend(std::coroutine_handle>)`, but not to 
the `Task` promise type.

In the current version of this PR, the main `lit` IR test does cover this 
promise-dependent overload behavior. So, you can look at that for minimal 
example code.

---

> good method to resolve overloads

I hacked something up using `LookupQualifiedName` and 
`OverloadCandidateSet::AddMethodCandidate`. 

Corner case: this calls `DeduceReturnType` when `await_suspend` both has the 
attribute &  returns `auto`. I have to run body analysis in order to issue the 
"return types don't match" diagnostic. If that's too expensive, then

1) It would not be a huge problem to drop the `DeduceReturnType` call and just 
make it an error, forcing the user to make the return type known from the 
declaration.

2) The "return types must match" diagnostic is not THAT high-value, so we could 
even drop the diagnostic, and let users who don't read the docs shoot 
themselves in the foot. 

Does the following lookup strategy seem like an improvement to you? If yes, 
I'll use it to only create the one call expression we actually need, and add 
some more tests for the diagnostic.

```cpp
/// Return the type of `await_suspend` if it has `CoroAwaitSuspendDestroyAttr`.
///
/// @returns A null type on failure to look up the `await_suspend` overload.
/// @returns A null type the `await_suspend` overload lacks the attribute.
/// @returns The return type of the selected `await_suspend` overload if it has
///  the attribute, which is also the expected return type of the
///  `await_suspend_destroy` method that will be called instead.
static QualType lookupReturnTypeIfAwaitSuspendDestroy(Sema &S, Expr *Operand,
 QualType ArgumentType,
 SourceLocation Loc) {
  QualType ObjectType = Operand->getType();

  LookupResult Found(S, S.PP.getIdentifierInfo("await_suspend"), Loc, 
Sema::LookupOrdinaryName);
  {
CXXRecordDecl *ObjectRD = ObjectType->getAsCXXRecordDecl();
if (!ObjectRD)
  return QualType();
S.LookupQualifiedName(Found, ObjectRD);
if (Found.empty())
  return QualType();
  }

  // Attempt overload resolution using a dummy argument
  OverloadCandidateSet Candidates(Loc, OverloadCandidateSet::CSK_Normal);
  OpaqueValueExpr HandleArg(Loc, ArgumentType, VK_PRValue);
  SmallVector Args = {&HandleArg};
  for (LookupResult::iterator I = Found.begin(), E = Found.end(); I != E; ++I) {
S.AddMethodCandidate(I.getPair(),
ObjectType,
Operand->Classify(S.Context),
Args, Candidates);
  }

  OverloadCandidateSet::iterator Best;
  if (Candidates.BestViableFunction(S, Loc, Best) == OR_Success) {
// Test the best candidate for the attribute
if (Best->Function && 
Best

[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread Chuanqi Xu via cfe-commits


@@ -401,36 +432,85 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema 
&S, VarDecl *CoroPromise,
 return Calls;
   }
   Expr *CoroHandle = CoroHandleRes.get();
+  Calls.UseAwaitSuspendDestroy = false;
   CallExpr *AwaitSuspend = cast_or_null(
   BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   if (!AwaitSuspend)
 return Calls;
+
+  // When this `await_suspend()` overload is annotated with
+  // `[[clang::coro_await_suspend_destroy]]`, do NOT call `await_suspend()` --
+  // instead call `await_suspend_destroy(Promise&)`.  This assumes that the
+  // `await_suspend()` is just a compatibility stub consisting of:
+  // await_suspend_destroy(handle.promise());
+  // handle.destroy();
+  // Users of the attribute must follow this contract.  Then, diagnostics from
+  // both `await_suspend` and `await_suspend_destroy` will get exposed.
+  CallExpr *PlainAwaitSuspend = nullptr;
+  if (FunctionDecl *AwaitSuspendCallee = AwaitSuspend->getDirectCallee()) {
+if (AwaitSuspendCallee->hasAttr()) {
+  Calls.UseAwaitSuspendDestroy = true;
+  ExprResult PromiseRefRes =
+  buildPromiseRef(S, CoroPromise->getType(), Loc);
+  if (PromiseRefRes.isInvalid()) {
+Calls.IsInvalid = true;
+return Calls;
+  }
+  Expr *PromiseRef = PromiseRefRes.get();
+  PlainAwaitSuspend = AwaitSuspend;
+  AwaitSuspend = cast_or_null(
+  BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef));
+  if (!AwaitSuspend)
+return Calls;
+}
+  }

ChuanqiXu9 wrote:

I don't have a good method now. And I am confused why do you have to do 
overload resolution? Is it bad to lookup `await_suspend_destroy ` directly?

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread via cfe-commits


@@ -401,36 +432,85 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema 
&S, VarDecl *CoroPromise,
 return Calls;
   }
   Expr *CoroHandle = CoroHandleRes.get();
+  Calls.UseAwaitSuspendDestroy = false;
   CallExpr *AwaitSuspend = cast_or_null(
   BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   if (!AwaitSuspend)
 return Calls;
+
+  // When this `await_suspend()` overload is annotated with
+  // `[[clang::coro_await_suspend_destroy]]`, do NOT call `await_suspend()` --
+  // instead call `await_suspend_destroy(Promise&)`.  This assumes that the
+  // `await_suspend()` is just a compatibility stub consisting of:
+  // await_suspend_destroy(handle.promise());
+  // handle.destroy();
+  // Users of the attribute must follow this contract.  Then, diagnostics from
+  // both `await_suspend` and `await_suspend_destroy` will get exposed.
+  CallExpr *PlainAwaitSuspend = nullptr;
+  if (FunctionDecl *AwaitSuspendCallee = AwaitSuspend->getDirectCallee()) {
+if (AwaitSuspendCallee->hasAttr()) {
+  Calls.UseAwaitSuspendDestroy = true;
+  ExprResult PromiseRefRes =
+  buildPromiseRef(S, CoroPromise->getType(), Loc);
+  if (PromiseRefRes.isInvalid()) {
+Calls.IsInvalid = true;
+return Calls;
+  }
+  Expr *PromiseRef = PromiseRefRes.get();
+  PlainAwaitSuspend = AwaitSuspend;
+  AwaitSuspend = cast_or_null(
+  BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef));
+  if (!AwaitSuspend)
+return Calls;
+}
+  }

snarkmaster wrote:

Do you have a pointer for the best way to do overload resolution so I can test 
for whether the attribute is present, without constructing the AST for the 
handle or for the call? 

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread Chuanqi Xu via cfe-commits


@@ -401,36 +432,85 @@ static ReadySuspendResumeResult buildCoawaitCalls(Sema 
&S, VarDecl *CoroPromise,
 return Calls;
   }
   Expr *CoroHandle = CoroHandleRes.get();
+  Calls.UseAwaitSuspendDestroy = false;
   CallExpr *AwaitSuspend = cast_or_null(
   BuildSubExpr(ACT::ACT_Suspend, "await_suspend", CoroHandle));
   if (!AwaitSuspend)
 return Calls;
+
+  // When this `await_suspend()` overload is annotated with
+  // `[[clang::coro_await_suspend_destroy]]`, do NOT call `await_suspend()` --
+  // instead call `await_suspend_destroy(Promise&)`.  This assumes that the
+  // `await_suspend()` is just a compatibility stub consisting of:
+  // await_suspend_destroy(handle.promise());
+  // handle.destroy();
+  // Users of the attribute must follow this contract.  Then, diagnostics from
+  // both `await_suspend` and `await_suspend_destroy` will get exposed.
+  CallExpr *PlainAwaitSuspend = nullptr;
+  if (FunctionDecl *AwaitSuspendCallee = AwaitSuspend->getDirectCallee()) {
+if (AwaitSuspendCallee->hasAttr()) {
+  Calls.UseAwaitSuspendDestroy = true;
+  ExprResult PromiseRefRes =
+  buildPromiseRef(S, CoroPromise->getType(), Loc);
+  if (PromiseRefRes.isInvalid()) {
+Calls.IsInvalid = true;
+return Calls;
+  }
+  Expr *PromiseRef = PromiseRefRes.get();
+  PlainAwaitSuspend = AwaitSuspend;
+  AwaitSuspend = cast_or_null(
+  BuildSubExpr(ACT::ACT_Suspend, "await_suspend_destroy", PromiseRef));
+  if (!AwaitSuspend)
+return Calls;
+}
+  }

ChuanqiXu9 wrote:

This is not good. We don't have GC or any standard mechanism to remove an AST 
node. I mean, it is not good to generate an AST Node (the await_suspend call) 
here first but then replace it with `await_suspend_destroy`. This is not 
optimal.

If a suspend expression, if you need to generate  a call to 
`await_suspend_destroy`, you shouldn't generate  a call to `await_suspend` 
first.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread Chuanqi Xu via cfe-commits


@@ -284,11 +284,41 @@ static ExprResult buildCoroutineHandle(Sema &S, QualType 
PromiseType,
   return S.BuildCallExpr(nullptr, FromAddr.get(), Loc, FramePtr, Loc);
 }
 
+// To support [[clang::coro_await_suspend_destroy]], this builds
+//   *static_cast(
+//   __builtin_coro_promise(handle, alignof(Promise), false))

ChuanqiXu9 wrote:

```suggestion

```

The comment is not helpful

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread Chuanqi Xu via cfe-commits


@@ -9363,6 +9363,126 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatFunction;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute applies to an
+``await_suspend(std::coroutine_handle)`` member function of a
+coroutine awaiter.  When applied, suspensions into the awaiter use an optimized
+call path that bypasses standard suspend intrinsics, and immediately destroys
+the suspending coro.
+
+Instead of calling the annotated ``await_suspend()``, the coroutine calls
+``await_suspend_destroy(Promise&)`` and immediately destroys the coroutine.
+
+Although it is not called, it is strongly recommended that `await_suspend()`
+contain the following portability stub.  The stub ensures the awaiter behaves
+equivalently without `coro_await_suspend_destroy` support, and makes the
+control flow clear to readers unfamiliar with the attribute:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(Promise&) { /* actual implementation*/ }
+  [[clang::coro_await_suspend_destroy]]
+  void await_suspend(std::coroutine_handle handle) {
+// Stub to preserve behavior when the attribute is not supported
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }

ChuanqiXu9 wrote:

It is clear now but may be too verbose. But I get your point.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread via cfe-commits


@@ -9363,6 +9363,126 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatFunction;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute applies to an
+``await_suspend(std::coroutine_handle)`` member function of a
+coroutine awaiter.  When applied, suspensions into the awaiter use an optimized
+call path that bypasses standard suspend intrinsics, and immediately destroys
+the suspending coro.
+
+Instead of calling the annotated ``await_suspend()``, the coroutine calls
+``await_suspend_destroy(Promise&)`` and immediately destroys the coroutine.
+
+Although it is not called, it is strongly recommended that `await_suspend()`
+contain the following portability stub.  The stub ensures the awaiter behaves
+equivalently without `coro_await_suspend_destroy` support, and makes the
+control flow clear to readers unfamiliar with the attribute:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(Promise&) { /* actual implementation*/ }
+  [[clang::coro_await_suspend_destroy]]
+  void await_suspend(std::coroutine_handle handle) {
+// Stub to preserve behavior when the attribute is not supported
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }

snarkmaster wrote:

Okay, sounds like I need to make it very explicit. Let me put my lawyer hat on! 
Let me know if the following is good enough, to replace the lines you 
highlighted?

---

Here is the formal contract for this attribute.

The attribute is considered *active* when **both** of these are true:
  - The compiler supports it -- i.e. the macro 
`__has_cpp_attribute(clang::coro_await_suspend_destroy)` expands to a nonzero 
integer.
  - The `await_suspend` overload applicable to the current coroutine's promise 
type is annotated with `[[clang::coro_await_suspend_destroy]]`.

If the attribute is **inactive**, then the compiler will follow the C++ 
standard suspension behavior. When `await_ready()` returns `false`:
  - First, the coroutine is suspended -- the compiler saves the coroutine state 
and creates a handle.
  - Then, `await_suspend` is invoked with the handle.
  - Note: With an inactive attribute, `await_suspend_destroy(Promise&)` may be 
defined, but is not part of the compiler's protocol. 

If the attribute is **active**, the compiler will follow this non-standard 
protocol whenever `await_ready()` returns `false`:
  - First, `await_suspend_destroy` is invoked with a mutable reference to 
awaiting coroutine's promise. 
  - Then, the coroutine is immediately destroyed, as if on `co_return ...;` but 
without invoking either `return_void()` or `return_value()`.
  - Notes:
  * The coroutine is **not** suspended, and a handle is **not** created.
  * The applicable `await_suspend` is **not** called. It must still be 
declared, since the compiler looks for the attribute on this special member, 
but a definition is optional. NB: Before providing a definition, read the note 
on portability below.

**Portability note:** It is strongly recommended to write your code in a way 
that does **not** rely on support for this attribute. Fortunately, the 
attribute's contract is designed so that portability does not require 
conditional compilation. 

Suppose you have the following standard `await_suspend`:

```cpp
void await_suspend(std::coroutine_handle& h) {
  record_suspension_via_promise(h.promise());
  h.destroy();
}
```

Without loss of portability, you can replace it by `await_suspend_destroy`, 
plus a fallback `await_suspend`. Depending on the compiler, either one may be 
the entry point, but the behavior will be the same -- except for the speed, 
size, and allocation-elision benefits of the attribute.

```cpp
// Entry point when `clang::coro_await_suspend_destroy` is supported
void await_suspend_destroy(MyPromise& p) {
  record_suspension_via_promise(p);
}
// Entry point when `clang::coro_await_suspend_destroy` is not supported.
// Emits no code when `clang::coro_await_suspend_destroy` is supported.
void await_suspend(std::coroutine_handle& h) {
  await_suspend_destroy(h.promise());
  h.destroy();
}
```

The "standard" and "replacement" forms are equivalent because the fallback 
`await_suspend` replicates the attribute's contract, when the attribute is not 
supported by the compiler.

**Warning:** Even if you only use Clang, do not neglect to add the portability 
stub -- LLVM reserves the right to remove support for this attribute in a later 
major release.  


https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread Chuanqi Xu via cfe-commits


@@ -9363,6 +9363,126 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatFunction;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute applies to an
+``await_suspend(std::coroutine_handle)`` member function of a
+coroutine awaiter.  When applied, suspensions into the awaiter use an optimized
+call path that bypasses standard suspend intrinsics, and immediately destroys
+the suspending coro.
+
+Instead of calling the annotated ``await_suspend()``, the coroutine calls
+``await_suspend_destroy(Promise&)`` and immediately destroys the coroutine.
+
+Although it is not called, it is strongly recommended that `await_suspend()`
+contain the following portability stub.  The stub ensures the awaiter behaves
+equivalently without `coro_await_suspend_destroy` support, and makes the
+control flow clear to readers unfamiliar with the attribute:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(Promise&) { /* actual implementation*/ }
+  [[clang::coro_await_suspend_destroy]]
+  void await_suspend(std::coroutine_handle handle) {
+// Stub to preserve behavior when the attribute is not supported
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }

ChuanqiXu9 wrote:

This is still confusing. The behavior in my mind may be, the coroutine is 
immediately destroyed after the annotated `await_suspend` finished. In another 
word, the user can think we'll insert an unconditionally .destroy after the 
annotated `await_suspend`. The existence of `await_suspend_destroy` here is 
super confusing.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread Eli Friedman via cfe-commits

efriedma-quic wrote:

git-clang-format uses the "--lines" option to clang-format restrict formatting 
updates.  If that isn't working correctly, please file a bug.  You can ignore 
the bot after that.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread via cfe-commits

snarkmaster wrote:

Addressed nit, and reworked docs explaining the portability stub.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread via cfe-commits


@@ -9363,6 +9363,119 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatFunction;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute applies to an
+``await_suspend(std::coroutine_handle)`` member function of a
+coroutine awaiter.  When applied, suspensions into the awaiter use an optimized
+call path that bypasses standard suspend intrinsics, and immediately destroys
+the suspending coro.
+
+Each annotated ``await_suspend`` member must contain a compatibility stub:
+
+.. code-block:: c++
+
+  [[clang::coro_await_suspend_destroy]]
+  void await_suspend(std::coroutine_handle handle) {
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }
+
+An awaiter type may provide both annotated and non-annotated overloads of
+``await_suspend()``, as long as each invocation of an annotated overload has a
+corresponding ``await_suspend_destroy(Promise&)`` overload.
+
+Instead of calling the annotated ``await_suspend()``, the coroutine calls
+``await_suspend_destroy(Promise&)`` and immediately destroys the coroutine
+``await_suspend_destroy()`` must return ``void`` (Note: if desired, it
+would be straightforward to also support the "symmetric transfer"
+``std::coroutine_handle`` return type.)
+
+This optimization improves code speed and size for "short-circuiting"
+coroutines — those that use coroutine syntax **exclusively** for early returns
+and control flow rather than true asynchronous operations.
+
+Specifically, a short-circuiting awaiter is one that either proceeds
+immediately (``await_ready()`` returns ``true``, skipping to
+``await_resume()``) or terminates the coroutine execution.
+
+Then, a short-circuiting coroutine is one where **all** the awaiters (including
+``co_await``, ``co_yield``, initial, and final suspend) are short-circuiting.
+
+The short-circuiting coroutine concept introduced above has close analogs in
+other languages:
+
+- Rust has ``Result`` and a ``?`` operator to unpack it, while
+  ``folly::result`` is a C++ short-circuiting coroutine, within which
+  ``co_await or_unwind(someResult())`` acts just like ``someResult()?``.
+
+- Haskell has ``Maybe`` & ``Error`` monads.  A short-circuiting ``co_await``
+  loosely corresponds to the monadic ``>>=``, whereas a short-circuiting
+  ``std::optional`` coro would be an exact analog of ``Maybe``.

snarkmaster wrote:

What you write makes sense. If I had an authorititative description of 
short-circuiting coroutines and related concepts to link to "for more 
information", I would. However, I'm not aware of one, and I think it's 
genuinely helpful for user comprehension to pull in some of these analogies.

I agree it'd be a good idea to shorten the docs if such a source is later 
published.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread via cfe-commits


@@ -9363,6 +9363,119 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatFunction;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute applies to an
+``await_suspend(std::coroutine_handle)`` member function of a
+coroutine awaiter.  When applied, suspensions into the awaiter use an optimized
+call path that bypasses standard suspend intrinsics, and immediately destroys
+the suspending coro.
+
+Each annotated ``await_suspend`` member must contain a compatibility stub:
+
+.. code-block:: c++
+
+  [[clang::coro_await_suspend_destroy]]
+  void await_suspend(std::coroutine_handle handle) {
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }

snarkmaster wrote:

Here is what is **the same** in both the old and the new version:
  - The compiler does not constrain the implementation of `await_suspend()`.
  - Normal users SHOULD implement a portability `await_suspend()` with those 
two lines -- it should be very uncommon not to do this.
  - Given the attribute, the emitted code only uses `await_suspend_destroy()` 
-- the stub does not run.

The only thing that **differs** is that in the new version, we test-call 
`await_suspend()` to check for the attribute, so as a result:
  - `await_suspend()` must be at least declared, and have the same return type 
as `await_suspend_destroy()`

IMO, having to declare both functions is a good tradeoff in exchange for the 
outcome is that an awaiter can use overloads to select whether or not it is 
short-circuiting in a particular promise.

If you need me to elaborate **why** it's useful to have the overload selection, 
I can write more.

---

As far as the comment, I will make the requirements more explicit in the doc, 
and replace "must" by "should".


https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-19 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> Regarding the "no intrinsics" attribute discussion --

Let's discuss this later in other thread. It may be confusing to discuss two 
things in one thread especially for other readers.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-18 Thread Chuanqi Xu via cfe-commits


@@ -9363,6 +9363,119 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatFunction;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute applies to an
+``await_suspend(std::coroutine_handle)`` member function of a
+coroutine awaiter.  When applied, suspensions into the awaiter use an optimized
+call path that bypasses standard suspend intrinsics, and immediately destroys
+the suspending coro.
+
+Each annotated ``await_suspend`` member must contain a compatibility stub:
+
+.. code-block:: c++
+
+  [[clang::coro_await_suspend_destroy]]
+  void await_suspend(std::coroutine_handle handle) {
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }
+
+An awaiter type may provide both annotated and non-annotated overloads of
+``await_suspend()``, as long as each invocation of an annotated overload has a
+corresponding ``await_suspend_destroy(Promise&)`` overload.
+
+Instead of calling the annotated ``await_suspend()``, the coroutine calls
+``await_suspend_destroy(Promise&)`` and immediately destroys the coroutine
+``await_suspend_destroy()`` must return ``void`` (Note: if desired, it
+would be straightforward to also support the "symmetric transfer"
+``std::coroutine_handle`` return type.)
+
+This optimization improves code speed and size for "short-circuiting"
+coroutines — those that use coroutine syntax **exclusively** for early returns
+and control flow rather than true asynchronous operations.
+
+Specifically, a short-circuiting awaiter is one that either proceeds
+immediately (``await_ready()`` returns ``true``, skipping to
+``await_resume()``) or terminates the coroutine execution.
+
+Then, a short-circuiting coroutine is one where **all** the awaiters (including
+``co_await``, ``co_yield``, initial, and final suspend) are short-circuiting.
+
+The short-circuiting coroutine concept introduced above has close analogs in
+other languages:
+
+- Rust has ``Result`` and a ``?`` operator to unpack it, while
+  ``folly::result`` is a C++ short-circuiting coroutine, within which
+  ``co_await or_unwind(someResult())`` acts just like ``someResult()?``.
+
+- Haskell has ``Maybe`` & ``Error`` monads.  A short-circuiting ``co_await``
+  loosely corresponds to the monadic ``>>=``, whereas a short-circuiting
+  ``std::optional`` coro would be an exact analog of ``Maybe``.

ChuanqiXu9 wrote:

I feel the doc for attributes is not a good place for these ideas. But I don't 
have a strong idea now. We can always update it later.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-18 Thread Chuanqi Xu via cfe-commits


@@ -9363,6 +9363,119 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatFunction;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute applies to an
+``await_suspend(std::coroutine_handle)`` member function of a
+coroutine awaiter.  When applied, suspensions into the awaiter use an optimized
+call path that bypasses standard suspend intrinsics, and immediately destroys
+the suspending coro.
+
+Each annotated ``await_suspend`` member must contain a compatibility stub:
+
+.. code-block:: c++
+
+  [[clang::coro_await_suspend_destroy]]
+  void await_suspend(std::coroutine_handle handle) {
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }

ChuanqiXu9 wrote:

What does it mean? Does it say the function must be implemented as is? If yes, 
I feel bad for that. I prefer the previous design.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-18 Thread Chuanqi Xu via cfe-commits


@@ -5288,13 +5290,22 @@ class CoroutineSuspendExpr : public Expr {
   }
 
   CoroutineSuspendExpr(StmtClass SC, EmptyShell Empty) : Expr(SC, Empty) {
+CoroutineSuspendExprBits.UseAwaitSuspendDestroy = false;
 SubExprs[SubExpr::Operand] = nullptr;
 SubExprs[SubExpr::Common] = nullptr;
 SubExprs[SubExpr::Ready] = nullptr;
 SubExprs[SubExpr::Suspend] = nullptr;
 SubExprs[SubExpr::Resume] = nullptr;
   }
 
+  bool useAwaitSuspendDestroy() const {
+return CoroutineSuspendExprBits.UseAwaitSuspendDestroy;
+  }
+
+  void setUseAwaitSuspendDestroy(bool Use) {

ChuanqiXu9 wrote:

nit
```suggestion
  void setUseAwaitSuspendDestroy(bool Use = true) {
```

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-18 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-18 Thread via cfe-commits

snarkmaster wrote:

@ChuanqiXu9, please take a look at the update. I incorporated all the feedback 
from you and @yuxuanchen1997, as collected in my prior comment / checklist. 

I additionally made a change that Yuxuan mentioned, and that I thought I 
wouldn't be able to make. Specifically, I went back to a member function 
attribute, from a class attribute. The big upside of this is that every 
overload of `await_suspend` can independently decide whether it uses 
`await_suspend_destroy()`. I am actually going to end up using this in a 
near-term revision of `folly::result`, since I am migrating to a `co_await 
or_unwind()` syntax for extracting the value both in short-circuiting **and** 
in suspending coros. 

This commit has the detailed changes: 
https://github.com/llvm/llvm-project/pull/152623/commits/f1e885c45837b0e1bd3925fb69e16c6189ebbb80
 -- but, things look different enough it's probably worth a fresh read-through.

---

Regarding the "no intrinsics" attribute discussion -- 

> I feel confused about [[clang::coro_may_inline_suspend_with_barrier]]. It 
> makes things complex. In my mind, we only need:
>  - A class attribute to hint the compiler to not emit await.suspend 
> intrinsics.
>  - As you said, the __builtin_suspend_barrier is helpful for users to give 
> finer grained control.

More likely, I am confused -- you know the code much better.

The reason I proposed this very specific attribute is that I was worried about 
end users. Without the compiler enforcing barrier usage, people might apply the 
attribute to a production coro task, without understanding the subtle 
miscompilations you might see by dropping the suspend intrinsics. This sort of 
thing can show up in production as a random bug (heisenbug), which can make it 
extremely expensive to track down.

So, my goal in coupling "remove intrinsics" with "force a reordering barrier" 
was to give the end users some margin of safety. I think the high-level goal of 
safety is a good one.

On the other hand, I don't feel like I understand  the internals of LLVM well 
enough to have a good opinion on the implementability of that specific "safer" 
idea, or whether there's a different way to make it safer.

It might also be fine to provide them as separate features, but then the docs 
for the attribute must strongly encourage the users to use the barrier.


https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-18 Thread via cfe-commits

https://github.com/snarkmaster updated 
https://github.com/llvm/llvm-project/pull/152623

>From 9fc3169ea5f1aea2a88b2616b7c9c4f2949139be Mon Sep 17 00:00:00 2001
From: Alexey 
Date: Thu, 7 Aug 2025 12:10:07 -0700
Subject: [PATCH 1/6] Elide suspension points via
 [[clang::coro_await_suspend_destroy]]

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was that I noticed that short-circuiting coroutines
failed to optimize well.  Interact with the demo program here:
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]],
the assembly for `simple_coro` would be drastically shorter, and would not
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

  - This has `lit` tests, but what gives me real confidence in its correctness
is the integration test in `coro_await_suspend_destroy_test.cpp`.  This
caught all the interesting bugs that I had in earlier revs, and covers
equivalence to the standard code path in far more scenarios.

  - I considered a variety of other designs. Here are some key design points:

* I considered optimizing unmodified `await_suspend()` methods, as long as
  they unconditionally end with an `h.destroy()` call on the current
  handle, or an exception.  However, this would (a) force dynamic dispatch
  for `destroy` -- bloating IR & reducing optimization opportunities, (b)
  require far more complex, delicate, and fragile analysis, (c) retain more
  of the frame setup, so that e.g.  `h.done()` works properly.  The current
  solution shortcuts all these concerns.

* I want to `Promise&`, rather than `std::coroutine_handle` to
  `await_suspend_destroy` -- this is safer, simpler, and more efficient.
  Short-circuiting corotuines should not touch the handle.  This decision
  forces the attribue to go on the class.  Resolving a method attribute
  would have required looking up overloads for both types, and choosing
  one, which is costly and a bad UX to boot.

* `AttrDocs.td` tells portable code to provide a stub `await_suspend()`.
  This portability / compatibility solution avoids dire issues that would
  arise if users relied on `__has_cpp_attribute` and the declaration and
  definition happened to use different toolchains.  In particular, it will
  even be safe for a future compiler release to killswitch this attribute
  by removing its implementation and setting its version to 0.

```
let Spellings = [Clang<"coro_destroy_after_suspend", /*allowInC*/ 0,
 /*Version*/ 0>];
```

  - In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as
a further optimization opportunity.  But, I'm sure there are
higher-leverage ways of making these non-suspending coros compile better, I
just don't know the coro optimization pipeline well enough to flag them.

  - IIUC the only interaction of this with `coro_only_destroy_when_complete`
would be that the compiler expends fewer cycles.

  - I ran some benchmarks on [folly::result](
https://github.com/facebook/folly/blob/main/folly/result/docs/result.md).
Heap allocs are definitely elided, the compiled code looks like a function,
not a coroutine, but there's still an optimization gap.  On the plus side,
this results in a 4x speedup (!) in optimized ASAN builds (numbers not
shown for brevity.

```
// Simple result coroutine that adds 1 to the input
result result_coro(result&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result catching_result_func(result&& r) {
  return result_catch_all([&]() -> result {
if (r.has_value()) {
  return r.value_or_throw() + 1;
}
return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result non_catching_result_func(result&& r) {
  if (r.has_value()) {
return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

result_coro_success13.61ns73.49M
non_catching_result_func_success3.39ns   295.00M
catching_result_func_success4.41ns   226.88M
result_coro_error  19.55ns51.16M
non_catching_result_func_error  9.15ns   109.26M
catching_result_func_error 10.19ns98.10M


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-17 Thread via cfe-commits

https://github.com/snarkmaster updated 
https://github.com/llvm/llvm-project/pull/152623

>From 9fc3169ea5f1aea2a88b2616b7c9c4f2949139be Mon Sep 17 00:00:00 2001
From: Alexey 
Date: Thu, 7 Aug 2025 12:10:07 -0700
Subject: [PATCH 1/6] Elide suspension points via
 [[clang::coro_await_suspend_destroy]]

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was that I noticed that short-circuiting coroutines
failed to optimize well.  Interact with the demo program here:
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]],
the assembly for `simple_coro` would be drastically shorter, and would not
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

  - This has `lit` tests, but what gives me real confidence in its correctness
is the integration test in `coro_await_suspend_destroy_test.cpp`.  This
caught all the interesting bugs that I had in earlier revs, and covers
equivalence to the standard code path in far more scenarios.

  - I considered a variety of other designs. Here are some key design points:

* I considered optimizing unmodified `await_suspend()` methods, as long as
  they unconditionally end with an `h.destroy()` call on the current
  handle, or an exception.  However, this would (a) force dynamic dispatch
  for `destroy` -- bloating IR & reducing optimization opportunities, (b)
  require far more complex, delicate, and fragile analysis, (c) retain more
  of the frame setup, so that e.g.  `h.done()` works properly.  The current
  solution shortcuts all these concerns.

* I want to `Promise&`, rather than `std::coroutine_handle` to
  `await_suspend_destroy` -- this is safer, simpler, and more efficient.
  Short-circuiting corotuines should not touch the handle.  This decision
  forces the attribue to go on the class.  Resolving a method attribute
  would have required looking up overloads for both types, and choosing
  one, which is costly and a bad UX to boot.

* `AttrDocs.td` tells portable code to provide a stub `await_suspend()`.
  This portability / compatibility solution avoids dire issues that would
  arise if users relied on `__has_cpp_attribute` and the declaration and
  definition happened to use different toolchains.  In particular, it will
  even be safe for a future compiler release to killswitch this attribute
  by removing its implementation and setting its version to 0.

```
let Spellings = [Clang<"coro_destroy_after_suspend", /*allowInC*/ 0,
 /*Version*/ 0>];
```

  - In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as
a further optimization opportunity.  But, I'm sure there are
higher-leverage ways of making these non-suspending coros compile better, I
just don't know the coro optimization pipeline well enough to flag them.

  - IIUC the only interaction of this with `coro_only_destroy_when_complete`
would be that the compiler expends fewer cycles.

  - I ran some benchmarks on [folly::result](
https://github.com/facebook/folly/blob/main/folly/result/docs/result.md).
Heap allocs are definitely elided, the compiled code looks like a function,
not a coroutine, but there's still an optimization gap.  On the plus side,
this results in a 4x speedup (!) in optimized ASAN builds (numbers not
shown for brevity.

```
// Simple result coroutine that adds 1 to the input
result result_coro(result&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result catching_result_func(result&& r) {
  return result_catch_all([&]() -> result {
if (r.has_value()) {
  return r.value_or_throw() + 1;
}
return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result non_catching_result_func(result&& r) {
  if (r.has_value()) {
return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

result_coro_success13.61ns73.49M
non_catching_result_func_success3.39ns   295.00M
catching_result_func_success4.41ns   226.88M
result_coro_error  19.55ns51.16M
non_catching_result_func_error  9.15ns   109.26M
catching_result_func_error 10.19ns98.10M


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-11 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> Thanks! One more question @ChuanqiXu9 -- I now have a [pile of smallish 
> fixes](https://github.com/llvm/llvm-project/pull/152623#issuecomment-3177095986)
>  to make. Are you still thinking about the overall design of this PR, or 
> should I go ahead and polish it up for merging?

I feel good with the ideas. Let's address these comments and try to merge it.

> 
> > test for #148380 to show we can cover that in certain cases
> 
> Ok, this sounds simple enough. I added that to my to-do list for the next 
> revision.
> 
> > The core idea is to avoid generating await.suspend intrinsics 
> > conditionally. I am open to the conditions.
> 
> Ah, now I understand. Your idea ties back to the complex (for me, I didn't 
> spend the necessary time reading the relevant PRs and Phabricator thread!) 
> reasons that the `await.suspend.XXX` intrinsics were introduced in the first 
> place.
> 
> It sounds cool in principle, but:
> 
> * That's a separate work-stream, right?

Yes. That is a casual chat. Not a requirement for you or this PR.

> * Without deeply understanding the old mis-optimization bug, I doubt I'd be 
> good at coming up with this conditions or use-cases for this class of 
> attributes.
> 
> If you're thinking about this actively, and you think that my knowledge of 
> e.g. `folly/coro` is useful for your decision-making, then I can make some 
> time to think about it, or VC about it, and/or you could start a Discourse 
> thread about it?

If you want to understand this fully, you can read it at: 
https://github.com/llvm/llvm-project/issues/56301 The root cause is complex as 
it relates to the design of LLVM Coroutine IRs and LLVM middle end 
optimizations. But the higher level key point is, the LLVM awaiter suspend 
intrinsics may affect the performance. The problem we're handling is the 
resumer/destroyer in the await_suspend, especially the coroutine is 
resumed/destroyed in **another thread** while the await_suspend is executing.

As far as I know, this is reported from google's internal repo, so I 
`folly/coro` may not be affected by this. But `folly/coro` will be a good 
candidate in the open source world to show the usefulness of coroutines 
optimization.  Also I think our library 
(https://github.com/alibaba/async_simple) is a good candidate too as it is much 
simpler and cheaper.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-11 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> My current IR test verifies that there's no heap alloc across a few examples. 
> Were you thinking of a different kind of test? Can you be very specific since 
> I'm totally new to LLVM?

I mean to add a test for https://github.com/llvm/llvm-project/issues/148380 to 
show we can cover that in certain cases. Specifically, you can have a test on 
the IR to show it is devirtualized.

> I don't know when I'll next be able to make time to work on this, but if I 
> do, my next target would be std::suspend_never. I hope this one could be done 
> without an attribute, though. Do you have other use-cases in mind?

e.g., add an attribute to ask the compiler don't generate await.suspend 
intrinsics then the compiler may have optimized the await_suspend call better 
by inlining. The cost is the risk if the code resume or destroy the current 
coroutine during await_suspend somehow. But the programmer should be able to 
know that.

The core idea is to avoid generating  await.suspend intrinsics conditionally. I 
am open to the conditions.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-11 Thread Chuanqi Xu via cfe-commits


@@ -174,6 +174,66 @@ static bool StmtCanThrow(const Stmt *S) {
   return false;
 }
 
+// Check if this suspend should be calling `await_suspend_destroy`
+static bool useCoroAwaitSuspendDestroy(const CoroutineSuspendExpr &S) {
+  // This can only be an `await_suspend_destroy` suspend expression if it
+  // returns void -- `buildCoawaitCalls` in `SemaCoroutine.cpp` asserts this.
+  // Moreover, when `await_suspend` returns a handle, the outermost method call
+  // is `.address()` -- making it harder to get the actual class or method.
+  if (S.getSuspendReturnType() !=
+  CoroutineSuspendExpr::SuspendReturnType::SuspendVoid) {
+return false;
+  }
+
+  // `CGCoroutine.cpp` & `SemaCoroutine.cpp` must agree on whether this suspend
+  // expression uses `[[clang::coro_await_suspend_destroy]]`.
+  //
+  // Any mismatch is a serious bug -- we would either double-free, or fail to
+  // destroy the promise type. For this reason, we make our decision based on
+  // the method name, and fatal outside of the happy path -- including on
+  // failure to find a method name.
+  //
+  // As a debug-only check we also try to detect the `AwaiterClass`. This is
+  // secondary, because  detection of the awaiter type can be silently broken 
by
+  // small `buildCoawaitCalls` AST changes.
+  StringRef SuspendMethodName;   // Primary
+  CXXRecordDecl *AwaiterClass = nullptr; // Debug-only, best-effort
+  if (auto *SuspendCall =
+  dyn_cast(S.getSuspendExpr()->IgnoreImplicit())) {
+if (auto *SuspendMember = dyn_cast(SuspendCall->getCallee())) {
+  if (auto *BaseExpr = SuspendMember->getBase()) {
+// `IgnoreImplicitAsWritten` is critical since `await_suspend...` can 
be
+// invoked on the base of the actual awaiter, and the base need not 
have
+// the attribute. In such cases, the AST will show the true awaiter
+// being upcast to the base.
+AwaiterClass = BaseExpr->IgnoreImplicitAsWritten()
+   ->getType()
+   ->getAsCXXRecordDecl();
+  }
+  if (auto *SuspendMethod =
+  dyn_cast(SuspendMember->getMemberDecl())) {
+SuspendMethodName = SuspendMethod->getName();
+  }
+}
+  }
+  if (SuspendMethodName == "await_suspend_destroy") {
+assert(!AwaiterClass ||
+   AwaiterClass->hasAttr());
+return true;
+  } else if (SuspendMethodName == "await_suspend") {
+assert(!AwaiterClass ||
+   !AwaiterClass->hasAttr());
+return false;
+  } else {
+llvm::report_fatal_error(
+"Wrong method in [[clang::coro_await_suspend_destroy]] check: "
+"expected 'await_suspend' or 'await_suspend_destroy', but got '" +
+SuspendMethodName + "'");
+  }

ChuanqiXu9 wrote:

LLVM doesn't guarantee ABI stability across major versions.

Remember to touch Serialization part if you added any new bits to AST nodes

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-11 Thread via cfe-commits


@@ -9270,6 +9270,110 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
+coroutine awaiter type.  When this attribute is present, the awaiter must
+implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
+returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
+called directly.  The coroutine being suspended will then be immediately
+destroyed.
+
+The new behavior is equivalent to this standard code:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(YourPromise&) { ... }
+  void await_suspend(auto handle) {
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }
+
+This enables `await_suspend_destroy()` usage in portable awaiters — just add a
+stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
+support, the awaiter will behave nearly identically, with the only difference
+being heap allocation instead of stack allocation for the coroutine frame.
+
+This attribute helps optimize short-circuiting coroutines.
+
+A short-circuiting coroutine is one where every ``co_await`` or ``co_yield``
+either immediately produces a value, or exits the coroutine.  In other words,
+they use coroutine syntax to concisely branch out of a synchronous function. 
+Here are close analogs in other languages:
+
+- Rust has ``Result`` and a ``?`` operator to unpack it, while
+  ``folly::result`` is a C++ short-circuiting coroutine, with ``co_await``
+  acting just like ``?``.
+
+- Haskell has ``Maybe`` & ``Error`` monads.  A short-circuiting ``co_await``
+  loosely corresponds to the monadic ``>>=``, whereas a short-circuiting
+  ``std::optional`` coro would be an exact analog of ``Maybe``.
+
+The C++ implementation relies on short-circuiting awaiters.  These either
+resume synchronously, or immediately destroy the awaiting coroutine and return
+control to the parent:
+
+.. code-block:: c++
+
+  T val;
+  if (awaiter.await_ready()) {
+val = awaiter.await_resume();
+  } else {
+awaiter.await_suspend();
+return /* value representing the "execution short-circuited" outcome */;
+  }
+
+Then, a short-ciruiting coroutine is one where all the suspend points are
+either (i) trivial (like ``std::suspend_never``), or (ii) short-circuiting.
+
+Although the coroutine machinery makes them harder to optimize, logically,
+short-circuiting coroutines are like syntax sugar for regular functions where:
+
+- `co_await` allows expressions to return early.
+
+- `unhandled_exception()` lets the coroutine promise type wrap the function
+  body in an implicit try-catch.  This mandatory exception boundary behavior
+  can be desirable in robust, return-value-oriented programs that benefit from
+  short-circuiting coroutines.  If not, the promise can always re-throw.
+
+This attribute improves short-circuiting coroutines in a few ways:
+
+- **Avoid heap allocations for coro frames**: Allocating short-circuiting
+  coros on the stack makes code more predictable under memory pressure.
+  Without this attribute, LLVM cannot elide heap allocation even when all
+  awaiters are short-circuiting.
+
+- **Performance**: Significantly faster execution and smaller code size.
+
+- **Build time**: Faster compilation due to less IR being generated.
+
+Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes
+further improve optimization.

snarkmaster wrote:

Lol sure, I'll change it, but ~everyone knows that method is short for "member 
function" :-P

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-11 Thread via cfe-commits


@@ -9270,6 +9270,110 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
+coroutine awaiter type.  When this attribute is present, the awaiter must
+implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
+returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
+called directly.  The coroutine being suspended will then be immediately
+destroyed.
+
+The new behavior is equivalent to this standard code:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(YourPromise&) { ... }
+  void await_suspend(auto handle) {
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }
+
+This enables `await_suspend_destroy()` usage in portable awaiters — just add a
+stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
+support, the awaiter will behave nearly identically, with the only difference
+being heap allocation instead of stack allocation for the coroutine frame.
+

snarkmaster wrote:

I used this phrasing for brevity. I will rephrase it to be more precise.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-10 Thread via cfe-commits

snarkmaster wrote:

> ... can cover https://github.com/llvm/llvm-project/issues/148380, right?

Above, I mentioned "> `await_suspend` that destroys the coro returns a handle"  
as one of the cases I don't cover.

There's no logical reason this couldn't be supported, but I only needed `void` 
for short-circuiting coros. 

I looked through `folly/coro` and found a destroying suspend that returned a 
handle (see 
[`DetachedBarrierTask`](https://github.com/facebook/folly/blob/main/folly/coro/detail/BarrierTask.h#L147)).
 I then made a [quick-and-dirty 
PoC](https://gist.github.com/snarkmaster/f85ad34dcd0b0f3a810685fc5379948e) of a 
compiler change to support returning handles from `await_suspend_destroy`. 
Unfortunately, it didn't make an obvious difference to the code optimization in 
this case.

So, I don't think the current PR should include the extra code to support 
`await_suspend_destroy` returning handles. We'd need to see some kind of 
use-case where it makes a meaningful difference to perf, and a reason why 
making that optimize the normal way is hard.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-09 Thread via cfe-commits

https://github.com/snarkmaster updated 
https://github.com/llvm/llvm-project/pull/152623

>From 9fc3169ea5f1aea2a88b2616b7c9c4f2949139be Mon Sep 17 00:00:00 2001
From: Alexey 
Date: Thu, 7 Aug 2025 12:10:07 -0700
Subject: [PATCH 1/3] Elide suspension points via
 [[clang::coro_await_suspend_destroy]]

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was that I noticed that short-circuiting coroutines
failed to optimize well.  Interact with the demo program here:
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]],
the assembly for `simple_coro` would be drastically shorter, and would not
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

  - This has `lit` tests, but what gives me real confidence in its correctness
is the integration test in `coro_await_suspend_destroy_test.cpp`.  This
caught all the interesting bugs that I had in earlier revs, and covers
equivalence to the standard code path in far more scenarios.

  - I considered a variety of other designs. Here are some key design points:

* I considered optimizing unmodified `await_suspend()` methods, as long as
  they unconditionally end with an `h.destroy()` call on the current
  handle, or an exception.  However, this would (a) force dynamic dispatch
  for `destroy` -- bloating IR & reducing optimization opportunities, (b)
  require far more complex, delicate, and fragile analysis, (c) retain more
  of the frame setup, so that e.g.  `h.done()` works properly.  The current
  solution shortcuts all these concerns.

* I want to `Promise&`, rather than `std::coroutine_handle` to
  `await_suspend_destroy` -- this is safer, simpler, and more efficient.
  Short-circuiting corotuines should not touch the handle.  This decision
  forces the attribue to go on the class.  Resolving a method attribute
  would have required looking up overloads for both types, and choosing
  one, which is costly and a bad UX to boot.

* `AttrDocs.td` tells portable code to provide a stub `await_suspend()`.
  This portability / compatibility solution avoids dire issues that would
  arise if users relied on `__has_cpp_attribute` and the declaration and
  definition happened to use different toolchains.  In particular, it will
  even be safe for a future compiler release to killswitch this attribute
  by removing its implementation and setting its version to 0.

```
let Spellings = [Clang<"coro_destroy_after_suspend", /*allowInC*/ 0,
 /*Version*/ 0>];
```

  - In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as
a further optimization opportunity.  But, I'm sure there are
higher-leverage ways of making these non-suspending coros compile better, I
just don't know the coro optimization pipeline well enough to flag them.

  - IIUC the only interaction of this with `coro_only_destroy_when_complete`
would be that the compiler expends fewer cycles.

  - I ran some benchmarks on [folly::result](
https://github.com/facebook/folly/blob/main/folly/result/docs/result.md).
Heap allocs are definitely elided, the compiled code looks like a function,
not a coroutine, but there's still an optimization gap.  On the plus side,
this results in a 4x speedup (!) in optimized ASAN builds (numbers not
shown for brevity.

```
// Simple result coroutine that adds 1 to the input
result result_coro(result&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result catching_result_func(result&& r) {
  return result_catch_all([&]() -> result {
if (r.has_value()) {
  return r.value_or_throw() + 1;
}
return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result non_catching_result_func(result&& r) {
  if (r.has_value()) {
return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

result_coro_success13.61ns73.49M
non_catching_result_func_success3.39ns   295.00M
catching_result_func_success4.41ns   226.88M
result_coro_error  19.55ns51.16M
non_catching_result_func_error  9.15ns   109.26M
catching_result_func_error 10.19ns98.10M


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-09 Thread via cfe-commits

snarkmaster wrote:

I addressed most of the comments, except that the handful marked "Response 
requested" might still benefit from your input. But, if you are happy, I am 
also happy with the 2 PRs as they are now.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-09 Thread via cfe-commits

https://github.com/snarkmaster updated 
https://github.com/llvm/llvm-project/pull/152623

>From 9fc3169ea5f1aea2a88b2616b7c9c4f2949139be Mon Sep 17 00:00:00 2001
From: Alexey 
Date: Thu, 7 Aug 2025 12:10:07 -0700
Subject: [PATCH 1/6] Elide suspension points via
 [[clang::coro_await_suspend_destroy]]

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was that I noticed that short-circuiting coroutines
failed to optimize well.  Interact with the demo program here:
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]],
the assembly for `simple_coro` would be drastically shorter, and would not
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

  - This has `lit` tests, but what gives me real confidence in its correctness
is the integration test in `coro_await_suspend_destroy_test.cpp`.  This
caught all the interesting bugs that I had in earlier revs, and covers
equivalence to the standard code path in far more scenarios.

  - I considered a variety of other designs. Here are some key design points:

* I considered optimizing unmodified `await_suspend()` methods, as long as
  they unconditionally end with an `h.destroy()` call on the current
  handle, or an exception.  However, this would (a) force dynamic dispatch
  for `destroy` -- bloating IR & reducing optimization opportunities, (b)
  require far more complex, delicate, and fragile analysis, (c) retain more
  of the frame setup, so that e.g.  `h.done()` works properly.  The current
  solution shortcuts all these concerns.

* I want to `Promise&`, rather than `std::coroutine_handle` to
  `await_suspend_destroy` -- this is safer, simpler, and more efficient.
  Short-circuiting corotuines should not touch the handle.  This decision
  forces the attribue to go on the class.  Resolving a method attribute
  would have required looking up overloads for both types, and choosing
  one, which is costly and a bad UX to boot.

* `AttrDocs.td` tells portable code to provide a stub `await_suspend()`.
  This portability / compatibility solution avoids dire issues that would
  arise if users relied on `__has_cpp_attribute` and the declaration and
  definition happened to use different toolchains.  In particular, it will
  even be safe for a future compiler release to killswitch this attribute
  by removing its implementation and setting its version to 0.

```
let Spellings = [Clang<"coro_destroy_after_suspend", /*allowInC*/ 0,
 /*Version*/ 0>];
```

  - In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as
a further optimization opportunity.  But, I'm sure there are
higher-leverage ways of making these non-suspending coros compile better, I
just don't know the coro optimization pipeline well enough to flag them.

  - IIUC the only interaction of this with `coro_only_destroy_when_complete`
would be that the compiler expends fewer cycles.

  - I ran some benchmarks on [folly::result](
https://github.com/facebook/folly/blob/main/folly/result/docs/result.md).
Heap allocs are definitely elided, the compiled code looks like a function,
not a coroutine, but there's still an optimization gap.  On the plus side,
this results in a 4x speedup (!) in optimized ASAN builds (numbers not
shown for brevity.

```
// Simple result coroutine that adds 1 to the input
result result_coro(result&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result catching_result_func(result&& r) {
  return result_catch_all([&]() -> result {
if (r.has_value()) {
  return r.value_or_throw() + 1;
}
return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result non_catching_result_func(result&& r) {
  if (r.has_value()) {
return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

result_coro_success13.61ns73.49M
non_catching_result_func_success3.39ns   295.00M
catching_result_func_success4.41ns   226.88M
result_coro_error  19.55ns51.16M
non_catching_result_func_error  9.15ns   109.26M
catching_result_func_error 10.19ns98.10M


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-09 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

snarkmaster wrote:

I improved the `libcxx`-hosted test and put it up as a separate PR: 
https://github.com/llvm/llvm-project/pull/152820 

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits


@@ -9270,6 +9270,93 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
+coroutine awaiter type.  When this attribute is present, the awaiter must
+implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
+returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
+called directly, bypassing the ``await_suspend(std::coroutine_handle<...>)``
+method.  The coroutine being suspended will then be immediately destroyed.
+
+Logically, the new behavior is equivalent to this standard code:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(YourPromise&) { ... }
+  void await_suspend(auto handle) {
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }
+
+This enables `await_suspend_destroy()` usage in portable awaiters — just add a
+stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
+support, the awaiter will behave nearly identically, with the only difference
+being heap allocation instead of stack allocation for the coroutine frame.
+
+This attribute exists to optimize short-circuiting coroutines—coroutines whose
+suspend points are either (i) trivial (like ``std::suspend_never``), or (ii)
+short-circuiting (like a ``co_await`` that can be expressed in regular control

snarkmaster wrote:

**Response requested**

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

snarkmaster wrote:

@ChuanqiXu9, I marked a few of the inline comments "Response requested", please 
take a look.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits


@@ -9270,6 +9270,93 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
+coroutine awaiter type.  When this attribute is present, the awaiter must
+implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
+returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
+called directly, bypassing the ``await_suspend(std::coroutine_handle<...>)``
+method.  The coroutine being suspended will then be immediately destroyed.
+
+Logically, the new behavior is equivalent to this standard code:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(YourPromise&) { ... }
+  void await_suspend(auto handle) {
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }
+
+This enables `await_suspend_destroy()` usage in portable awaiters — just add a
+stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
+support, the awaiter will behave nearly identically, with the only difference
+being heap allocation instead of stack allocation for the coroutine frame.
+
+This attribute exists to optimize short-circuiting coroutines—coroutines whose
+suspend points are either (i) trivial (like ``std::suspend_never``), or (ii)
+short-circuiting (like a ``co_await`` that can be expressed in regular control
+flow as):
+
+.. code-block:: c++
+
+  T val;
+  if (awaiter.await_ready()) {
+val = awaiter.await_resume();
+  } else {
+awaiter.await_suspend();
+return /* value representing the "execution short-circuited" outcome */;
+  }
+
+The benefits of this attribute are:
+  - **Avoid heap allocations for coro frames**: Allocating short-circuiting

snarkmaster wrote:

**Response requested**

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits


@@ -9270,6 +9270,93 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
+coroutine awaiter type.  When this attribute is present, the awaiter must
+implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
+returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
+called directly, bypassing the ``await_suspend(std::coroutine_handle<...>)``
+method.  The coroutine being suspended will then be immediately destroyed.
+
+Logically, the new behavior is equivalent to this standard code:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(YourPromise&) { ... }
+  void await_suspend(auto handle) {
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }
+
+This enables `await_suspend_destroy()` usage in portable awaiters — just add a
+stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
+support, the awaiter will behave nearly identically, with the only difference
+being heap allocation instead of stack allocation for the coroutine frame.
+
+This attribute exists to optimize short-circuiting coroutines—coroutines whose
+suspend points are either (i) trivial (like ``std::suspend_never``), or (ii)
+short-circuiting (like a ``co_await`` that can be expressed in regular control
+flow as):
+
+.. code-block:: c++
+
+  T val;
+  if (awaiter.await_ready()) {
+val = awaiter.await_resume();
+  } else {
+awaiter.await_suspend();
+return /* value representing the "execution short-circuited" outcome */;
+  }
+
+The benefits of this attribute are:
+  - **Avoid heap allocations for coro frames**: Allocating short-circuiting

snarkmaster wrote:

I understand the mechanism to be this:
 - If all `co_await` / `co_yield` awaiters use `await_suspend_destroy()`, that 
leaves the coro with just 2 suspend points: initial & final.
 - If those suspend points have `await_ready() { return true; }` like 
`suspend_never`, the remaining two suspends get cut out somewhere in the middle 
end.
 - At that point, there are no suspend intrinsics left for the escape analysis, 
and heap elision kicks in.

If we were able to elide `suspend_never` earlier, I suspect that 
short-circuiting coros could optimize even more like plain functions.

Can you please clarify how you'd like me to edit this portion of the **docs**? 

My reasoning seems a bit too low-level to be user-relevant. I feel like if a 
user cares about allocs, they would just look at the generated code for their 
application...

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits


@@ -174,6 +174,66 @@ static bool StmtCanThrow(const Stmt *S) {
   return false;
 }
 
+// Check if this suspend should be calling `await_suspend_destroy`
+static bool useCoroAwaitSuspendDestroy(const CoroutineSuspendExpr &S) {
+  // This can only be an `await_suspend_destroy` suspend expression if it
+  // returns void -- `buildCoawaitCalls` in `SemaCoroutine.cpp` asserts this.
+  // Moreover, when `await_suspend` returns a handle, the outermost method call
+  // is `.address()` -- making it harder to get the actual class or method.
+  if (S.getSuspendReturnType() !=
+  CoroutineSuspendExpr::SuspendReturnType::SuspendVoid) {
+return false;
+  }
+
+  // `CGCoroutine.cpp` & `SemaCoroutine.cpp` must agree on whether this suspend
+  // expression uses `[[clang::coro_await_suspend_destroy]]`.
+  //
+  // Any mismatch is a serious bug -- we would either double-free, or fail to
+  // destroy the promise type. For this reason, we make our decision based on
+  // the method name, and fatal outside of the happy path -- including on
+  // failure to find a method name.
+  //
+  // As a debug-only check we also try to detect the `AwaiterClass`. This is
+  // secondary, because  detection of the awaiter type can be silently broken 
by
+  // small `buildCoawaitCalls` AST changes.
+  StringRef SuspendMethodName;   // Primary
+  CXXRecordDecl *AwaiterClass = nullptr; // Debug-only, best-effort
+  if (auto *SuspendCall =
+  dyn_cast(S.getSuspendExpr()->IgnoreImplicit())) {
+if (auto *SuspendMember = dyn_cast(SuspendCall->getCallee())) {
+  if (auto *BaseExpr = SuspendMember->getBase()) {
+// `IgnoreImplicitAsWritten` is critical since `await_suspend...` can 
be
+// invoked on the base of the actual awaiter, and the base need not 
have
+// the attribute. In such cases, the AST will show the true awaiter
+// being upcast to the base.
+AwaiterClass = BaseExpr->IgnoreImplicitAsWritten()
+   ->getType()
+   ->getAsCXXRecordDecl();
+  }
+  if (auto *SuspendMethod =
+  dyn_cast(SuspendMember->getMemberDecl())) {
+SuspendMethodName = SuspendMethod->getName();
+  }
+}
+  }
+  if (SuspendMethodName == "await_suspend_destroy") {
+assert(!AwaiterClass ||
+   AwaiterClass->hasAttr());
+return true;
+  } else if (SuspendMethodName == "await_suspend") {
+assert(!AwaiterClass ||
+   !AwaiterClass->hasAttr());
+return false;
+  } else {
+llvm::report_fatal_error(
+"Wrong method in [[clang::coro_await_suspend_destroy]] check: "
+"expected 'await_suspend' or 'await_suspend_destroy', but got '" +
+SuspendMethodName + "'");
+  }

snarkmaster wrote:

**Response requested**

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits


@@ -174,6 +174,66 @@ static bool StmtCanThrow(const Stmt *S) {
   return false;
 }
 
+// Check if this suspend should be calling `await_suspend_destroy`
+static bool useCoroAwaitSuspendDestroy(const CoroutineSuspendExpr &S) {
+  // This can only be an `await_suspend_destroy` suspend expression if it
+  // returns void -- `buildCoawaitCalls` in `SemaCoroutine.cpp` asserts this.
+  // Moreover, when `await_suspend` returns a handle, the outermost method call
+  // is `.address()` -- making it harder to get the actual class or method.
+  if (S.getSuspendReturnType() !=
+  CoroutineSuspendExpr::SuspendReturnType::SuspendVoid) {
+return false;
+  }
+
+  // `CGCoroutine.cpp` & `SemaCoroutine.cpp` must agree on whether this suspend
+  // expression uses `[[clang::coro_await_suspend_destroy]]`.
+  //
+  // Any mismatch is a serious bug -- we would either double-free, or fail to
+  // destroy the promise type. For this reason, we make our decision based on
+  // the method name, and fatal outside of the happy path -- including on
+  // failure to find a method name.
+  //
+  // As a debug-only check we also try to detect the `AwaiterClass`. This is
+  // secondary, because  detection of the awaiter type can be silently broken 
by
+  // small `buildCoawaitCalls` AST changes.
+  StringRef SuspendMethodName;   // Primary
+  CXXRecordDecl *AwaiterClass = nullptr; // Debug-only, best-effort
+  if (auto *SuspendCall =
+  dyn_cast(S.getSuspendExpr()->IgnoreImplicit())) {
+if (auto *SuspendMember = dyn_cast(SuspendCall->getCallee())) {
+  if (auto *BaseExpr = SuspendMember->getBase()) {
+// `IgnoreImplicitAsWritten` is critical since `await_suspend...` can 
be
+// invoked on the base of the actual awaiter, and the base need not 
have
+// the attribute. In such cases, the AST will show the true awaiter
+// being upcast to the base.
+AwaiterClass = BaseExpr->IgnoreImplicitAsWritten()
+   ->getType()
+   ->getAsCXXRecordDecl();
+  }
+  if (auto *SuspendMethod =
+  dyn_cast(SuspendMember->getMemberDecl())) {
+SuspendMethodName = SuspendMethod->getName();
+  }
+}
+  }
+  if (SuspendMethodName == "await_suspend_destroy") {
+assert(!AwaiterClass ||
+   AwaiterClass->hasAttr());
+return true;
+  } else if (SuspendMethodName == "await_suspend") {
+assert(!AwaiterClass ||
+   !AwaiterClass->hasAttr());
+return false;
+  } else {
+llvm::report_fatal_error(
+"Wrong method in [[clang::coro_await_suspend_destroy]] check: "
+"expected 'await_suspend' or 'await_suspend_destroy', but got '" +
+SuspendMethodName + "'");
+  }

snarkmaster wrote:

Yes, the "awaiter" and "method name" checks are redundant, but I did it this 
way for a (good?) reason.

Initially, I just looked for the attribute on the awaiter checks, and I had a 
nasty bug described in a comment higher in that function. Worse yet, I only 
found it by accident because my test **happened** to have a base awaiter that 
didn't have the attribute, and a derived one that did.

Since any class can be an awaiter, it's hard to be sure if we're looking on the 
same CXXRecord that Sema examined. 

For that reason, I made the method name check primary. There are only two valid 
values here, and I can emit a fatal if I don't see either. This makes it clear 
that Sema alone is responsible for checking & interpreting the attribute.

The `AwaiterClass` attribute check is secondary, and debug-only. If you prefer, 
I'm happy to delete that one.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits


@@ -243,95 +321,105 @@ static LValueOrRValue 
emitSuspendExpression(CodeGenFunction &CGF, CGCoroData &Co
   CGF.EmitBlock(SuspendBlock);
 
   auto &Builder = CGF.Builder;
-  llvm::Function *CoroSave = CGF.CGM.getIntrinsic(llvm::Intrinsic::coro_save);
-  auto *NullPtr = llvm::ConstantPointerNull::get(CGF.CGM.Int8PtrTy);
-  auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});
 
   auto SuspendWrapper = CodeGenFunction(CGF.CGM).generateAwaitSuspendWrapper(
   CGF.CurFn->getName(), Prefix, S);
 
-  CGF.CurCoro.InSuspendBlock = true;
-
   assert(CGF.CurCoro.Data && CGF.CurCoro.Data->CoroBegin &&
  "expected to be called in coroutine context");
 
-  SmallVector SuspendIntrinsicCallArgs;
-  SuspendIntrinsicCallArgs.push_back(
-  CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF));
-
-  SuspendIntrinsicCallArgs.push_back(CGF.CurCoro.Data->CoroBegin);
-  SuspendIntrinsicCallArgs.push_back(SuspendWrapper);
-
-  const auto SuspendReturnType = S.getSuspendReturnType();
-  llvm::Intrinsic::ID AwaitSuspendIID;
-
-  switch (SuspendReturnType) {
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendVoid:
-AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_void;
-break;
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendBool:
-AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_bool;
-break;
-  case CoroutineSuspendExpr::SuspendReturnType::SuspendHandle:
-AwaitSuspendIID = llvm::Intrinsic::coro_await_suspend_handle;
-break;
-  }
-
-  llvm::Function *AwaitSuspendIntrinsic = 
CGF.CGM.getIntrinsic(AwaitSuspendIID);
-
   // SuspendHandle might throw since it also resumes the returned handle.
+  const auto SuspendReturnType = S.getSuspendReturnType();
   const bool AwaitSuspendCanThrow =
   SuspendReturnType ==
   CoroutineSuspendExpr::SuspendReturnType::SuspendHandle ||
   StmtCanThrow(S.getSuspendExpr());
 
-  llvm::CallBase *SuspendRet = nullptr;
-  // FIXME: add call attributes?
-  if (AwaitSuspendCanThrow)
-SuspendRet =
-CGF.EmitCallOrInvoke(AwaitSuspendIntrinsic, SuspendIntrinsicCallArgs);
-  else
-SuspendRet = CGF.EmitNounwindRuntimeCall(AwaitSuspendIntrinsic,
- SuspendIntrinsicCallArgs);
+  llvm::Value *Awaiter =
+  CGF.getOrCreateOpaqueLValueMapping(S.getOpaqueValue()).getPointer(CGF);
+  llvm::Value *Frame = CGF.CurCoro.Data->CoroBegin;
 
-  assert(SuspendRet);
-  CGF.CurCoro.InSuspendBlock = false;
+  if (useCoroAwaitSuspendDestroy(S)) { // Call `await_suspend_destroy` & 
cleanup
+emitAwaitSuspendDestroy(CGF, Coro, SuspendWrapper, Awaiter, Frame,
+AwaitSuspendCanThrow);
+  } else { // Normal suspend path -- can actually suspend, uses intrinsics

snarkmaster wrote:

Sure, I'll give it a try. If the argument list isn't too gross, I'd also prefer 
that.

Here, I was trying to change the existing code layout as little as necessary.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits


@@ -9270,6 +9270,93 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
+coroutine awaiter type.  When this attribute is present, the awaiter must
+implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
+returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
+called directly, bypassing the ``await_suspend(std::coroutine_handle<...>)``
+method.  The coroutine being suspended will then be immediately destroyed.
+
+Logically, the new behavior is equivalent to this standard code:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(YourPromise&) { ... }
+  void await_suspend(auto handle) {
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }
+
+This enables `await_suspend_destroy()` usage in portable awaiters — just add a
+stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
+support, the awaiter will behave nearly identically, with the only difference
+being heap allocation instead of stack allocation for the coroutine frame.
+
+This attribute exists to optimize short-circuiting coroutines—coroutines whose
+suspend points are either (i) trivial (like ``std::suspend_never``), or (ii)
+short-circuiting (like a ``co_await`` that can be expressed in regular control
+flow as):
+
+.. code-block:: c++
+
+  T val;
+  if (awaiter.await_ready()) {
+val = awaiter.await_resume();
+  } else {
+awaiter.await_suspend();
+return /* value representing the "execution short-circuited" outcome */;
+  }
+
+The benefits of this attribute are:
+  - **Avoid heap allocations for coro frames**: Allocating short-circuiting
+coros on the stack makes code more predictable under memory pressure.
+Without this attribute, LLVM cannot elide heap allocation even when all
+awaiters are short-circuiting.
+  - **Performance**: Significantly faster execution and smaller code size.
+  - **Build time**: Faster compilation due to less IR being generated.
+
+Marking your ``await_suspend_destroy`` method as ``noexcept`` can sometimes
+further improve optimization.
+
+Here is a toy example of a portable short-circuiting awaiter:
+
+.. code-block:: c++
+
+  template 
+  struct [[clang::coro_await_suspend_destroy]] optional_awaitable {
+std::optional opt_;
+bool await_ready() const noexcept { return opt_.has_value(); }
+T await_resume() { return std::move(opt_).value(); }
+void await_suspend_destroy(auto& promise) {
+  // Assume the return object of the outer coro defaults to "empty".
+}
+// Fallback for when `coro_await_suspend_destroy` is unavailable.
+void await_suspend(auto handle) {
+  await_suspend_destroy(handle.promise());
+  handle.destroy();
+}
+  };
+
+If all suspension points use (i) trivial or (ii) short-circuiting awaiters,
+then the coroutine optimizes more like a plain function, with 2 caveats:
+  - **Behavior:** The coroutine promise provides an implicit exception boundary
+(as if wrapping the function in ``try {} catch { unhandled_exception(); 
}``).
+This exception handling behavior is usually desirable in robust,
+return-value-oriented programs that need short-circuiting coroutines.
+Otherwise, the promise can always re-throw.
+  - **Speed:** As of 2025, there is still an optimization gap between a
+realistic short-circuiting coro, and the equivalent (but much more verbose)
+function.  For a guesstimate, expect 4-5ns per call on x86.  One idea for
+improvement is to also elide trivial suspends like `std::suspend_never`, in
+order to hit the `HasCoroSuspend` path in `CoroEarly.cpp`.

snarkmaster wrote:

Hmm, I'll look for an in-code place for this. I just didn't want to oversell 
the speedup people should expect from this.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits


@@ -9270,6 +9270,93 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
+coroutine awaiter type.  When this attribute is present, the awaiter must
+implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
+returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
+called directly, bypassing the ``await_suspend(std::coroutine_handle<...>)``
+method.  The coroutine being suspended will then be immediately destroyed.
+
+Logically, the new behavior is equivalent to this standard code:
+
+.. code-block:: c++
+
+  void await_suspend_destroy(YourPromise&) { ... }
+  void await_suspend(auto handle) {
+await_suspend_destroy(handle.promise());
+handle.destroy();
+  }
+
+This enables `await_suspend_destroy()` usage in portable awaiters — just add a
+stub ``await_suspend()`` as above.  Without ``coro_await_suspend_destroy``
+support, the awaiter will behave nearly identically, with the only difference
+being heap allocation instead of stack allocation for the coroutine frame.
+
+This attribute exists to optimize short-circuiting coroutines—coroutines whose
+suspend points are either (i) trivial (like ``std::suspend_never``), or (ii)
+short-circuiting (like a ``co_await`` that can be expressed in regular control

snarkmaster wrote:

Next update, I will start with something like the below. WDYT? I'm happy to 
linkify the various concepts if that's not frowned upon.

A short-circuiting coroutine is one where every `co_await` or `co_yield` either 
immediately produces a value, or exits the coroutine. In other words, they use 
coroutine syntax to concisely branch out of a synchronous function. Here are 
analogies to other languages:

- Rust has `Result` and a `?` operator to unpack it, while 
`folly::result` is a C++ short-circuiting coroutine, where `co_await` acts 
just like `?`.

- Haskell has `Maybe` & `Error` monads. A short-circuiting `co_await` loosely 
corresponds to the monadic `>>=`, whereas a short-circuiting `std::optional` 
would be an exact analog of `Maybe`.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits


@@ -9270,6 +9270,93 @@ Example:
 }];
 }
 
+def CoroAwaitSuspendDestroyDoc : Documentation {
+  let Category = DocCatDecl;
+  let Content = [{
+
+The ``[[clang::coro_await_suspend_destroy]]`` attribute may be applied to a C++
+coroutine awaiter type.  When this attribute is present, the awaiter must
+implement ``void await_suspend_destroy(Promise&)``.  If ``await_ready()``
+returns ``false`` at a suspension point, ``await_suspend_destroy`` will be
+called directly, bypassing the ``await_suspend(std::coroutine_handle<...>)``

snarkmaster wrote:

Will include in next update.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

snarkmaster wrote:

Thanks for the initial review! I'm glad you like the idea. I'll finish 
polishing the e2e test, and address your inline comments 1-by-1 above.

> split out libcxx change

Okay, I'll do that for the next update, no problem.

> ... can cover https://github.com/llvm/llvm-project/issues/148380, right? 

For the trivial example of `void await_suspend(auto handle) { handle.destroy(); 
}`, yes, the user could use the new attribute.

But, I think it's possible to have coros that **conditionally** destroy the 
passed-in handle. So, if the user had something more complicated (e.g. coro 
that returns a handle, that also destroys the current coro), then my thing 
wouldn't help.

> Could you have a test for this?

My current IR test verifies that there's no heap alloc across a few examples. 
Were you thinking of a different kind of test? Can you be very specific since 
I'm totally new to LLVM?

> if you want, may be you can implement similar attributes to not invoke 
> AwaitSuspend intrinsic in different conditions.

I don't know when I'll next be able to make time to work on this, but if I do, 
my next target would be `std::suspend_never`. I hope this one could be done 
without an attribute, though.  Do you have other use-cases in mind?


https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread Chuanqi Xu via cfe-commits

ChuanqiXu9 wrote:

> @ChuanqiXu9,
> 
> (1) I'm not actually changing `libc++`. I just added an end-to-end test for 
> this compiler feature. It's backwards- and forwards-compatible, so I don't 
> think there's any risk to bundling it with this PR.

This is not about risks. It is simply about developing processes. In another 
word, it will be faster to merge this if you split it.

> 
> (2) My patch isn't a fully general solution to #148380. Rather, it's a 
> (probably) better solution to the more narrow class of problems of 
> short-circuiting suspends.
> 
> To use my thing, two things have to be true:
> 
> * You actually have to opt into the new attribute and define 
> `await_suspend_destroy` to get the perf benefits.
> * You can only do so if your `await_suspend` would destroy the coro 
> **unconditionally**.
> 
> When both of those are true (opt-in & unconditional deestroy), it results in 
> a much simpler compilation setup -- HALO, less IR, less optimized code, 
> better perf.

In another word, if users want, this patch can cover 
https://github.com/llvm/llvm-project/issues/148380, right? Could you have a 
test for this?



https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

snarkmaster wrote:

@ChuanqiXu9,

(1) I'm not actually changing `libc++`. I just added an end-to-end test for 
this compiler feature. It's backwards- and forwards-compatible, so I don't 
think there's any risk to bundling it with this PR.

(2) My patch isn't a fully general solution to 
https://github.com/llvm/llvm-project/issues/148380. Rather, it's a (probably) 
better solution to the more narrow class of problems of short-circuiting 
suspends.

To use my thing, two things have to be true:
 - You actually have to opt into the new attribute and define 
`await_suspend_destroy` to get the perf benefits.
 -  You can only do so if your `await_suspend` would destroy the coro 
**unconditionally**.

When both of those are true (opt-in & unconditional deestroy), it results in a 
much simpler compilation setup -- HALO, less IR, less optimized code, better 
perf.


https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

https://github.com/snarkmaster updated 
https://github.com/llvm/llvm-project/pull/152623

>From 9fc3169ea5f1aea2a88b2616b7c9c4f2949139be Mon Sep 17 00:00:00 2001
From: Alexey 
Date: Thu, 7 Aug 2025 12:10:07 -0700
Subject: [PATCH 1/3] Elide suspension points via
 [[clang::coro_await_suspend_destroy]]

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was that I noticed that short-circuiting coroutines
failed to optimize well.  Interact with the demo program here:
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]],
the assembly for `simple_coro` would be drastically shorter, and would not
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

  - This has `lit` tests, but what gives me real confidence in its correctness
is the integration test in `coro_await_suspend_destroy_test.cpp`.  This
caught all the interesting bugs that I had in earlier revs, and covers
equivalence to the standard code path in far more scenarios.

  - I considered a variety of other designs. Here are some key design points:

* I considered optimizing unmodified `await_suspend()` methods, as long as
  they unconditionally end with an `h.destroy()` call on the current
  handle, or an exception.  However, this would (a) force dynamic dispatch
  for `destroy` -- bloating IR & reducing optimization opportunities, (b)
  require far more complex, delicate, and fragile analysis, (c) retain more
  of the frame setup, so that e.g.  `h.done()` works properly.  The current
  solution shortcuts all these concerns.

* I want to `Promise&`, rather than `std::coroutine_handle` to
  `await_suspend_destroy` -- this is safer, simpler, and more efficient.
  Short-circuiting corotuines should not touch the handle.  This decision
  forces the attribue to go on the class.  Resolving a method attribute
  would have required looking up overloads for both types, and choosing
  one, which is costly and a bad UX to boot.

* `AttrDocs.td` tells portable code to provide a stub `await_suspend()`.
  This portability / compatibility solution avoids dire issues that would
  arise if users relied on `__has_cpp_attribute` and the declaration and
  definition happened to use different toolchains.  In particular, it will
  even be safe for a future compiler release to killswitch this attribute
  by removing its implementation and setting its version to 0.

```
let Spellings = [Clang<"coro_destroy_after_suspend", /*allowInC*/ 0,
 /*Version*/ 0>];
```

  - In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as
a further optimization opportunity.  But, I'm sure there are
higher-leverage ways of making these non-suspending coros compile better, I
just don't know the coro optimization pipeline well enough to flag them.

  - IIUC the only interaction of this with `coro_only_destroy_when_complete`
would be that the compiler expends fewer cycles.

  - I ran some benchmarks on [folly::result](
https://github.com/facebook/folly/blob/main/folly/result/docs/result.md).
Heap allocs are definitely elided, the compiled code looks like a function,
not a coroutine, but there's still an optimization gap.  On the plus side,
this results in a 4x speedup (!) in optimized ASAN builds (numbers not
shown for brevity.

```
// Simple result coroutine that adds 1 to the input
result result_coro(result&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result catching_result_func(result&& r) {
  return result_catch_all([&]() -> result {
if (r.has_value()) {
  return r.value_or_throw() + 1;
}
return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result non_catching_result_func(result&& r) {
  if (r.has_value()) {
return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

result_coro_success13.61ns73.49M
non_catching_result_func_success3.39ns   295.00M
catching_result_func_success4.41ns   226.88M
result_coro_error  19.55ns51.16M
non_catching_result_func_error  9.15ns   109.26M
catching_result_func_error 10.19ns98.10M


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

https://github.com/snarkmaster updated 
https://github.com/llvm/llvm-project/pull/152623

>From 9fc3169ea5f1aea2a88b2616b7c9c4f2949139be Mon Sep 17 00:00:00 2001
From: Alexey 
Date: Thu, 7 Aug 2025 12:10:07 -0700
Subject: [PATCH 1/3] Elide suspension points via
 [[clang::coro_await_suspend_destroy]]

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was that I noticed that short-circuiting coroutines
failed to optimize well.  Interact with the demo program here:
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]],
the assembly for `simple_coro` would be drastically shorter, and would not
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

  - This has `lit` tests, but what gives me real confidence in its correctness
is the integration test in `coro_await_suspend_destroy_test.cpp`.  This
caught all the interesting bugs that I had in earlier revs, and covers
equivalence to the standard code path in far more scenarios.

  - I considered a variety of other designs. Here are some key design points:

* I considered optimizing unmodified `await_suspend()` methods, as long as
  they unconditionally end with an `h.destroy()` call on the current
  handle, or an exception.  However, this would (a) force dynamic dispatch
  for `destroy` -- bloating IR & reducing optimization opportunities, (b)
  require far more complex, delicate, and fragile analysis, (c) retain more
  of the frame setup, so that e.g.  `h.done()` works properly.  The current
  solution shortcuts all these concerns.

* I want to `Promise&`, rather than `std::coroutine_handle` to
  `await_suspend_destroy` -- this is safer, simpler, and more efficient.
  Short-circuiting corotuines should not touch the handle.  This decision
  forces the attribue to go on the class.  Resolving a method attribute
  would have required looking up overloads for both types, and choosing
  one, which is costly and a bad UX to boot.

* `AttrDocs.td` tells portable code to provide a stub `await_suspend()`.
  This portability / compatibility solution avoids dire issues that would
  arise if users relied on `__has_cpp_attribute` and the declaration and
  definition happened to use different toolchains.  In particular, it will
  even be safe for a future compiler release to killswitch this attribute
  by removing its implementation and setting its version to 0.

```
let Spellings = [Clang<"coro_destroy_after_suspend", /*allowInC*/ 0,
 /*Version*/ 0>];
```

  - In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as
a further optimization opportunity.  But, I'm sure there are
higher-leverage ways of making these non-suspending coros compile better, I
just don't know the coro optimization pipeline well enough to flag them.

  - IIUC the only interaction of this with `coro_only_destroy_when_complete`
would be that the compiler expends fewer cycles.

  - I ran some benchmarks on [folly::result](
https://github.com/facebook/folly/blob/main/folly/result/docs/result.md).
Heap allocs are definitely elided, the compiled code looks like a function,
not a coroutine, but there's still an optimization gap.  On the plus side,
this results in a 4x speedup (!) in optimized ASAN builds (numbers not
shown for brevity.

```
// Simple result coroutine that adds 1 to the input
result result_coro(result&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result catching_result_func(result&& r) {
  return result_catch_all([&]() -> result {
if (r.has_value()) {
  return r.value_or_throw() + 1;
}
return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result non_catching_result_func(result&& r) {
  if (r.has_value()) {
return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

result_coro_success13.61ns73.49M
non_catching_result_func_success3.39ns   295.00M
catching_result_func_success4.41ns   226.88M
result_coro_error  19.55ns51.16M
non_catching_result_func_error  9.15ns   109.26M
catching_result_func_error 10.19ns98.10M


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-08 Thread via cfe-commits

https://github.com/snarkmaster updated 
https://github.com/llvm/llvm-project/pull/152623

>From 9fc3169ea5f1aea2a88b2616b7c9c4f2949139be Mon Sep 17 00:00:00 2001
From: Alexey 
Date: Thu, 7 Aug 2025 12:10:07 -0700
Subject: [PATCH 1/2] Elide suspension points via
 [[clang::coro_await_suspend_destroy]]

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was that I noticed that short-circuiting coroutines
failed to optimize well.  Interact with the demo program here:
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]],
the assembly for `simple_coro` would be drastically shorter, and would not
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

  - This has `lit` tests, but what gives me real confidence in its correctness
is the integration test in `coro_await_suspend_destroy_test.cpp`.  This
caught all the interesting bugs that I had in earlier revs, and covers
equivalence to the standard code path in far more scenarios.

  - I considered a variety of other designs. Here are some key design points:

* I considered optimizing unmodified `await_suspend()` methods, as long as
  they unconditionally end with an `h.destroy()` call on the current
  handle, or an exception.  However, this would (a) force dynamic dispatch
  for `destroy` -- bloating IR & reducing optimization opportunities, (b)
  require far more complex, delicate, and fragile analysis, (c) retain more
  of the frame setup, so that e.g.  `h.done()` works properly.  The current
  solution shortcuts all these concerns.

* I want to `Promise&`, rather than `std::coroutine_handle` to
  `await_suspend_destroy` -- this is safer, simpler, and more efficient.
  Short-circuiting corotuines should not touch the handle.  This decision
  forces the attribue to go on the class.  Resolving a method attribute
  would have required looking up overloads for both types, and choosing
  one, which is costly and a bad UX to boot.

* `AttrDocs.td` tells portable code to provide a stub `await_suspend()`.
  This portability / compatibility solution avoids dire issues that would
  arise if users relied on `__has_cpp_attribute` and the declaration and
  definition happened to use different toolchains.  In particular, it will
  even be safe for a future compiler release to killswitch this attribute
  by removing its implementation and setting its version to 0.

```
let Spellings = [Clang<"coro_destroy_after_suspend", /*allowInC*/ 0,
 /*Version*/ 0>];
```

  - In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as
a further optimization opportunity.  But, I'm sure there are
higher-leverage ways of making these non-suspending coros compile better, I
just don't know the coro optimization pipeline well enough to flag them.

  - IIUC the only interaction of this with `coro_only_destroy_when_complete`
would be that the compiler expends fewer cycles.

  - I ran some benchmarks on [folly::result](
https://github.com/facebook/folly/blob/main/folly/result/docs/result.md).
Heap allocs are definitely elided, the compiled code looks like a function,
not a coroutine, but there's still an optimization gap.  On the plus side,
this results in a 4x speedup (!) in optimized ASAN builds (numbers not
shown for brevity.

```
// Simple result coroutine that adds 1 to the input
result result_coro(result&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result catching_result_func(result&& r) {
  return result_catch_all([&]() -> result {
if (r.has_value()) {
  return r.value_or_throw() + 1;
}
return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result non_catching_result_func(result&& r) {
  if (r.has_value()) {
return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

result_coro_success13.61ns73.49M
non_catching_result_func_success3.39ns   295.00M
catching_result_func_success4.41ns   226.88M
result_coro_error  19.55ns51.16M
non_catching_result_func_error  9.15ns   109.26M
catching_result_func_error 10.19ns98.10M


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-07 Thread via cfe-commits

snarkmaster wrote:

@NewSigma, thanks! Two things:
 - I'm moving some code around, but the `await_suspend` path is 100% unchanged. 
I effectively just added a branch where a bunch of that stuff isn't needed.
 - I quickly ran my `folly::result` benchmark from the description under 
clang-17 rather than clang-19, and ... its performance is even worse on 17 than 
19. So, I don't think that issue will help here.
 
 The fundamental problem I'm addressing isn't lack of devirtualization, but 
lack of heap alloc elision.

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-07 Thread via cfe-commits

https://github.com/snarkmaster edited 
https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-07 Thread via cfe-commits

llvmbot wrote:



@llvm/pr-subscribers-libcxx

@llvm/pr-subscribers-coroutines

Author: None (snarkmaster)


Changes

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was that I noticed that short-circuiting coroutines 
failed to optimize well.  Interact with the demo program here: 
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]], 
the assembly for `simple_coro` would be drastically shorter, and would not 
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

  - This has `lit` tests, but what gives me real confidence in its correctness 
is the integration test in `coro_await_suspend_destroy_test.cpp`.  This caught 
all the interesting bugs that I had in earlier revs, and covers equivalence to 
the standard code path in far more scenarios.

  - I considered a variety of other designs. Here are some key design points:

* I considered optimizing unmodified `await_suspend()` methods, as long as 
they unconditionally end with an `h.destroy()` call on the current handle, or 
an exception.  However, this would (a) force dynamic dispatch for `destroy` -- 
bloating IR & reducing optimization opportunities, (b) require far more 
complex, delicate, and fragile analysis, (c) retain more of the frame setup, so 
that e.g.  `h.done()` works properly.  The current solution shortcuts all these 
concerns.

* I want to `Promise&`, rather than `std::coroutine_handle` to 
`await_suspend_destroy` -- this is safer, simpler, and more efficient. 
Short-circuiting corotuines should not touch the handle.  This decision forces 
the attribue to go on the class.  Resolving a method attribute would have 
required looking up overloads for both types, and choosing one, which is costly 
and a bad UX to boot.

* `AttrDocs.td` tells portable code to provide a stub `await_suspend()`. 
This portability / compatibility solution avoids dire issues that would arise 
if users relied on `__has_cpp_attribute` and the declaration and definition 
happened to use different toolchains.  In particular, it will even be safe for 
a future compiler release to killswitch this attribute by removing its 
implementation and setting its version to 0.

```
let Spellings = [Clang<"coro_destroy_after_suspend", /*allowInC*/ 0,
 /*Version*/ 0>];
```

  - In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as a 
further optimization opportunity.  But, I'm sure there are higher-leverage ways 
of making these non-suspending coros compile better, I just don't know the coro 
optimization pipeline well enough to flag them.

  - IIUC the only interaction of this with `coro_only_destroy_when_complete` 
would be that the compiler expends fewer cycles.

  - I ran some benchmarks on [folly::result]( 
https://github.com/facebook/folly/blob/main/folly/result/docs/result.md). Heap 
allocs are definitely elided, the compiled code looks like a function, not a 
coroutine, but there's still an optimization gap.  On the plus side, this 
results in a 4x speedup (!) in optimized ASAN builds (numbers not shown for 
brevity.

```cpp
// Simple result coroutine that adds 1 to the input
result result_coro(result&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result catching_result_func(result&& r) {
  return result_catch_all([&]() -> result {
if (r.has_value()) {
  return r.value_or_throw() + 1;
}
return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result non_catching_result_func(result&& r) {
  if (r.has_value()) {
return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}
```

```

[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

result_coro_success13.61ns73.49M
non_catching_result_func_success3.39ns   295.00M
catching_result_func_success4.41ns   226.88M
result_coro_error  19.55ns51.16M
non_catching_result_func_error  9.15ns   109.26M
catching_result_func_error 10.19ns98.10M


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

result_coro_success10.59ns94.39M
non_catching_result_func_success3.39ns   295.00M
catching_result_func_success4.07ns   245.81M
result

[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-07 Thread via cfe-commits

github-actions[bot] wrote:



Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this 
page.

If this is not working for you, it is probably because you do not have write 
permissions for the repository. In which case you can instead tag reviewers by 
name in a comment by using `@` followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a 
review by "ping"ing the PR by adding a comment “Ping”. The common courtesy 
"ping" rate is once a week. Please remember that you are asking for valuable 
time from other developers.

If you have further questions, they may be answered by the [LLVM GitHub User 
Guide](https://llvm.org/docs/GitHub.html).

You can also ask questions in a comment on this PR, on the [LLVM 
Discord](https://discord.com/invite/xS7Z362) or on the 
[forums](https://discourse.llvm.org/).

https://github.com/llvm/llvm-project/pull/152623
___
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [libcxx] Elide suspension points via [[clang::coro_await_suspend_destroy]] (PR #152623)

2025-08-07 Thread via cfe-commits

https://github.com/snarkmaster created 
https://github.com/llvm/llvm-project/pull/152623

Start by reading the detailed user-facing docs in `AttrDocs.td`.

My immediate motivation was that I noticed that short-circuiting coroutines 
failed to optimize well.  Interact with the demo program here: 
https://godbolt.org/z/E3YK5c45a

If Clang on Compiler Explorer supported [[clang::coro_await_suspend_destroy]], 
the assembly for `simple_coro` would be drastically shorter, and would not 
contain a call to `operator new`.

Here are a few high-level thoughts that don't belong in the docs:

  - This has `lit` tests, but what gives me real confidence in its correctness 
is the integration test in `coro_await_suspend_destroy_test.cpp`.  This caught 
all the interesting bugs that I had in earlier revs, and covers equivalence to 
the standard code path in far more scenarios.

  - I considered a variety of other designs. Here are some key design points:

* I considered optimizing unmodified `await_suspend()` methods, as long as 
they unconditionally end with an `h.destroy()` call on the current handle, or 
an exception.  However, this would (a) force dynamic dispatch for `destroy` -- 
bloating IR & reducing optimization opportunities, (b) require far more 
complex, delicate, and fragile analysis, (c) retain more of the frame setup, so 
that e.g.  `h.done()` works properly.  The current solution shortcuts all these 
concerns.

* I want to `Promise&`, rather than `std::coroutine_handle` to 
`await_suspend_destroy` -- this is safer, simpler, and more efficient. 
Short-circuiting corotuines should not touch the handle.  This decision forces 
the attribue to go on the class.  Resolving a method attribute would have 
required looking up overloads for both types, and choosing one, which is costly 
and a bad UX to boot.

* `AttrDocs.td` tells portable code to provide a stub `await_suspend()`. 
This portability / compatibility solution avoids dire issues that would arise 
if users relied on `__has_cpp_attribute` and the declaration and definition 
happened to use different toolchains.  In particular, it will even be safe for 
a future compiler release to killswitch this attribute by removing its 
implementation and setting its version to 0.

```
let Spellings = [Clang<"coro_destroy_after_suspend", /*allowInC*/ 0,
 /*Version*/ 0>];
```

  - In the docs, I mention the `HasCoroSuspend` path in `CoroEarly.cpp` as a 
further optimization opportunity.  But, I'm sure there are higher-leverage ways 
of making these non-suspending coros compile better, I just don't know the coro 
optimization pipeline well enough to flag them.

  - IIUC the only interaction of this with `coro_only_destroy_when_complete` 
would be that the compiler expends fewer cycles.

  - I ran some benchmarks on [folly::result]( 
https://github.com/facebook/folly/blob/main/folly/result/docs/result.md). Heap 
allocs are definitely elided, the compiled code looks like a function, not a 
coroutine, but there's still an optimization gap.  On the plus side, this 
results in a 4x speedup (!) in optimized ASAN builds (numbers not shown for 
brevity.

```cpp
// Simple result coroutine that adds 1 to the input
result result_coro(result&& r) {
  co_return co_await std::move(r) + 1;
}

// Non-coroutine equivalent using value_or_throw()
result catching_result_func(result&& r) {
  return result_catch_all([&]() -> result {
if (r.has_value()) {
  return r.value_or_throw() + 1;
}
return std::move(r).non_value();
  });
}

// Not QUITE equivalent to the coro -- lacks the exception boundary
result non_catching_result_func(result&& r) {
  if (r.has_value()) {
return r.value_or_throw() + 1;
  }
  return std::move(r).non_value();
}
```

```

[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

result_coro_success13.61ns73.49M
non_catching_result_func_success3.39ns   295.00M
catching_result_func_success4.41ns   226.88M
result_coro_error  19.55ns51.16M
non_catching_result_func_error  9.15ns   109.26M
catching_result_func_error 10.19ns98.10M


[...]lly/result/test/result_coro_bench.cpp relative  time/iter   iters/s

result_coro_success10.59ns94.39M
non_catching_result_func_success3.39ns   295.00M
catching_result_func_success4.07ns   245.81M
result_coro_error  13.66ns