Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
Hello, Joel!

> Hi Uladzislau,
>
> On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony) wrote:
> > Switch to using the get_state_synchronize_rcu_full() and
> > poll_state_synchronize_rcu_full() pair to debug a normal
> > synchronize_rcu() call.
> >
> > Just using the non-full APIs to identify whether a grace period has
> > passed or not might lead to a false-positive kernel splat.
> >
> > It can happen because get_state_synchronize_rcu() compresses
> > both normal and expedited states into one single unsigned long
> > value, so a poll_state_synchronize_rcu() can miss GP completion
> > when synchronize_rcu() and synchronize_rcu_expedited() run
> > concurrently.
>
> Agreed, I provided a scenario below but let me know if I missed anything.
>
> > To address this, switch to the poll_state_synchronize_rcu_full() and
> > get_state_synchronize_rcu_full() APIs, which use separate variables
> > for expedited and normal states.
>
> Reviewed-by: Joel Fernandes
>
> For completeness, and just to clarify how this may happen, firstly as noted:
> rcu_poll_gp_seq_start/end() is called for both the begin/end of normal and
> expedited GPs, thus compressing the use of the rcu_state.gp_seq_polled
> counter for both normal and expedited GPs.
>
> Then if we intersperse synchronize_rcu() with synchronize_rcu_expedited(),
> something like the following may happen.
>
> CPU 0                                    CPU 1
>
> synchronize_rcu_expedited()
>  // -> rcu_poll_gp_seq_start()
>  //    This does rcu_seq_start on
>  //    gp_seq_polled and notes the
>  //    started gp_seq_polled (say it's 5)
>                                          synchronize_rcu()
>                                           -> synchronize_rcu_normal()
>                                            -> rs.head.func =
>                                                 get_state_synchronize_rcu();
>                                               // saves the value 12
>  -> rcu_gp_init()
>   -> rcu_poll_gp_seq_start()
>      // rcu_seq_start does nothing
>      // but notes the pre-started
>      // gp_seq_polled (value 5)
>
>  -> rcu_gp_cleanup()
>   // -> rcu_poll_gp_seq_end()
>   //    ends the gp_seq_polled since it
>   //    matches the prior saved gp_seq_polled (5)
>   //    new gp_seq_polled is 8
>
> /* NORMAL GP COMPLETES */
>
>  rcu_gp_cleanup()
>   -> rcu_sr_normal_gp_cleanup()
>    -> rcu_sr_normal_complete()
>     -> poll_state_synchronize_rcu()
>      -> returns FALSE because gp_seq_polled is still 8
>      -> Warning (false positive)

Thank you for clarification, this is good for archive :)

--
Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
Hi Uladzislau,

On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony) wrote:
> Switch to using the get_state_synchronize_rcu_full() and
> poll_state_synchronize_rcu_full() pair to debug a normal
> synchronize_rcu() call.
>
> Just using the non-full APIs to identify whether a grace period has
> passed or not might lead to a false-positive kernel splat.
>
> It can happen because get_state_synchronize_rcu() compresses
> both normal and expedited states into one single unsigned long
> value, so a poll_state_synchronize_rcu() can miss GP completion
> when synchronize_rcu() and synchronize_rcu_expedited() run
> concurrently.

Agreed, I provided a scenario below but let me know if I missed anything.

> To address this, switch to the poll_state_synchronize_rcu_full() and
> get_state_synchronize_rcu_full() APIs, which use separate variables
> for expedited and normal states.

Reviewed-by: Joel Fernandes

For completeness, and just to clarify how this may happen, firstly as noted:
rcu_poll_gp_seq_start/end() is called for both the begin/end of normal and
expedited GPs, thus compressing the use of the rcu_state.gp_seq_polled
counter for both normal and expedited GPs.

Then if we intersperse synchronize_rcu() with synchronize_rcu_expedited(),
something like the following may happen.

CPU 0                                    CPU 1

synchronize_rcu_expedited()
 // -> rcu_poll_gp_seq_start()
 //    This does rcu_seq_start on
 //    gp_seq_polled and notes the
 //    started gp_seq_polled (say it's 5)
                                         synchronize_rcu()
                                          -> synchronize_rcu_normal()
                                           -> rs.head.func =
                                                get_state_synchronize_rcu();
                                              // saves the value 12
 -> rcu_gp_init()
  -> rcu_poll_gp_seq_start()
     // rcu_seq_start does nothing
     // but notes the pre-started
     // gp_seq_polled (value 5)

 -> rcu_gp_cleanup()
  // -> rcu_poll_gp_seq_end()
  //    ends the gp_seq_polled since it
  //    matches the prior saved gp_seq_polled (5)
  //    new gp_seq_polled is 8

/* NORMAL GP COMPLETES */

 rcu_gp_cleanup()
  -> rcu_sr_normal_gp_cleanup()
   -> rcu_sr_normal_complete()
    -> poll_state_synchronize_rcu()
     -> returns FALSE because gp_seq_polled is still 8
     -> Warning (false positive)

thanks,

- Joel
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On 3/4/2025 9:54 PM, Boqun Feng wrote:
> On Tue, Mar 04, 2025 at 11:56:18AM +0100, Uladzislau Rezki wrote:
>> On Tue, Mar 04, 2025 at 11:52:26AM +0100, Uladzislau Rezki wrote:
> Did I get that right?
>
Other than that I'm unable to follow what you mean by "WH has not been
injected, so nothing to wait on", maybe because I am missing some
terminology from you ;-) I think it's a good analysis, thank you!
> I think this is a real bug AFAICS, hoping all the memory barriers are in
> place to make sure the code reordering also correctly orders the accesses.
> I'll double check that.
>
> I also feel its 'theoretical', because as long as rcu_gp_init() and
> rcu_gp_cleanup() are properly ordered WRT pre-existing readers, then
> synchronize_rcu_normal() still waits for pre-existing readers even though
> its
> a bit confused about the value of the cookies.
>
> For the fix,
> Reviewed-by: Joel Fernandes (Google)
>
> (If possible, include a Link: to my (this) post so that the sequence of
> events is further clarified.)
>
Will add the tag (with the email you really want ;-)) and a link to this
email to the patch. Thanks!
>>>
>>> CPU_1:                                     | CPU_2:
>>> # Increase a seq-number                    |
>>> rcu_seq_start(&rcu_state.gp_seq);          |
>>>                                            | add_client() {
>>>                                            |     # Record a gp-seq state
>>>                                            |     get_state_synchronize_rcu_full(&rs.oldstate);
>>>                                            | }
>>>                                            |
>>>                                            | rcu_sr_normal_gp_init() {
>>>                                            |     add a dummy-wait-head;
>>>                                            | }
>>>
>>>
>>> A client has been added with an already updated gp-seq number, i.e.
>>> "oldstate" refers to this GP, not to the previous one. A
>>> poll_state_synchronize_rcu_full() will complain because this GP has
>>> not passed yet; it will pass on the next iteration.
>>>
>>> This is how i see this.
>>>
>> Updated the plain-text, removed tabs:
>>
>> CPU_1:                                     | CPU_2:
>> # Increase a seq-number                    |
>> rcu_seq_start(&rcu_state.gp_seq);          |
>>                                            | add_client() {
>>                                            |     # Record a gp-seq state
>>                                            |     get_state_synchronize_rcu_full(&rs.oldstate);
>>                                            | }
>>                                            |
>>                                            | rcu_sr_normal_gp_init() {
>>                                            |     add a dummy-wait-head;
>>                                            | }
>>
>
> Thank you. I added links from you and Joel as the detailed explanation
> to the commit log, and the comment I proposed[1].
>
> [1]: https://lore.kernel.org/rcu/Z8SnhS_LnzN_wvxr@tardis/
>
Yep, I am in line with Vlad's explanation as well, and adding links to both
explanations sounds perfect, thanks!
- Joel
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On 3/3/2025 10:23 PM, Boqun Feng wrote:
> On Sun, Mar 02, 2025 at 07:15:07PM -0500, Joel Fernandes wrote:
>> On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
>>> On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> Hello, Paul!
>
>>
>> Except that I got this from overnight testing of rcu/dev on the shared
>> RCU tree:
>>
>> WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
>> rcu_sr_normal_complete+0x5c/0x80
>>
>> I see this only on TREE05. Which should not be too surprising, given
>> that this is the scenario that tests it. It happened within five minutes
>> on all 14 of the TREE05 runs.
>>
> Hm.. This is not fun. I tested this on my system and i did not manage to
> trigger this whereas you do. Something is wrong.
If you have a debug patch, I would be happy to give it a go.
>>> I can trigger it. But.
>>>
>>> Some background. I tested those patches during many hours on the stable
>>> kernel which is 6.13. On that kernel i was not able to trigger it. Running
>>> the rcutorture on our shared "dev" tree, which i did now, triggers this
>>> right away.
>>
>> Bisection? (Hey, you knew that was coming!)
>>
> Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start
> detection
>
> After revert in the dev, rcutorture passes TREE05, 16 instances.
Huh. We sure don't get to revert that one...
Do we have a problem with the ordering in rcu_gp_init() between the calls
to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example,
do we need to capture the relevant portion of the list before the call
to rcu_seq_start(), and do the grace-period-start work afterwards?
>>>
>>> I tried moving the call to rcu_sr_normal_gp_init() before the call to
>>> rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
>>> Which does not necessarily mean that this is the correct fix, but I
>>> figured that it might at least provide food for thought.
>>>
>>> Thanx, Paul
>>>
>>>
>>>
>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>>> index 48384fa2eaeb8..d3efeff7740e7 100644
>>> --- a/kernel/rcu/tree.c
>>> +++ b/kernel/rcu/tree.c
>>> @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
>>>
>>>/* Advance to a new grace period and initialize state. */
>>>record_gp_stall_check_time();
>>> + start_new_poll = rcu_sr_normal_gp_init();
>>>/* Record GP times before starting GP, hence rcu_seq_start(). */
>>>rcu_seq_start(&rcu_state.gp_seq);
>>>ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
>>> - start_new_poll = rcu_sr_normal_gp_init();
>>>trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
>>> TPS("start"));
>>
>> Oh... so the bug is this? Good catch...
>>
>>
>> CPU 0                                  CPU 1
>>
>> rcu_gp_init()
>>
>>  rcu_seq_start(rcu_state.gp_seq)
>>                                        synchronize_rcu_normal()
>>                                         -> rs.head.func =
>>                                              (void *) get_state_synchronize_rcu();
>>                                            // save rcu_state.gp_seq
>>                                         -> rcu_sr_normal_add_req() ->
>>                                              llist_add(rcu_state.srs_next)
>>                                         -> (void) start_poll_synchronize_rcu();
>>
>>
>
> This means synchronize_rcu_normal() got the new state of gp_seq, but
> its wait request was inserted before the new wait_head, therefore..
>
>> sr_normal_gp_init()
>>
>>  llist_add(wait_head, &rcu_state.srs_next);
>>  // pick up the injected WH
>>
>>  rcu_state.srs_wait_tail = wait_head;
>>
>> rcu_gp_cleanup()
>>
>>  rcu_seq_end(&rcu_state.gp_seq);
>>  sr_normal_complete()
>
> At rcu_gp_cleanup() time, rcu_sr_normal_complete() completes the
> corresponding wait request; however, synchronize_rcu_normal()
> observed the new gp_seq, so its poll state will expect the next GP, hence
> the WARN_ONCE().
>
> Yes, I believe this is the scenario for the bug.
>
>>
>> WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
>>
>> !poll_sta
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Tue, Mar 04, 2025 at 11:56:18AM +0100, Uladzislau Rezki wrote:
> On Tue, Mar 04, 2025 at 11:52:26AM +0100, Uladzislau Rezki wrote:
> > > > Did I get that right?
> > > >
> > >
> > > Other than I'm unable to follow what do you mean "WH has not been
> > > injected, so nothing to wait on", maybe because I am missing some
> > > terminology from you ;-) I think it's a good analysis, thank you!
> > >
> > > > I think this is a real bug AFAICS, hoping all the memory barriers are in
> > > > place to make sure the code reordering also correctly orders the
> > > > accesses.
> > > > I'll double check that.
> > > >
> > > > I also feel its 'theoretical', because as long as rcu_gp_init() and
> > > > rcu_gp_cleanup() are properly ordered WRT pre-existing readers, then
> > > > synchronize_rcu_normal() still waits for pre-existing readers even
> > > > though its
> > > > a bit confused about the value of the cookies.
> > > >
> > > > For the fix,
> > > > Reviewed-by: Joel Fernandes (Google)
> > > >
> > > > (If possible, include a Link: to my (this) post so that the sequence of
> > > > events is further clarified.)
> > > >
> > >
> > > Will add the tag (with the email you really want ;-)) and a link to this
> > > email to the patch. Thanks!
> > >
> >
> > CPU_1:                                     | CPU_2:
> > # Increase a seq-number                    |
> > rcu_seq_start(&rcu_state.gp_seq);          |
> >                                            | add_client() {
> >                                            |     # Record a gp-seq state
> >                                            |     get_state_synchronize_rcu_full(&rs.oldstate);
> >                                            | }
> >                                            |
> >                                            | rcu_sr_normal_gp_init() {
> >                                            |     add a dummy-wait-head;
> >                                            | }
> >
> >
> > A client has been added with an already updated gp-seq number, i.e.
> > "oldstate" refers to this GP, not to the previous one. A
> > poll_state_synchronize_rcu_full() will complain because this GP has
> > not passed yet; it will pass on the next iteration.
> >
> > This is how i see this.
> >
> Updated the plain-text, removed tabs:
>
> CPU_1:                                     | CPU_2:
> # Increase a seq-number                    |
> rcu_seq_start(&rcu_state.gp_seq);          |
>                                            | add_client() {
>                                            |     # Record a gp-seq state
>                                            |     get_state_synchronize_rcu_full(&rs.oldstate);
>                                            | }
>                                            |
>                                            | rcu_sr_normal_gp_init() {
>                                            |     add a dummy-wait-head;
>                                            | }
>
Thank you. I added links from you and Joel as the detailed explanation
to the commit log, and the comment I proposed[1].
[1]: https://lore.kernel.org/rcu/Z8SnhS_LnzN_wvxr@tardis/
Regards,
Boqun
> --
> Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Tue, Mar 04, 2025 at 11:52:26AM +0100, Uladzislau Rezki wrote:
> > > Did I get that right?
> > >
> >
> > Other than I'm unable to follow what do you mean "WH has not been
> > injected, so nothing to wait on", maybe because I am missing some
> > terminology from you ;-) I think it's a good analysis, thank you!
> >
> > > I think this is a real bug AFAICS, hoping all the memory barriers are in
> > > place to make sure the code reordering also correctly orders the accesses.
> > > I'll double check that.
> > >
> > > I also feel its 'theoretical', because as long as rcu_gp_init() and
> > > rcu_gp_cleanup() are properly ordered WRT pre-existing readers, then
> > > synchronize_rcu_normal() still waits for pre-existing readers even though
> > > its
> > > a bit confused about the value of the cookies.
> > >
> > > For the fix,
> > > Reviewed-by: Joel Fernandes (Google)
> > >
> > > (If possible, include a Link: to my (this) post so that the sequence of
> > > events is further clarified.)
> > >
> >
> > Will add the tag (with the email you really want ;-)) and a link to this
> > email to the patch. Thanks!
> >
>
> CPU_1:                                     | CPU_2:
> # Increase a seq-number                    |
> rcu_seq_start(&rcu_state.gp_seq);          |
>                                            | add_client() {
>                                            |     # Record a gp-seq state
>                                            |     get_state_synchronize_rcu_full(&rs.oldstate);
>                                            | }
>                                            |
>                                            | rcu_sr_normal_gp_init() {
>                                            |     add a dummy-wait-head;
>                                            | }
>
>
> A client has been added with an already updated gp-seq number, i.e.
> "oldstate" refers to this GP, not to the previous one. A
> poll_state_synchronize_rcu_full() will complain because this GP has
> not passed yet; it will pass on the next iteration.
>
> This is how i see this.
>
Updated the plain-text, removed tabs:
CPU_1:                                     | CPU_2:
# Increase a seq-number                    |
rcu_seq_start(&rcu_state.gp_seq);          |
                                           | add_client() {
                                           |     # Record a gp-seq state
                                           |     get_state_synchronize_rcu_full(&rs.oldstate);
                                           | }
                                           |
                                           | rcu_sr_normal_gp_init() {
                                           |     add a dummy-wait-head;
                                           | }
--
Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
> > Did I get that right?
> >
>
> Other than I'm unable to follow what do you mean "WH has not been
> injected, so nothing to wait on", maybe because I am missing some
> terminology from you ;-) I think it's a good analysis, thank you!
>
> > I think this is a real bug AFAICS, hoping all the memory barriers are in
> > place to make sure the code reordering also correctly orders the accesses.
> > I'll double check that.
> >
> > I also feel its 'theoretical', because as long as rcu_gp_init() and
> > rcu_gp_cleanup() are properly ordered WRT pre-existing readers, then
> > synchronize_rcu_normal() still waits for pre-existing readers even though
> > its
> > a bit confused about the value of the cookies.
> >
> > For the fix,
> > Reviewed-by: Joel Fernandes (Google)
> >
> > (If possible, include a Link: to my (this) post so that the sequence of
> > events is further clarified.)
> >
>
> Will add the tag (with the email you really want ;-)) and a link to this
> email to the patch. Thanks!
>
CPU_1:                                     | CPU_2:
# Increase a seq-number                    |
rcu_seq_start(&rcu_state.gp_seq);          |
                                           | add_client() {
                                           |     # Record a gp-seq state
                                           |     get_state_synchronize_rcu_full(&rs.oldstate);
                                           | }
                                           |
                                           | rcu_sr_normal_gp_init() {
                                           |     add a dummy-wait-head;
                                           | }
A client has been added with an already updated gp-seq number, i.e.
"oldstate" refers to this GP, not to the previous one. A
poll_state_synchronize_rcu_full() will complain because this GP has
not passed yet; it will pass on the next iteration.
This is how i see this.
--
Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Sun, Mar 02, 2025 at 07:15:07PM -0500, Joel Fernandes wrote:
> On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > > > Hello, Paul!
> > > >
> > > > > > > > >
> > > > > > > > > Except that I got this from overnight testing of rcu/dev on
> > > > > > > > > the shared
> > > > > > > > > RCU tree:
> > > > > > > > >
> > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > > > > >
> > > > > > > > > I see this only on TREE05. Which should not be too
> > > > > > > > > surprising, given
> > > > > > > > > that this is the scenario that tests it. It happened within
> > > > > > > > > five minutes
> > > > > > > > > on all 14 of the TREE05 runs.
> > > > > > > > >
> > > > > > > > Hm.. This is not fun. I tested this on my system and i did not
> > > > > > > > manage to
> > > > > > > > trigger this whereas you do. Something is wrong.
> > > > > > >
> > > > > > > If you have a debug patch, I would be happy to give it a go.
> > > > > > >
> > > > > > I can trigger it. But.
> > > > > >
> > > > > > Some background. I tested those patches during many hours on the
> > > > > > stable
> > > > > > kernel which is 6.13. On that kernel i was not able to trigger it.
> > > > > > Running
> > > > > > the rcutorture on the our shared "dev" tree, which i did now,
> > > > > > triggers this
> > > > > > right away.
> > > > >
> > > > > Bisection? (Hey, you knew that was coming!)
> > > > >
> > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start
> > > > detection
> > > >
> > > > After revert in the dev, rcutorture passes TREE05, 16 instances.
> > >
> > > Huh. We sure don't get to revert that one...
> > >
> > > Do we have a problem with the ordering in rcu_gp_init() between the calls
> > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example,
> > > do we need to capture the relevant portion of the list before the call
> > > to rcu_seq_start(), and do the grace-period-start work afterwards?
> >
> > I tried moving the call to rcu_sr_normal_gp_init() before the call to
> > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> > Which does not necessarily mean that this is the correct fix, but I
> > figured that it might at least provide food for thought.
> >
> > Thanx, Paul
> >
> >
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 48384fa2eaeb8..d3efeff7740e7 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
> >
> >/* Advance to a new grace period and initialize state. */
> >record_gp_stall_check_time();
> > + start_new_poll = rcu_sr_normal_gp_init();
> >/* Record GP times before starting GP, hence rcu_seq_start(). */
> >rcu_seq_start(&rcu_state.gp_seq);
> >ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > - start_new_poll = rcu_sr_normal_gp_init();
> >trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > TPS("start"));
>
> Oh... so the bug is this? Good catch...
>
>
> CPU 0                                  CPU 1
>
> rcu_gp_init()
>
>  rcu_seq_start(rcu_state.gp_seq)
>                                        synchronize_rcu_normal()
>                                         -> rs.head.func =
>                                              (void *) get_state_synchronize_rcu();
>                                            // save rcu_state.gp_seq
>                                         -> rcu_sr_normal_add_req() ->
>                                              llist_add(rcu_state.srs_next)
>                                         -> (void) start_poll_synchronize_rcu();
>
>
This means synchronize_rcu_normal() got the new state of gp_seq, but
its wait request was inserted before the new wait_head, therefore..
> sr_normal_gp_init()
>
>  llist_add(wait_head, &rcu_state.srs_next);
>  // pick up the injected WH
>
>  rcu_state.srs_wait_tail = wait_head;
>
> rcu_gp_cleanup()
>
>  rcu_seq_end(&rcu_state.gp_seq);
>  sr_normal_complete()
At rcu_gp_cleanup() time, rcu_sr_normal_complete() completes the
corresponding wait request; however, synchronize_rcu_normal()
observed the new gp_seq, so its poll state will expect the next GP, hence
the WARN_ONCE().
Yes, I believe this is the scenario for the bug.
>
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On 3/3/2025 12:07 PM, Boqun Feng wrote:
> On Mon, Mar 03, 2025 at 12:00:40PM -0500, Joel Fernandes wrote:
> [...]
>>
>> I see the original patch "rcu: Fix get_state_synchronize_rcu_full() GP-start
>> detection" is not yet on -next. Once we are convinced about the fix, do we
>> want to squash the fix into this patch and have Boqun take it?
>>
>
> Which "-next" are you talking about? The original patch and the fix are
> already in next-20250303 of linux-next:
>
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?h=next-20250303&qt=range&q=153fc45000e0058435ec0609258fb16e7ea257d2

I see it now during manual inspection, but I'm confused why git cherry tells me
otherwise.

I tried the following command and it shows the patch in question in the first
line of output. Basically the question that the command asks is "What is in
Paul's dev branch that is not in the RCU tree's -next branch?" This question is
asked for the obvious raisins.

So I am obviously missing something in the command. Thoughts?

(rcugit is the RCU tree, and paul/dev is Paul's dev branch)

git cherry --abbrev -v rcugit/next paul/dev | grep "^+" | cut -d' ' -f2,3-

012f47f0f806 rcu: Fix get_state_synchronize_rcu_full() GP-start detection
2ada0addbdb6 tools/memory-model: Add atomic_and()/or()/xor() and add_negative
e176ebffc3f4 tools/memory-model: Add atomic_andnot() with its variants
de6f99723392 tools/memory-model: Legitimize current use of tags in LKMM macros
723177d71224 tools/memory-model: Define applicable tags on operation in tools/...
29279349a566 tools/memory-model: Define effect of Mb tags on RMWs in tools/...
d80a8e016433 MAINTAINERS: Update Joel's email address
fafa18068359 tools/memory-model: Switch to softcoded herd7 tags
dcc5197839f2 tools/memory-model: Distinguish between syntactic and semantic tags
fa9e35a0772a tools/memory-model/README: Fix typo
a2bfbf847c96 tools/memory-model: glossary.txt: Fix indents
3839dbb05869 rcutorture: Make srcu_lockdep.sh check kernel Kconfig
b5aa1c489085 rcutorture: Make srcu_lockdep.sh check reader-conflict handling
9a5720bad9ed rcu: Remove swake_up_one_online() bandaid
04159042a62b Revert "rcu/nocb: Fix rcuog wake-up from offline softirq"
fdc37fed1c81 rcutorture: Split out beginning and end from rcu_torture_one_read()
3c6b1925361e rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT
fadc715785cc rcutorture: Add tests for SRCU up/down reader primitives
90a8f490324c rcutorture: Pull rcu_torture_updown() loop body into new function
5fbaa5179f6a rcutorture: Comment invocations of tick_dep_set_task()
461810471faa rcutorture: Complain if an ->up_read() is delayed more than 10 seconds
35e1a180319d rcutorture: Check for ->up_read() without matching ->down_read()
0676ba797dfa EXP srcu: Enable Tiny SRCU On all CONFIG_SMP=n kernels
c8fff13fd2fd EXP rcutorture: Add SRCU-V scenario for preemptible Tiny SRCU
910a5f9ebf5f EXP rcutorture: Limit callback flooding for Tiny SRCU in preemptible kernels
8979a891a365 EXP hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On 3/3/2025 1:55 PM, Paul E. McKenney wrote:
>> I tried the following command and it shows the patch in question in the first
>> line of output. Basically the question that the command asks is "What is in
>> Paul's dev branch that is not in the RCU tree's -next branch?" This question is
>> asked for the obvious raisins.
>>
>> So I am obviously missing something in the command. Thoughts?
>>
>> (rcugit is the RCU tree, and paul/dev is Paul's dev branch)
>>
>> git cherry --abbrev -v rcugit/next paul/dev | grep "^+" | cut -d' ' -f2,3-
>
> I must defer to others on this one. I must confess that I have not yet
> found a good use case for "git cherry".

In this case git cherry was seemingly correct, because it detected a difference
between the same patch in your dev branch versus the next branch [1]. It seems
the next branch has the fix with the reordering of the statements. So it
thought that your branch had a patch that wasn't there in next, when in reality
the one in next was "newer".

[1] https://web.git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git/commit/?h=misc.2025.03.02a&id=153fc45000e0058435ec0609258fb16e7ea257d2

thanks,

- Joel
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Mon, Mar 03, 2025 at 12:30:51PM -0500, Joel Fernandes wrote:
> On 3/3/2025 12:07 PM, Boqun Feng wrote:
> > On Mon, Mar 03, 2025 at 12:00:40PM -0500, Joel Fernandes wrote:
> > [...]
> >>
> >> I see the original patch "rcu: Fix get_state_synchronize_rcu_full() GP-start
> >> detection" is not yet on -next. Once we are convinced about the fix, do we
> >> want to squash the fix into this patch and have Boqun take it?
> >>
> >
> > Which "-next" are you talking about? The original patch and the fix are
> > already in next-20250303 of linux-next:
> >
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?h=next-20250303&qt=range&q=153fc45000e0058435ec0609258fb16e7ea257d2
>
> I see it now during manual inspection, but I'm confused why git cherry tells me
> otherwise.

The gitk command for the win! ;-) Or, in non-GUI environments, tig.

> I tried the following command and it shows the patch in question in the first
> line of output. Basically the question that the command asks is "What is in
> Paul's dev branch that is not in the RCU tree's -next branch?" This question is
> asked for the obvious raisins.
>
> So I am obviously missing something in the command. Thoughts?
>
> (rcugit is the RCU tree, and paul/dev is Paul's dev branch)
>
> git cherry --abbrev -v rcugit/next paul/dev | grep "^+" | cut -d' ' -f2,3-

I must defer to others on this one. I must confess that I have not yet
found a good use case for "git cherry".

                                                        Thanx, Paul

> 012f47f0f806 rcu: Fix get_state_synchronize_rcu_full() GP-start detection
> 2ada0addbdb6 tools/memory-model: Add atomic_and()/or()/xor() and add_negative
> e176ebffc3f4 tools/memory-model: Add atomic_andnot() with its variants
> de6f99723392 tools/memory-model: Legitimize current use of tags in LKMM macros
> 723177d71224 tools/memory-model: Define applicable tags on operation in tools/...
> 29279349a566 tools/memory-model: Define effect of Mb tags on RMWs in tools/...
> d80a8e016433 MAINTAINERS: Update Joel's email address
> fafa18068359 tools/memory-model: Switch to softcoded herd7 tags
> dcc5197839f2 tools/memory-model: Distinguish between syntactic and semantic tags
> fa9e35a0772a tools/memory-model/README: Fix typo
> a2bfbf847c96 tools/memory-model: glossary.txt: Fix indents
> 3839dbb05869 rcutorture: Make srcu_lockdep.sh check kernel Kconfig
> b5aa1c489085 rcutorture: Make srcu_lockdep.sh check reader-conflict handling
> 9a5720bad9ed rcu: Remove swake_up_one_online() bandaid
> 04159042a62b Revert "rcu/nocb: Fix rcuog wake-up from offline softirq"
> fdc37fed1c81 rcutorture: Split out beginning and end from rcu_torture_one_read()
> 3c6b1925361e rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT
> fadc715785cc rcutorture: Add tests for SRCU up/down reader primitives
> 90a8f490324c rcutorture: Pull rcu_torture_updown() loop body into new function
> 5fbaa5179f6a rcutorture: Comment invocations of tick_dep_set_task()
> 461810471faa rcutorture: Complain if an ->up_read() is delayed more than 10 seconds
> 35e1a180319d rcutorture: Check for ->up_read() without matching ->down_read()
> 0676ba797dfa EXP srcu: Enable Tiny SRCU On all CONFIG_SMP=n kernels
> c8fff13fd2fd EXP rcutorture: Add SRCU-V scenario for preemptible Tiny SRCU
> 910a5f9ebf5f EXP rcutorture: Limit callback flooding for Tiny SRCU in preemptible kernels
> 8979a891a365 EXP hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On 3/3/2025 12:30 PM, Joel Fernandes wrote:
> On 3/3/2025 12:07 PM, Boqun Feng wrote:
>> On Mon, Mar 03, 2025 at 12:00:40PM -0500, Joel Fernandes wrote:
>> [...]
>>>
>>> I see the original patch "rcu: Fix get_state_synchronize_rcu_full() GP-start
>>> detection" is not yet on -next. Once we are convinced about the fix, do we
>>> want to squash the fix into this patch and have Boqun take it?
>>>
>>
>> Which "-next" are you talking about? The original patch and the fix are
>> already in next-20250303 of linux-next:
>>
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?h=next-20250303&qt=range&q=153fc45000e0058435ec0609258fb16e7ea257d2
>
> I see it now during manual inspection, but I'm confused why git cherry tells me
> otherwise.
>
> I tried the following command and it shows the patch in question in the first
> line of output. Basically the question that the command asks is "What is in
> Paul's dev branch that is not in the RCU tree's -next branch?" This question is
> asked for the obvious raisins.
>
> So I am obviously missing something in the command. Thoughts?
>
> (rcugit is the RCU tree, and paul/dev is Paul's dev branch)
>
> git cherry --abbrev -v rcugit/next paul/dev | grep "^+" | cut -d' ' -f2,3-
>
> 012f47f0f806 rcu: Fix get_state_synchronize_rcu_full() GP-start detection

Ah, it's because Paul's dev branch has an older version of the patch I think
(without the fix being discussed here) :-D

So that explains it...

thanks,

- Joel
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Mon, Mar 03, 2025 at 12:00:40PM -0500, Joel Fernandes wrote:
[...]
>
> I see the original patch "rcu: Fix get_state_synchronize_rcu_full() GP-start
> detection" is not yet on -next. Once we are convinced about the fix, do we
> want to squash the fix into this patch and have Boqun take it?
>

Which "-next" are you talking about? The original patch and the fix are
already in next-20250303 of linux-next:

https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/log/?h=next-20250303&qt=range&q=153fc45000e0058435ec0609258fb16e7ea257d2

Regards,
Boqun

> Yet another idea is to apply it for the 6.15-rc cycle if more time is needed.
>
> Alternatively, we could squash it and I queue it for 6.16 instead of 6.15.
>
> And I'm guessing Vlad's series is also for 6.16.
>
> thanks,
>
> - Joel
>
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Sun, Mar 02, 2025 at 07:17:10PM -0500, Joel Fernandes wrote:
> On Sun, Mar 02, 2025 at 07:15:07PM -0500, Joel Fernandes wrote:
> > On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > > > > Hello, Paul!
> > > > >
> > > > > > > > > >
> > > > > > > > > > Except that I got this from overnight testing of rcu/dev on
> > > > > > > > > > the shared
> > > > > > > > > > RCU tree:
> > > > > > > > > >
> > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > > > > > >
> > > > > > > > > > I see this only on TREE05. Which should not be too
> > > > > > > > > > surprising, given
> > > > > > > > > > that this is the scenario that tests it. It happened
> > > > > > > > > > within five minutes
> > > > > > > > > > on all 14 of the TREE05 runs.
> > > > > > > > > >
> > > > > > > > > Hm.. This is not fun. I tested this on my system and I did
> > > > > > > > > not manage to
> > > > > > > > > trigger this whereas you do. Something is wrong.
> > > > > > > >
> > > > > > > > If you have a debug patch, I would be happy to give it a go.
> > > > > > > >
> > > > > > > I can trigger it. But.
> > > > > > >
> > > > > > > Some background. I tested those patches during many hours on the
> > > > > > > stable
> > > > > > > kernel which is 6.13. On that kernel I was not able to trigger
> > > > > > > it. Running
> > > > > > > the rcutorture on our shared "dev" tree, which I did now,
> > > > > > > triggers this
> > > > > > > right away.
> > > > > >
> > > > > > Bisection? (Hey, you knew that was coming!)
> > > > > >
> > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start
> > > > > detection
> > > > >
> > > > > After revert in the dev, rcutorture passes TREE05, 16 instances.
> > > >
> > > > Huh. We sure don't get to revert that one...
> > > >
> > > > Do we have a problem with the ordering in rcu_gp_init() between the
> > > > calls
> > > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For
> > > > example,
> > > > do we need to capture the relevant portion of the list before the call
> > > > to rcu_seq_start(), and do the grace-period-start work afterwards?
> > >
> > > I tried moving the call to rcu_sr_normal_gp_init() before the call to
> > > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> > > Which does not necessarily mean that this is the correct fix, but I
> > > figured that it might at least provide food for thought.
> > >
> > > Thanx, Paul
> > >
> > >
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 48384fa2eaeb8..d3efeff7740e7 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
> > >
> > > /* Advance to a new grace period and initialize state. */
> > > record_gp_stall_check_time();
> > > + start_new_poll = rcu_sr_normal_gp_init();
> > > /* Record GP times before starting GP, hence rcu_seq_start(). */
> > > rcu_seq_start(&rcu_state.gp_seq);
> > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > - start_new_poll = rcu_sr_normal_gp_init();
> > > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> >
> > Oh... so the bug is this? Good catch...
> >
> >
> > CPU 0                                  CPU 1
> >
> > rcu_gp_init()
> >   rcu_seq_start(rcu_state.gp_seq)
> >                                        synchronize_rcu_normal()
> >                                          rs.head.func =
> >                                            (void *) get_state_synchronize_rcu();
> >                                          // saves rcu_state.gp_seq
> >                                          rcu_sr_normal_add_req() ->
> >                                            llist_add(rcu_state.srs_next)
> >                                          (void) start_poll_synchronize_rcu();
> >
> >   sr_normal_gp_init()
> >     llist_add(wait_head, &rcu_state.srs_next);
> >     // picks up the injected wait head (WH)
> >     rcu_state.srs_wait_tail = wait_head;
> >
> > rcu_gp_cleanup()
> >   rcu_seq_end(&rcu_state.gp_seq);
> >   sr_normal_complete()
> >     WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
> >
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Sun, Mar 02, 2025 at 12:36:51PM -0800, Paul E. McKenney wrote:
> On Sun, Mar 02, 2025 at 10:46:29AM -0800, Boqun Feng wrote:
> > On Sun, Mar 02, 2025 at 09:39:44AM -0800, Paul E. McKenney wrote:
> > > On Sun, Mar 02, 2025 at 11:19:44AM +0100, Uladzislau Rezki wrote:
> > > > On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
> > > > > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> > > > > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > > > > > > Hello, Paul!
> > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Except that I got this from overnight testing of
> > > > > > > > > > > > rcu/dev on the shared
> > > > > > > > > > > > RCU tree:
> > > > > > > > > > > >
> > > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > > > > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > > > > > > > >
> > > > > > > > > > > > I see this only on TREE05. Which should not be too
> > > > > > > > > > > > surprising, given
> > > > > > > > > > > > that this is the scenario that tests it. It happened
> > > > > > > > > > > > within five minutes
> > > > > > > > > > > > on all 14 of the TREE05 runs.
> > > > > > > > > > > >
> > > > > > > > > > > Hm.. This is not fun. I tested this on my system and I
> > > > > > > > > > > did not manage to
> > > > > > > > > > > trigger this whereas you do. Something is wrong.
> > > > > > > > > >
> > > > > > > > > > If you have a debug patch, I would be happy to give it a go.
> > > > > > > > > >
> > > > > > > > > I can trigger it. But.
> > > > > > > > >
> > > > > > > > > Some background. I tested those patches during many hours on
> > > > > > > > > the stable
> > > > > > > > > kernel which is 6.13. On that kernel I was not able to
> > > > > > > > > trigger it. Running
> > > > > > > > > the rcutorture on our shared "dev" tree, which I did now,
> > > > > > > > > triggers this
> > > > > > > > > right away.
> > > > > > > >
> > > > > > > > Bisection? (Hey, you knew that was coming!)
> > > > > > > >
> > > > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full()
> > > > > > > GP-start detection
> > > > > > >
> > > > > > > After revert in the dev, rcutorture passes TREE05, 16 instances.
> > > > > >
> > > > > > Huh. We sure don't get to revert that one...
> > > > > >
> > > > > > Do we have a problem with the ordering in rcu_gp_init() between the
> > > > > > calls
> > > > > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For
> > > > > > example,
> > > > > > do we need to capture the relevant portion of the list before the
> > > > > > call
> > > > > > to rcu_seq_start(), and do the grace-period-start work afterwards?
> > > > >
> > > > > I tried moving the call to rcu_sr_normal_gp_init() before the call to
> > > > > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> > > > > Which does not necessarily mean that this is the correct fix, but I
> > > > > figured that it might at least provide food for thought.
> > > > >
> > > > > Thanx, Paul
> > > > >
> > > > >
> > > > >
> > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > index 48384fa2eaeb8..d3efeff7740e7 100644
> > > > > --- a/kernel/rcu/tree.c
> > > > > +++ b/kernel/rcu/tree.c
> > > > > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool
> > > > > rcu_gp_init(void)
> > > > >
> > > > > /* Advance to a new grace period and initialize state. */
> > > > > record_gp_stall_check_time();
> > > > > + start_new_poll = rcu_sr_normal_gp_init();
> > > > > /* Record GP times before starting GP, hence rcu_seq_start(). */
> > > > > rcu_seq_start(&rcu_state.gp_seq);
> > > > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > > > - start_new_poll = rcu_sr_normal_gp_init();
> > > > > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > > TPS("start"));
> > > > > rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
> > > > > raw_spin_unlock_irq_rcu_node(rnp);
> > > > >
> > > > Running this 24 hours already. TREE05 * 16 scenario. I do not see any
> > > > warnings yet. There is a race, indeed. The gp_seq is moved forward,
> > > > whereas clients can still come until rcu_sr_normal_gp_init() places a
> > > > dummy-wait-head for this GP.
> > > >
> > > > Thank you for testing, Paul, and for looking into this :)
> > >
> > > Very good! This is a bug in this commit of mine:
> > >
> > > 012f47f0f806 ("rcu: Fix get_state_synchronize_rcu_full() GP-start
> > > detection")
> > >
> > > Boqun, could you please fold this into that commit with something like
> > > this added to the commit log just before the paragraph starting with
> > > "Although this fixes 91a967fd6934"?
> > >
> > > However, simply changing get_state_synchronize_rcu_full() function
> > > to use rcu_state.gp_seq instead of the root rcu_node
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Sun, Mar 02, 2025 at 07:15:07PM -0500, Joel Fernandes wrote:
> On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > > > Hello, Paul!
> > > >
> > > > > > > > >
> > > > > > > > > Except that I got this from overnight testing of rcu/dev on
> > > > > > > > > the shared
> > > > > > > > > RCU tree:
> > > > > > > > >
> > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > > > > >
> > > > > > > > > I see this only on TREE05. Which should not be too
> > > > > > > > > surprising, given
> > > > > > > > > that this is the scenario that tests it. It happened within
> > > > > > > > > five minutes
> > > > > > > > > on all 14 of the TREE05 runs.
> > > > > > > > >
> > > > > > > > Hm.. This is not fun. I tested this on my system and I did not
> > > > > > > > manage to
> > > > > > > > trigger this whereas you do. Something is wrong.
> > > > > > >
> > > > > > > If you have a debug patch, I would be happy to give it a go.
> > > > > > >
> > > > > > I can trigger it. But.
> > > > > >
> > > > > > Some background. I tested those patches during many hours on the
> > > > > > stable
> > > > > > kernel which is 6.13. On that kernel I was not able to trigger it.
> > > > > > Running
> > > > > > the rcutorture on our shared "dev" tree, which I did now,
> > > > > > triggers this
> > > > > > right away.
> > > > >
> > > > > Bisection? (Hey, you knew that was coming!)
> > > > >
> > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start
> > > > detection
> > > >
> > > > After revert in the dev, rcutorture passes TREE05, 16 instances.
> > >
> > > Huh. We sure don't get to revert that one...
> > >
> > > Do we have a problem with the ordering in rcu_gp_init() between the calls
> > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example,
> > > do we need to capture the relevant portion of the list before the call
> > > to rcu_seq_start(), and do the grace-period-start work afterwards?
> >
> > I tried moving the call to rcu_sr_normal_gp_init() before the call to
> > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> > Which does not necessarily mean that this is the correct fix, but I
> > figured that it might at least provide food for thought.
> >
> > Thanx, Paul
> >
> >
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 48384fa2eaeb8..d3efeff7740e7 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
> >
> > /* Advance to a new grace period and initialize state. */
> > record_gp_stall_check_time();
> > + start_new_poll = rcu_sr_normal_gp_init();
> > /* Record GP times before starting GP, hence rcu_seq_start(). */
> > rcu_seq_start(&rcu_state.gp_seq);
> > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > - start_new_poll = rcu_sr_normal_gp_init();
> > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
>
> Oh... so the bug is this? Good catch...
>
>
> CPU 0                                  CPU 1
>
> rcu_gp_init()
>   rcu_seq_start(rcu_state.gp_seq)
>                                        synchronize_rcu_normal()
>                                          rs.head.func =
>                                            (void *) get_state_synchronize_rcu();
>                                          // saves rcu_state.gp_seq
>                                          rcu_sr_normal_add_req() ->
>                                            llist_add(rcu_state.srs_next)
>                                          (void) start_poll_synchronize_rcu();
>
>   sr_normal_gp_init()
>     llist_add(wait_head, &rcu_state.srs_next);
>     // picks up the injected wait head (WH)
>     rcu_state.srs_wait_tail = wait_head;
>
> rcu_gp_cleanup()
>   rcu_seq_end(&rcu_state.gp_seq);
>   sr_normal_complete()
>     WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
>               !poll_state_synchronize_rcu(oldstate),
>
> Whereas reordering sr_normal_gp_init() prevents this:
>
> rcu_gp_init()
>
>   sr_normal_gp_init()
>
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > > Hello, Paul!
> > >
> > > > > > > >
> > > > > > > > Except that I got this from overnight testing of rcu/dev on the
> > > > > > > > shared
> > > > > > > > RCU tree:
> > > > > > > >
> > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > > > >
> > > > > > > > I see this only on TREE05. Which should not be too surprising,
> > > > > > > > given
> > > > > > > > that this is the scenario that tests it. It happened within
> > > > > > > > five minutes
> > > > > > > > on all 14 of the TREE05 runs.
> > > > > > > >
> > > > > > > Hm.. This is not fun. I tested this on my system and I did not
> > > > > > > manage to
> > > > > > > trigger this whereas you do. Something is wrong.
> > > > > >
> > > > > > If you have a debug patch, I would be happy to give it a go.
> > > > > >
> > > > > I can trigger it. But.
> > > > >
> > > > > Some background. I tested those patches during many hours on the
> > > > > stable
> > > > > kernel which is 6.13. On that kernel I was not able to trigger it.
> > > > > Running
> > > > > the rcutorture on our shared "dev" tree, which I did now,
> > > > > triggers this
> > > > > right away.
> > > >
> > > > Bisection? (Hey, you knew that was coming!)
> > > >
> > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start
> > > detection
> > >
> > > After revert in the dev, rcutorture passes TREE05, 16 instances.
> >
> > Huh. We sure don't get to revert that one...
> >
> > Do we have a problem with the ordering in rcu_gp_init() between the calls
> > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example,
> > do we need to capture the relevant portion of the list before the call
> > to rcu_seq_start(), and do the grace-period-start work afterwards?
>
> I tried moving the call to rcu_sr_normal_gp_init() before the call to
> rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> Which does not necessarily mean that this is the correct fix, but I
> figured that it might at least provide food for thought.
>
> Thanx, Paul
>
>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 48384fa2eaeb8..d3efeff7740e7 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
>
> /* Advance to a new grace period and initialize state. */
> record_gp_stall_check_time();
> + start_new_poll = rcu_sr_normal_gp_init();
> /* Record GP times before starting GP, hence rcu_seq_start(). */
> rcu_seq_start(&rcu_state.gp_seq);
> ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> - start_new_poll = rcu_sr_normal_gp_init();
> trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
Oh... so the bug is this? Good catch...
CPU 0                                  CPU 1

rcu_gp_init()
  rcu_seq_start(rcu_state.gp_seq)
                                       synchronize_rcu_normal()
                                         rs.head.func =
                                           (void *) get_state_synchronize_rcu();
                                         // saves rcu_state.gp_seq
                                         rcu_sr_normal_add_req() ->
                                           llist_add(rcu_state.srs_next)
                                         (void) start_poll_synchronize_rcu();

  sr_normal_gp_init()
    llist_add(wait_head, &rcu_state.srs_next);
    // picks up the injected wait head (WH)
    rcu_state.srs_wait_tail = wait_head;

rcu_gp_cleanup()
  rcu_seq_end(&rcu_state.gp_seq);
  sr_normal_complete()
    WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
              !poll_state_synchronize_rcu(oldstate),

Whereas reordering sr_normal_gp_init() prevents this:

rcu_gp_init()
  sr_normal_gp_init()
  // WH has not been injected yet,
  // so nothing to wait on
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Sun, Mar 02, 2025 at 10:46:29AM -0800, Boqun Feng wrote:
> On Sun, Mar 02, 2025 at 09:39:44AM -0800, Paul E. McKenney wrote:
> > On Sun, Mar 02, 2025 at 11:19:44AM +0100, Uladzislau Rezki wrote:
> > > On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
> > > > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> > > > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > > > > > Hello, Paul!
> > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Except that I got this from overnight testing of rcu/dev
> > > > > > > > > > > on the shared
> > > > > > > > > > > RCU tree:
> > > > > > > > > > >
> > > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > > > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > > > > > > >
> > > > > > > > > > > I see this only on TREE05. Which should not be too
> > > > > > > > > > > surprising, given
> > > > > > > > > > > that this is the scenario that tests it. It happened
> > > > > > > > > > > within five minutes
> > > > > > > > > > > on all 14 of the TREE05 runs.
> > > > > > > > > > >
> > > > > > > > > > Hm.. This is not fun. I tested this on my system and I did
> > > > > > > > > > not manage to
> > > > > > > > > > trigger this whereas you do. Something is wrong.
> > > > > > > > >
> > > > > > > > > If you have a debug patch, I would be happy to give it a go.
> > > > > > > > >
> > > > > > > > I can trigger it. But.
> > > > > > > >
> > > > > > > > Some background. I tested those patches during many hours on
> > > > > > > > the stable
> > > > > > > > kernel which is 6.13. On that kernel I was not able to trigger
> > > > > > > > it. Running
> > > > > > > > the rcutorture on our shared "dev" tree, which I did now,
> > > > > > > > triggers this
> > > > > > > > right away.
> > > > > > >
> > > > > > > Bisection? (Hey, you knew that was coming!)
> > > > > > >
> > > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start
> > > > > > detection
> > > > > >
> > > > > > After revert in the dev, rcutorture passes TREE05, 16 instances.
> > > > >
> > > > > Huh. We sure don't get to revert that one...
> > > > >
> > > > > Do we have a problem with the ordering in rcu_gp_init() between the
> > > > > calls
> > > > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For
> > > > > example,
> > > > > do we need to capture the relevant portion of the list before the call
> > > > > to rcu_seq_start(), and do the grace-period-start work afterwards?
> > > >
> > > > I tried moving the call to rcu_sr_normal_gp_init() before the call to
> > > > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> > > > Which does not necessarily mean that this is the correct fix, but I
> > > > figured that it might at least provide food for thought.
> > > >
> > > > Thanx, Paul
> > > >
> > > >
> > > >
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index 48384fa2eaeb8..d3efeff7740e7 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
> > > >
> > > > /* Advance to a new grace period and initialize state. */
> > > > record_gp_stall_check_time();
> > > > + start_new_poll = rcu_sr_normal_gp_init();
> > > > /* Record GP times before starting GP, hence rcu_seq_start(). */
> > > > rcu_seq_start(&rcu_state.gp_seq);
> > > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > > - start_new_poll = rcu_sr_normal_gp_init();
> > > > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq,
> > > > TPS("start"));
> > > > rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
> > > > raw_spin_unlock_irq_rcu_node(rnp);
> > > >
> > > Running this 24 hours already. TREE05 * 16 scenario. I do not see any
> > > warnings yet. There is a race, indeed. The gp_seq is moved forward,
> > > whereas clients can still come until rcu_sr_normal_gp_init() places a
> > > dummy-wait-head for this GP.
> > >
> > > Thank you for testing, Paul, and for looking into this :)
> >
> > Very good! This is a bug in this commit of mine:
> >
> > 012f47f0f806 ("rcu: Fix get_state_synchronize_rcu_full() GP-start
> > detection")
> >
> > Boqun, could you please fold this into that commit with something like
> > this added to the commit log just before the paragraph starting with
> > "Although this fixes 91a967fd6934"?
> >
> > However, simply changing get_state_synchronize_rcu_full() function
> > to use rcu_state.gp_seq instead of the root rcu_node structure's
> > ->gp_seq field results in a theoretical bug in kernels booted
> > with rcutree.rcu_normal_wake_from_gp=1 due to the following
> > sequence of events:
> >
> > o The rcu_gp_init() function invokes rcu_seq_start()
> >
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Sun, Mar 02, 2025 at 09:39:44AM -0800, Paul E. McKenney wrote:
> On Sun, Mar 02, 2025 at 11:19:44AM +0100, Uladzislau Rezki wrote:
> > On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > > > > Hello, Paul!
> > > > >
> > > > > > > > > >
> > > > > > > > > > Except that I got this from overnight testing of rcu/dev on
> > > > > > > > > > the shared
> > > > > > > > > > RCU tree:
> > > > > > > > > >
> > > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > > > > > >
> > > > > > > > > > I see this only on TREE05. Which should not be too
> > > > > > > > > > surprising, given
> > > > > > > > > > that this is the scenario that tests it. It happened
> > > > > > > > > > within five minutes
> > > > > > > > > > on all 14 of the TREE05 runs.
> > > > > > > > > >
> > > > > > > > > Hm.. This is not fun. I tested this on my system and I did
> > > > > > > > > not manage to
> > > > > > > > > trigger this whereas you do. Something is wrong.
> > > > > > > >
> > > > > > > > If you have a debug patch, I would be happy to give it a go.
> > > > > > > >
> > > > > > > I can trigger it. But.
> > > > > > >
> > > > > > > Some background. I tested those patches during many hours on the
> > > > > > > stable
> > > > > > > kernel which is 6.13. On that kernel I was not able to trigger
> > > > > > > it. Running
> > > > > > > the rcutorture on our shared "dev" tree, which I did now,
> > > > > > > triggers this
> > > > > > > right away.
> > > > > >
> > > > > > Bisection? (Hey, you knew that was coming!)
> > > > > >
> > > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start
> > > > > detection
> > > > >
> > > > > After revert in the dev, rcutorture passes TREE05, 16 instances.
> > > >
> > > > Huh. We sure don't get to revert that one...
> > > >
> > > > Do we have a problem with the ordering in rcu_gp_init() between the
> > > > calls
> > > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For
> > > > example,
> > > > do we need to capture the relevant portion of the list before the call
> > > > to rcu_seq_start(), and do the grace-period-start work afterwards?
> > >
> > > I tried moving the call to rcu_sr_normal_gp_init() before the call to
> > > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> > > Which does not necessarily mean that this is the correct fix, but I
> > > figured that it might at least provide food for thought.
> > >
> > > Thanx, Paul
> > >
> > >
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 48384fa2eaeb8..d3efeff7740e7 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
> > >
> > > /* Advance to a new grace period and initialize state. */
> > > record_gp_stall_check_time();
> > > + start_new_poll = rcu_sr_normal_gp_init();
> > > /* Record GP times before starting GP, hence rcu_seq_start(). */
> > > rcu_seq_start(&rcu_state.gp_seq);
> > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > - start_new_poll = rcu_sr_normal_gp_init();
> > > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> > > rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
> > > raw_spin_unlock_irq_rcu_node(rnp);
> > >
> > Running this 24 hours already. TREE05 * 16 scenario. I do not see any
> > warnings yet. There is a race, indeed. The gp_seq is moved forward,
> > whereas clients can still come until rcu_sr_normal_gp_init() places a
> > dummy-wait-head for this GP.
> >
> > Thank you for testing, Paul, and for looking into this :)
>
> Very good! This is a bug in this commit of mine:
>
> 012f47f0f806 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
>
> Boqun, could you please fold this into that commit with something like
> this added to the commit log just before the paragraph starting with
> "Although this fixes 91a967fd6934"?
>
> However, simply changing get_state_synchronize_rcu_full() function
> to use rcu_state.gp_seq instead of the root rcu_node structure's
> ->gp_seq field results in a theoretical bug in kernels booted
> with rcutree.rcu_normal_wake_from_gp=1 due to the following
> sequence of events:
>
> o The rcu_gp_init() function invokes rcu_seq_start()
> to officially start a new grace period.
>
> o A new RCU reader begins, referencing X from some
> RCU-protected list. The new grace period is not
> obligated to wait for this reader.
>
> o An updater removes X, then calls synchronize_rcu(),
> which queues
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Sun, Mar 02, 2025 at 11:19:44AM +0100, Uladzislau Rezki wrote:
> On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > > > Hello, Paul!
> > > >
> > > > > > > > >
> > > > > > > > > Except that I got this from overnight testing of rcu/dev on
> > > > > > > > > the shared
> > > > > > > > > RCU tree:
> > > > > > > > >
> > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > > > > >
> > > > > > > > > I see this only on TREE05. Which should not be too
> > > > > > > > > surprising, given
> > > > > > > > > that this is the scenario that tests it. It happened within
> > > > > > > > > five minutes
> > > > > > > > > on all 14 of the TREE05 runs.
> > > > > > > > >
> > > > > > > > Hm.. This is not fun. I tested this on my system and I did not
> > > > > > > > manage to
> > > > > > > > trigger this whereas you do. Something is wrong.
> > > > > > >
> > > > > > > If you have a debug patch, I would be happy to give it a go.
> > > > > > >
> > > > > > I can trigger it. But.
> > > > > >
> > > > > > Some background. I tested those patches during many hours on the
> > > > > > stable
> > > > > > kernel which is 6.13. On that kernel I was not able to trigger it.
> > > > > > Running
> > > > > > the rcutorture on our shared "dev" tree, which I did now,
> > > > > > triggers this
> > > > > > right away.
> > > > >
> > > > > Bisection? (Hey, you knew that was coming!)
> > > > >
> > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start
> > > > detection
> > > >
> > > > After revert in the dev, rcutorture passes TREE05, 16 instances.
> > >
> > > Huh. We sure don't get to revert that one...
> > >
> > > Do we have a problem with the ordering in rcu_gp_init() between the calls
> > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example,
> > > do we need to capture the relevant portion of the list before the call
> > > to rcu_seq_start(), and do the grace-period-start work afterwards?
> >
> > I tried moving the call to rcu_sr_normal_gp_init() before the call to
> > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> > Which does not necessarily mean that this is the correct fix, but I
> > figured that it might at least provide food for thought.
> >
> > Thanx, Paul
> >
> >
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 48384fa2eaeb8..d3efeff7740e7 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
> >
> > /* Advance to a new grace period and initialize state. */
> > record_gp_stall_check_time();
> > + start_new_poll = rcu_sr_normal_gp_init();
> > /* Record GP times before starting GP, hence rcu_seq_start(). */
> > rcu_seq_start(&rcu_state.gp_seq);
> > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > - start_new_poll = rcu_sr_normal_gp_init();
> > trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> > rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
> > raw_spin_unlock_irq_rcu_node(rnp);
> >
> Running this 24 hours already. TREE05 * 16 scenario. I do not see any
> warnings yet. There is a race, indeed. The gp_seq is moved forward,
> whereas clients can still come until rcu_sr_normal_gp_init() places a
> dummy-wait-head for this GP.
>
> Thank you for testing, Paul, and for looking into this :)
Very good! This is a bug in this commit of mine:
012f47f0f806 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
Boqun, could you please fold this into that commit with something like
this added to the commit log just before the paragraph starting with
"Although this fixes 91a967fd6934"?
However, simply changing get_state_synchronize_rcu_full() function
to use rcu_state.gp_seq instead of the root rcu_node structure's
->gp_seq field results in a theoretical bug in kernels booted
with rcutree.rcu_normal_wake_from_gp=1 due to the following
sequence of events:
o The rcu_gp_init() function invokes rcu_seq_start()
to officially start a new grace period.
o A new RCU reader begins, referencing X from some
RCU-protected list. The new grace period is not
obligated to wait for this reader.
o An updater removes X, then calls synchronize_rcu(),
which queues a wait element.
o The grace period ends, awakening the updater, which
frees X while the reader is still referencing it.
The reason that this is theoretical is that although the
grace period has official
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > > Hello, Paul!
> > >
> > > > > > > >
> > > > > > > > Except that I got this from overnight testing of rcu/dev on the
> > > > > > > > shared
> > > > > > > > RCU tree:
> > > > > > > >
> > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > > > >
> > > > > > > > I see this only on TREE05. Which should not be too surprising,
> > > > > > > > given
> > > > > > > > that this is the scenario that tests it. It happened within
> > > > > > > > five minutes
> > > > > > > > on all 14 of the TREE05 runs.
> > > > > > > >
> > > > > > > Hm.. This is not fun. I tested this on my system and I did not
> > > > > > > manage to
> > > > > > > trigger this whereas you do. Something is wrong.
> > > > > >
> > > > > > If you have a debug patch, I would be happy to give it a go.
> > > > > >
> > > > > I can trigger it. But.
> > > > >
> > > > > Some background. I tested those patches during many hours on the
> > > > > stable
> > > > > kernel which is 6.13. On that kernel I was not able to trigger it.
> > > > > Running
> > > > > the rcutorture on our shared "dev" tree, which I did now,
> > > > > triggers this
> > > > > right away.
> > > >
> > > > Bisection? (Hey, you knew that was coming!)
> > > >
> > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start
> > > detection
> > >
> > > After revert in the dev, rcutorture passes TREE05, 16 instances.
> >
> > Huh. We sure don't get to revert that one...
> >
> > Do we have a problem with the ordering in rcu_gp_init() between the calls
> > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example,
> > do we need to capture the relevant portion of the list before the call
> > to rcu_seq_start(), and do the grace-period-start work afterwards?
>
> I tried moving the call to rcu_sr_normal_gp_init() before the call to
> rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> Which does not necessarily mean that this is the correct fix, but I
> figured that it might at least provide food for thought.
>
> Thanx, Paul
>
>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 48384fa2eaeb8..d3efeff7740e7 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
>
> /* Advance to a new grace period and initialize state. */
> record_gp_stall_check_time();
> + start_new_poll = rcu_sr_normal_gp_init();
> /* Record GP times before starting GP, hence rcu_seq_start(). */
> rcu_seq_start(&rcu_state.gp_seq);
> ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> - start_new_poll = rcu_sr_normal_gp_init();
> trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
> raw_spin_unlock_irq_rcu_node(rnp);
>
Running this for 24 hours already, TREE05 * 16 scenario. I do not see any
warnings yet. There is a race, indeed: the gp_seq is moved forward,
whereas clients can still come in until rcu_sr_normal_gp_init() places a
dummy-wait-head for this GP.
Thank you for testing, Paul, and for looking into this :)
--
Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Fri, Feb 28, 2025 at 05:36:47PM +0100, Uladzislau Rezki wrote:
> On Fri, Feb 28, 2025 at 07:41:40AM -0800, Paul E. McKenney wrote:
> > On Thu, Feb 27, 2025 at 06:44:15PM +0100, Uladzislau Rezki wrote:
> > > On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> > > > > Hi Ulad,
> > > > >
> > > > > I put these three patches into next (and misc.2025.02.27a) for some
> > > > > testing, hopefully it all goes well and they can make it v6.15.
> > > > >
> > > > > A few tag changed below:
> > > > >
> > > > > On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony)
> > > > > wrote:
> > > > > > Switch for using of get_state_synchronize_rcu_full() and
> > > > > > poll_state_synchronize_rcu_full() pair to debug a normal
> > > > > > synchronize_rcu() call.
> > > > > >
> > > > > > Just using "not" full APIs to identify if a grace period is
> > > > > > passed or not might lead to a false-positive kernel splat.
> > > > > >
> > > > > > It can happen, because get_state_synchronize_rcu() compresses
> > > > > > both normal and expedited states into one single unsigned long
> > > > > > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > > > > > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > > > > > run.
> > > > > >
> > > > > > To address this, switch to poll_state_synchronize_rcu_full() and
> > > > > > get_state_synchronize_rcu_full() APIs, which use separate variables
> > > > > > for expedited and normal states.
> > > > > >
> > > > > > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
> > > > >
> > > > > I switch this into "Closes:" per checkpatch.
> > > > >
> > > > > > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > > > > > Reported-by: cheung wall
> > > > > > Signed-off-by: Uladzislau Rezki (Sony)
> > > > >
> > > > > You seem to forget add Paul's Reviewed-by, so I add it in rcu/next.
> > > > > Would you or Paul double-check the Reviewed-by should be here?
> > > >
> > > > I am good with keeping my Reviewed-by tags.
> > > >
> > > Thanks Paul!
> >
> > Except that I got this from overnight testing of rcu/dev on the shared
> > RCU tree:
> >
> > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > rcu_sr_normal_complete+0x5c/0x80
> >
> > I see this only on TREE05. Which should not be too surprising, given
> > that this is the scenario that tests it. It happened within five minutes
> > on all 14 of the TREE05 runs.
> >
> Hm.. This is not fun. I tested this on my system and i did not manage to
> trigger this whereas you do. Something is wrong.
If you have a debug patch, I would be happy to give it a go.
Thanx, Paul
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Fri, Feb 28, 2025 at 10:25:58AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 28, 2025 at 06:08:19PM +0100, Uladzislau Rezki wrote:
> > On Fri, Feb 28, 2025 at 05:36:47PM +0100, Uladzislau Rezki wrote:
> > > On Fri, Feb 28, 2025 at 07:41:40AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Feb 27, 2025 at 06:44:15PM +0100, Uladzislau Rezki wrote:
> > > > > On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> > > > > > On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> > > > > > > Hi Ulad,
> > > > > > >
> > > > > > > I put these three patches into next (and misc.2025.02.27a) for
> > > > > > > some
> > > > > > > testing, hopefully it all goes well and they can make it v6.15.
> > > > > > >
> > > > > > > A few tag changed below:
> > > > > > >
> > > > > > > On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony)
> > > > > > > wrote:
> > > > > > > > Switch for using of get_state_synchronize_rcu_full() and
> > > > > > > > poll_state_synchronize_rcu_full() pair to debug a normal
> > > > > > > > synchronize_rcu() call.
> > > > > > > >
> > > > > > > > Just using "not" full APIs to identify if a grace period is
> > > > > > > > passed or not might lead to a false-positive kernel splat.
> > > > > > > >
> > > > > > > > It can happen, because get_state_synchronize_rcu() compresses
> > > > > > > > both normal and expedited states into one single unsigned long
> > > > > > > > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > > > > > > > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > > > > > > > run.
> > > > > > > >
> > > > > > > > To address this, switch to poll_state_synchronize_rcu_full() and
> > > > > > > > get_state_synchronize_rcu_full() APIs, which use separate
> > > > > > > > variables
> > > > > > > > for expedited and normal states.
> > > > > > > >
> > > > > > > > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
> > > > > > >
> > > > > > > I switch this into "Closes:" per checkpatch.
> > > > > > >
> > > > > > > > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > > > > > > > Reported-by: cheung wall
> > > > > > > > Signed-off-by: Uladzislau Rezki (Sony)
> > > > > > >
> > > > > > > You seem to forget add Paul's Reviewed-by, so I add it in
> > > > > > > rcu/next.
> > > > > > > Would you or Paul double-check the Reviewed-by should be here?
> > > > > >
> > > > > > I am good with keeping my Reviewed-by tags.
> > > > > >
> > > > > Thanks Paul!
> > > >
> > > > Except that I got this from overnight testing of rcu/dev on the shared
> > > > RCU tree:
> > > >
> > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > rcu_sr_normal_complete+0x5c/0x80
> > > >
> > > > I see this only on TREE05. Which should not be too surprising, given
> > > > that this is the scenario that tests it. It happened within five
> > > > minutes
> > > > on all 14 of the TREE05 runs.
> > > >
> > > Hm.. This is not fun. I tested this on my system and i did not manage to
> > > trigger this whereas you do. Something is wrong.
> > >
> > We have below code to start a new GP, if we detect that processing is
> > starved:
> >
> >
> > /*
> > * The "start_new_poll" is set to true, only when this GP is not able
> > * to handle anything and there are outstanding users. It happens when
> > * the rcu_sr_normal_gp_init() function was not able to insert a dummy
> > * separator to the llist, because there were no left any dummy-nodes.
> > *
> > * Number of dummy-nodes is fixed, it could be that we are run out of
> > * them, if so we start a new pool request to repeat a try. It is rare
> > * and it means that a system is doing a slow processing of callbacks.
> > */
> > if (start_new_poll)
> > (void) start_poll_synchronize_rcu();
> >
> >
> > we do not use a _full() version, since we need to inform rcu-gp-kthread
> > to initiate a new GP.
> >
> > Any thoughts?
>
> My kneejerk not-to-be-trusted take is that it does not matter which type
> of grace period gets started so long as a grace period does get started.
> Presumably you have done the get_state_synchronize_rcu_full() before
> this point?
>
Yes, of course. That code is located in the gp-kthread.
get_state_synchronize_rcu_full() is done before.
--
Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > Hello, Paul!
> >
> > > > > > >
> > > > > > > Except that I got this from overnight testing of rcu/dev on the
> > > > > > > shared
> > > > > > > RCU tree:
> > > > > > >
> > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > > >
> > > > > > > I see this only on TREE05. Which should not be too surprising,
> > > > > > > given
> > > > > > > that this is the scenario that tests it. It happened within five
> > > > > > > minutes
> > > > > > > on all 14 of the TREE05 runs.
> > > > > > >
> > > > > > Hm.. This is not fun. I tested this on my system and i did not
> > > > > > manage to
> > > > > > trigger this whereas you do. Something is wrong.
> > > > >
> > > > > If you have a debug patch, I would be happy to give it a go.
> > > > >
> > > > I can trigger it. But.
> > > >
> > > > Some background. I tested those patches during many hours on the stable
> > > > kernel which is 6.13. On that kernel i was not able to trigger it.
> > > > Running
> > > > the rcutorture on the our shared "dev" tree, which i did now, triggers
> > > > this
> > > > right away.
> > >
> > > Bisection? (Hey, you knew that was coming!)
> > >
> > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start
> > detection
> >
> > After revert in the dev, rcutorture passes TREE05, 16 instances.
>
> Huh. We sure don't get to revert that one...
>
> Do we have a problem with the ordering in rcu_gp_init() between the calls
> to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example,
> do we need to capture the relevant portion of the list before the call
> to rcu_seq_start(), and do the grace-period-start work afterwards?
I tried moving the call to rcu_sr_normal_gp_init() before the call to
rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
Which does not necessarily mean that this is the correct fix, but I
figured that it might at least provide food for thought.
Thanx, Paul
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 48384fa2eaeb8..d3efeff7740e7 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
/* Advance to a new grace period and initialize state. */
record_gp_stall_check_time();
+ start_new_poll = rcu_sr_normal_gp_init();
/* Record GP times before starting GP, hence rcu_seq_start(). */
rcu_seq_start(&rcu_state.gp_seq);
ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
- start_new_poll = rcu_sr_normal_gp_init();
trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
raw_spin_unlock_irq_rcu_node(rnp);
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> Hello, Paul!
>
> > > > > > Except that I got this from overnight testing of rcu/dev on the
> > > > > > shared RCU tree:
> > > > > >
> > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > > >
> > > > > > I see this only on TREE05. Which should not be too surprising, given
> > > > > > that this is the scenario that tests it. It happened within five
> > > > > > minutes on all 14 of the TREE05 runs.
> > > > > >
> > > > > Hm.. This is not fun. I tested this on my system and i did not manage
> > > > > to trigger this whereas you do. Something is wrong.
> > > >
> > > > If you have a debug patch, I would be happy to give it a go.
> > > >
> > > I can trigger it. But.
> > >
> > > Some background. I tested those patches during many hours on the stable
> > > kernel which is 6.13. On that kernel i was not able to trigger it. Running
> > > the rcutorture on the our shared "dev" tree, which i did now, triggers
> > > this right away.
> >
> > Bisection? (Hey, you knew that was coming!)
> >
> Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection
>
> After revert in the dev, rcutorture passes TREE05, 16 instances.

Huh. We sure don't get to revert that one...

Do we have a problem with the ordering in rcu_gp_init() between the calls
to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example,
do we need to capture the relevant portion of the list before the call
to rcu_seq_start(), and do the grace-period-start work afterwards?

My kneejerk (and possibly completely wrong) guess is that rcu_gp_init()
calls rcu_gp_start(), then there is a call to synchronize_rcu() whose
cookie says wait for the end of the next grace period, then we capture
the lists, including this one that needs to wait longer. Then when we
look at the end of the grace period, boom!

This would be a real bug due to some CPU coming online between the time
of the call to rcu_gp_start() and synchronize_rcu(). Or is there some
other way that this can happen?

Thanx, Paul
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Fri, Feb 28, 2025 at 10:21:57AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 28, 2025 at 05:36:47PM +0100, Uladzislau Rezki wrote:
> > On Fri, Feb 28, 2025 at 07:41:40AM -0800, Paul E. McKenney wrote:
> > > On Thu, Feb 27, 2025 at 06:44:15PM +0100, Uladzislau Rezki wrote:
> > > > On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> > > > > On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> > > > > > Hi Ulad,
> > > > > >
> > > > > > I put these three patches into next (and misc.2025.02.27a) for some
> > > > > > testing, hopefully it all goes well and they can make it v6.15.
> > > > > >
> > > > > > A few tag changed below:
> > > > > >
> > > > > > On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony)
> > > > > > wrote:
> > > > > > > Switch for using of get_state_synchronize_rcu_full() and
> > > > > > > poll_state_synchronize_rcu_full() pair to debug a normal
> > > > > > > synchronize_rcu() call.
> > > > > > >
> > > > > > > Just using "not" full APIs to identify if a grace period is
> > > > > > > passed or not might lead to a false-positive kernel splat.
> > > > > > >
> > > > > > > It can happen, because get_state_synchronize_rcu() compresses
> > > > > > > both normal and expedited states into one single unsigned long
> > > > > > > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > > > > > > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > > > > > > run.
> > > > > > >
> > > > > > > To address this, switch to poll_state_synchronize_rcu_full() and
> > > > > > > get_state_synchronize_rcu_full() APIs, which use separate
> > > > > > > variables
> > > > > > > for expedited and normal states.
> > > > > > >
> > > > > > > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
> > > > > >
> > > > > > I switch this into "Closes:" per checkpatch.
> > > > > >
> > > > > > > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > > > > > > Reported-by: cheung wall
> > > > > > > Signed-off-by: Uladzislau Rezki (Sony)
> > > > > >
> > > > > > You seem to forget add Paul's Reviewed-by, so I add it in rcu/next.
> > > > > > Would you or Paul double-check the Reviewed-by should be here?
> > > > >
> > > > > I am good with keeping my Reviewed-by tags.
> > > > >
> > > > Thanks Paul!
> > >
> > > Except that I got this from overnight testing of rcu/dev on the shared
> > > RCU tree:
> > >
> > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > rcu_sr_normal_complete+0x5c/0x80
> > >
> > > I see this only on TREE05. Which should not be too surprising, given
> > > that this is the scenario that tests it. It happened within five minutes
> > > on all 14 of the TREE05 runs.
> > >
> > Hm.. This is not fun. I tested this on my system and i did not manage to
> > trigger this whereas you do. Something is wrong.
>
> If you have a debug patch, I would be happy to give it a go.
>
I can trigger it. But.
Some background. I tested those patches for many hours on the stable
kernel, which is 6.13. On that kernel I was not able to trigger it. Running
rcutorture on our shared "dev" tree, which I did now, triggers this
right away.
--
Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
Hello, Paul!

> > > > > Except that I got this from overnight testing of rcu/dev on the
> > > > > shared RCU tree:
> > > > >
> > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > > rcu_sr_normal_complete+0x5c/0x80
> > > > >
> > > > > I see this only on TREE05. Which should not be too surprising, given
> > > > > that this is the scenario that tests it. It happened within five
> > > > > minutes on all 14 of the TREE05 runs.
> > > > >
> > > > Hm.. This is not fun. I tested this on my system and i did not manage
> > > > to trigger this whereas you do. Something is wrong.
> > >
> > > If you have a debug patch, I would be happy to give it a go.
> > >
> > I can trigger it. But.
> >
> > Some background. I tested those patches during many hours on the stable
> > kernel which is 6.13. On that kernel i was not able to trigger it. Running
> > the rcutorture on the our shared "dev" tree, which i did now, triggers this
> > right away.
>
> Bisection? (Hey, you knew that was coming!)
>
Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection

After revert in the dev, rcutorture passes TREE05, 16 instances.

--
Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Fri, Feb 28, 2025 at 07:24:41PM +0100, Uladzislau Rezki wrote:
> On Fri, Feb 28, 2025 at 10:21:57AM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 28, 2025 at 05:36:47PM +0100, Uladzislau Rezki wrote:
> > > On Fri, Feb 28, 2025 at 07:41:40AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Feb 27, 2025 at 06:44:15PM +0100, Uladzislau Rezki wrote:
> > > > > On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> > > > > > On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> > > > > > > Hi Ulad,
> > > > > > >
> > > > > > > I put these three patches into next (and misc.2025.02.27a) for
> > > > > > > some
> > > > > > > testing, hopefully it all goes well and they can make it v6.15.
> > > > > > >
> > > > > > > A few tag changed below:
> > > > > > >
> > > > > > > On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony)
> > > > > > > wrote:
> > > > > > > > Switch for using of get_state_synchronize_rcu_full() and
> > > > > > > > poll_state_synchronize_rcu_full() pair to debug a normal
> > > > > > > > synchronize_rcu() call.
> > > > > > > >
> > > > > > > > Just using "not" full APIs to identify if a grace period is
> > > > > > > > passed or not might lead to a false-positive kernel splat.
> > > > > > > >
> > > > > > > > It can happen, because get_state_synchronize_rcu() compresses
> > > > > > > > both normal and expedited states into one single unsigned long
> > > > > > > > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > > > > > > > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > > > > > > > run.
> > > > > > > >
> > > > > > > > To address this, switch to poll_state_synchronize_rcu_full() and
> > > > > > > > get_state_synchronize_rcu_full() APIs, which use separate
> > > > > > > > variables
> > > > > > > > for expedited and normal states.
> > > > > > > >
> > > > > > > > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
> > > > > > >
> > > > > > > I switch this into "Closes:" per checkpatch.
> > > > > > >
> > > > > > > > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > > > > > > > Reported-by: cheung wall
> > > > > > > > Signed-off-by: Uladzislau Rezki (Sony)
> > > > > > >
> > > > > > > You seem to forget add Paul's Reviewed-by, so I add it in
> > > > > > > rcu/next.
> > > > > > > Would you or Paul double-check the Reviewed-by should be here?
> > > > > >
> > > > > > I am good with keeping my Reviewed-by tags.
> > > > > >
> > > > > Thanks Paul!
> > > >
> > > > Except that I got this from overnight testing of rcu/dev on the shared
> > > > RCU tree:
> > > >
> > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > > rcu_sr_normal_complete+0x5c/0x80
> > > >
> > > > I see this only on TREE05. Which should not be too surprising, given
> > > > that this is the scenario that tests it. It happened within five
> > > > minutes
> > > > on all 14 of the TREE05 runs.
> > > >
> > > Hm.. This is not fun. I tested this on my system and i did not manage to
> > > trigger this whereas you do. Something is wrong.
> >
> > If you have a debug patch, I would be happy to give it a go.
> >
> I can trigger it. But.
>
> Some background. I tested those patches during many hours on the stable
> kernel which is 6.13. On that kernel i was not able to trigger it. Running
> the rcutorture on the our shared "dev" tree, which i did now, triggers this
> right away.
Bisection? (Hey, you knew that was coming!)
Thanx, Paul
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Fri, Feb 28, 2025 at 06:08:19PM +0100, Uladzislau Rezki wrote:
> On Fri, Feb 28, 2025 at 05:36:47PM +0100, Uladzislau Rezki wrote:
> > On Fri, Feb 28, 2025 at 07:41:40AM -0800, Paul E. McKenney wrote:
> > > On Thu, Feb 27, 2025 at 06:44:15PM +0100, Uladzislau Rezki wrote:
> > > > On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> > > > > On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> > > > > > Hi Ulad,
> > > > > >
> > > > > > I put these three patches into next (and misc.2025.02.27a) for some
> > > > > > testing, hopefully it all goes well and they can make it v6.15.
> > > > > >
> > > > > > A few tag changed below:
> > > > > >
> > > > > > On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony)
> > > > > > wrote:
> > > > > > > Switch for using of get_state_synchronize_rcu_full() and
> > > > > > > poll_state_synchronize_rcu_full() pair to debug a normal
> > > > > > > synchronize_rcu() call.
> > > > > > >
> > > > > > > Just using "not" full APIs to identify if a grace period is
> > > > > > > passed or not might lead to a false-positive kernel splat.
> > > > > > >
> > > > > > > It can happen, because get_state_synchronize_rcu() compresses
> > > > > > > both normal and expedited states into one single unsigned long
> > > > > > > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > > > > > > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > > > > > > run.
> > > > > > >
> > > > > > > To address this, switch to poll_state_synchronize_rcu_full() and
> > > > > > > get_state_synchronize_rcu_full() APIs, which use separate
> > > > > > > variables
> > > > > > > for expedited and normal states.
> > > > > > >
> > > > > > > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
> > > > > >
> > > > > > I switch this into "Closes:" per checkpatch.
> > > > > >
> > > > > > > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > > > > > > Reported-by: cheung wall
> > > > > > > Signed-off-by: Uladzislau Rezki (Sony)
> > > > > >
> > > > > > You seem to forget add Paul's Reviewed-by, so I add it in rcu/next.
> > > > > > Would you or Paul double-check the Reviewed-by should be here?
> > > > >
> > > > > I am good with keeping my Reviewed-by tags.
> > > > >
> > > > Thanks Paul!
> > >
> > > Except that I got this from overnight testing of rcu/dev on the shared
> > > RCU tree:
> > >
> > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > > rcu_sr_normal_complete+0x5c/0x80
> > >
> > > I see this only on TREE05. Which should not be too surprising, given
> > > that this is the scenario that tests it. It happened within five minutes
> > > on all 14 of the TREE05 runs.
> > >
> > Hm.. This is not fun. I tested this on my system and i did not manage to
> > trigger this whereas you do. Something is wrong.
> >
> We have below code to start a new GP, if we detect that processing is
> starved:
>
>
> /*
> * The "start_new_poll" is set to true, only when this GP is not able
> * to handle anything and there are outstanding users. It happens when
> * the rcu_sr_normal_gp_init() function was not able to insert a dummy
> * separator to the llist, because there were no left any dummy-nodes.
> *
> * Number of dummy-nodes is fixed, it could be that we are run out of
> * them, if so we start a new pool request to repeat a try. It is rare
> * and it means that a system is doing a slow processing of callbacks.
> */
> if (start_new_poll)
> (void) start_poll_synchronize_rcu();
>
>
> we do not use a _full() version, since we need to inform rcu-gp-kthread
> to initiate a new GP.
>
> Any thoughts?
My kneejerk not-to-be-trusted take is that it does not matter which type
of grace period gets started so long as a grace period does get started.
Presumably you have done the get_state_synchronize_rcu_full() before
this point?
Thanx, Paul
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Fri, Feb 28, 2025 at 05:36:47PM +0100, Uladzislau Rezki wrote:
> On Fri, Feb 28, 2025 at 07:41:40AM -0800, Paul E. McKenney wrote:
> > On Thu, Feb 27, 2025 at 06:44:15PM +0100, Uladzislau Rezki wrote:
> > > On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> > > > > Hi Ulad,
> > > > >
> > > > > I put these three patches into next (and misc.2025.02.27a) for some
> > > > > testing, hopefully it all goes well and they can make it v6.15.
> > > > >
> > > > > A few tag changed below:
> > > > >
> > > > > On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony)
> > > > > wrote:
> > > > > > Switch for using of get_state_synchronize_rcu_full() and
> > > > > > poll_state_synchronize_rcu_full() pair to debug a normal
> > > > > > synchronize_rcu() call.
> > > > > >
> > > > > > Just using "not" full APIs to identify if a grace period is
> > > > > > passed or not might lead to a false-positive kernel splat.
> > > > > >
> > > > > > It can happen, because get_state_synchronize_rcu() compresses
> > > > > > both normal and expedited states into one single unsigned long
> > > > > > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > > > > > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > > > > > run.
> > > > > >
> > > > > > To address this, switch to poll_state_synchronize_rcu_full() and
> > > > > > get_state_synchronize_rcu_full() APIs, which use separate variables
> > > > > > for expedited and normal states.
> > > > > >
> > > > > > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
> > > > >
> > > > > I switch this into "Closes:" per checkpatch.
> > > > >
> > > > > > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > > > > > Reported-by: cheung wall
> > > > > > Signed-off-by: Uladzislau Rezki (Sony)
> > > > >
> > > > > You seem to forget add Paul's Reviewed-by, so I add it in rcu/next.
> > > > > Would you or Paul double-check the Reviewed-by should be here?
> > > >
> > > > I am good with keeping my Reviewed-by tags.
> > > >
> > > Thanks Paul!
> >
> > Except that I got this from overnight testing of rcu/dev on the shared
> > RCU tree:
> >
> > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> > rcu_sr_normal_complete+0x5c/0x80
> >
> > I see this only on TREE05. Which should not be too surprising, given
> > that this is the scenario that tests it. It happened within five minutes
> > on all 14 of the TREE05 runs.
> >
> Hm.. This is not fun. I tested this on my system and i did not manage to
> trigger this whereas you do. Something is wrong.
>
We have the code below to start a new GP, if we detect that processing is
starved:
/*
* The "start_new_poll" is set to true, only when this GP is not able
* to handle anything and there are outstanding users. It happens when
* the rcu_sr_normal_gp_init() function was not able to insert a dummy
* separator to the llist, because there were no left any dummy-nodes.
*
* Number of dummy-nodes is fixed, it could be that we are run out of
* them, if so we start a new pool request to repeat a try. It is rare
* and it means that a system is doing a slow processing of callbacks.
*/
if (start_new_poll)
(void) start_poll_synchronize_rcu();
we do not use a _full() version, since we need to inform the rcu-gp-kthread
to initiate a new GP.
Any thoughts?
--
Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Fri, Feb 28, 2025 at 07:41:40AM -0800, Paul E. McKenney wrote:
> On Thu, Feb 27, 2025 at 06:44:15PM +0100, Uladzislau Rezki wrote:
> > On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> > > On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> > > > Hi Ulad,
> > > >
> > > > I put these three patches into next (and misc.2025.02.27a) for some
> > > > testing, hopefully it all goes well and they can make it v6.15.
> > > >
> > > > A few tag changed below:
> > > >
> > > > On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > Switch for using of get_state_synchronize_rcu_full() and
> > > > > poll_state_synchronize_rcu_full() pair to debug a normal
> > > > > synchronize_rcu() call.
> > > > >
> > > > > Just using "not" full APIs to identify if a grace period is
> > > > > passed or not might lead to a false-positive kernel splat.
> > > > >
> > > > > It can happen, because get_state_synchronize_rcu() compresses
> > > > > both normal and expedited states into one single unsigned long
> > > > > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > > > > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > > > > run.
> > > > >
> > > > > To address this, switch to poll_state_synchronize_rcu_full() and
> > > > > get_state_synchronize_rcu_full() APIs, which use separate variables
> > > > > for expedited and normal states.
> > > > >
> > > > > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
> > > >
> > > > I switch this into "Closes:" per checkpatch.
> > > >
> > > > > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > > > > Reported-by: cheung wall
> > > > > Signed-off-by: Uladzislau Rezki (Sony)
> > > >
> > > > You seem to forget add Paul's Reviewed-by, so I add it in rcu/next.
> > > > Would you or Paul double-check the Reviewed-by should be here?
> > >
> > > I am good with keeping my Reviewed-by tags.
> > >
> > Thanks Paul!
>
> Except that I got this from overnight testing of rcu/dev on the shared
> RCU tree:
>
> WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636
> rcu_sr_normal_complete+0x5c/0x80
>
> I see this only on TREE05. Which should not be too surprising, given
> that this is the scenario that tests it. It happened within five minutes
> on all 14 of the TREE05 runs.
>
Hm.. This is not fun. I tested this on my system and i did not manage to
trigger this whereas you do. Something is wrong.
--
Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Thu, Feb 27, 2025 at 06:44:15PM +0100, Uladzislau Rezki wrote:
> On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> > On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> > > Hi Ulad,
> > >
> > > I put these three patches into next (and misc.2025.02.27a) for some
> > > testing, hopefully it all goes well and they can make it v6.15.
> > >
> > > A few tag changed below:
> > >
> > > On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > Switch for using of get_state_synchronize_rcu_full() and
> > > > poll_state_synchronize_rcu_full() pair to debug a normal
> > > > synchronize_rcu() call.
> > > >
> > > > Just using "not" full APIs to identify if a grace period is
> > > > passed or not might lead to a false-positive kernel splat.
> > > >
> > > > It can happen, because get_state_synchronize_rcu() compresses
> > > > both normal and expedited states into one single unsigned long
> > > > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > > > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > > > run.
> > > >
> > > > To address this, switch to poll_state_synchronize_rcu_full() and
> > > > get_state_synchronize_rcu_full() APIs, which use separate variables
> > > > for expedited and normal states.
> > > >
> > > > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
> > >
> > > I switch this into "Closes:" per checkpatch.
> > >
> > > > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > > > Reported-by: cheung wall
> > > > Signed-off-by: Uladzislau Rezki (Sony)
> > >
> > > You seem to forget add Paul's Reviewed-by, so I add it in rcu/next.
> > > Would you or Paul double-check the Reviewed-by should be here?
> >
> > I am good with keeping my Reviewed-by tags.
> >
> Thanks Paul!
Except that I got this from overnight testing of rcu/dev on the shared
RCU tree:
WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80
I see this only on TREE05. Which should not be too surprising, given
that this is the scenario that tests it. It happened within five minutes
on all 14 of the TREE05 runs.
This commit, just to avoid any ambiguity:
7cb48b64a563 ("MAINTAINERS: Update Joel's email address")
Thanx, Paul
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
Hello, Boqun!
> Hi Ulad,
>
> I put these three patches into next (and misc.2025.02.27a) for some
> testing, hopefully it all goes well and they can make it v6.15.
>
> A few tag changed below:
>
> On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony) wrote:
> > Switch for using of get_state_synchronize_rcu_full() and
> > poll_state_synchronize_rcu_full() pair to debug a normal
> > synchronize_rcu() call.
> >
> > Just using "not" full APIs to identify if a grace period is
> > passed or not might lead to a false-positive kernel splat.
> >
> > It can happen, because get_state_synchronize_rcu() compresses
> > both normal and expedited states into one single unsigned long
> > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > run.
> >
> > To address this, switch to poll_state_synchronize_rcu_full() and
> > get_state_synchronize_rcu_full() APIs, which use separate variables
> > for expedited and normal states.
> >
> > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
>
> I switch this into "Closes:" per checkpatch.
>
Thanks!
> > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > Reported-by: cheung wall
> > Signed-off-by: Uladzislau Rezki (Sony)
>
> You seem to forget add Paul's Reviewed-by, so I add it in rcu/next.
> Would you or Paul double-check the Reviewed-by should be here?
>
Probably. But I do not remember getting a review from Paul. I might
be wrong and have missed it :)
Anyway, thank you for adding it.
--
Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> Hi Ulad,
>
> I put these three patches into next (and misc.2025.02.27a) for some
> testing, hopefully it all goes well and they can make it v6.15.
>
> A few tag changed below:
>
> On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony) wrote:
> > Switch for using of get_state_synchronize_rcu_full() and
> > poll_state_synchronize_rcu_full() pair to debug a normal
> > synchronize_rcu() call.
> >
> > Just using "not" full APIs to identify if a grace period is
> > passed or not might lead to a false-positive kernel splat.
> >
> > It can happen, because get_state_synchronize_rcu() compresses
> > both normal and expedited states into one single unsigned long
> > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > run.
> >
> > To address this, switch to poll_state_synchronize_rcu_full() and
> > get_state_synchronize_rcu_full() APIs, which use separate variables
> > for expedited and normal states.
> >
> > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
>
> I switch this into "Closes:" per checkpatch.
>
> > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > Reported-by: cheung wall
> > Signed-off-by: Uladzislau Rezki (Sony)
>
> You seem to forget add Paul's Reviewed-by, so I add it in rcu/next.
> Would you or Paul double-check the Reviewed-by should be here?
I am good with keeping my Reviewed-by tags.
Thanx, Paul
> Regards,
> Boqun
>
> > ---
> > include/linux/rcupdate_wait.h | 3 +++
> > kernel/rcu/tree.c | 8 +++-----
> > 2 files changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/rcupdate_wait.h b/include/linux/rcupdate_wait.h
> > index f9bed3d3f78d..4c92d4291cce 100644
> > --- a/include/linux/rcupdate_wait.h
> > +++ b/include/linux/rcupdate_wait.h
> > @@ -16,6 +16,9 @@
> > struct rcu_synchronize {
> > struct rcu_head head;
> > struct completion completion;
> > +
> > + /* This is for debugging. */
> > + struct rcu_gp_oldstate oldstate;
> > };
> > void wakeme_after_rcu(struct rcu_head *head);
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 8625f616c65a..48384fa2eaeb 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1632,12 +1632,10 @@ static void rcu_sr_normal_complete(struct llist_node *node)
> > {
> > struct rcu_synchronize *rs = container_of(
> > (struct rcu_head *) node, struct rcu_synchronize, head);
> > - unsigned long oldstate = (unsigned long) rs->head.func;
> >
> > WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
> > - !poll_state_synchronize_rcu(oldstate),
> > - "A full grace period is not passed yet: %lu",
> > - rcu_seq_diff(get_state_synchronize_rcu(), oldstate));
> > + !poll_state_synchronize_rcu_full(&rs->oldstate),
> > + "A full grace period is not passed yet!\n");
> >
> > /* Finally. */
> > complete(&rs->completion);
> > @@ -3247,7 +3245,7 @@ static void synchronize_rcu_normal(void)
> > * snapshot before adding a request.
> > */
> > if (IS_ENABLED(CONFIG_PROVE_RCU))
> > - rs.head.func = (void *) get_state_synchronize_rcu();
> > + get_state_synchronize_rcu_full(&rs.oldstate);
> >
> > rcu_sr_normal_add_req(&rs);
> >
> > --
> > 2.39.5
> >
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> > Hi Ulad,
> >
> > I put these three patches into next (and misc.2025.02.27a) for some
> > testing, hopefully it all goes well and they can make it v6.15.
> >
> > A few tag changed below:
> >
> > On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony) wrote:
> > > Switch for using of get_state_synchronize_rcu_full() and
> > > poll_state_synchronize_rcu_full() pair to debug a normal
> > > synchronize_rcu() call.
> > >
> > > Just using "not" full APIs to identify if a grace period is
> > > passed or not might lead to a false-positive kernel splat.
> > >
> > > It can happen, because get_state_synchronize_rcu() compresses
> > > both normal and expedited states into one single unsigned long
> > > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > > run.
> > >
> > > To address this, switch to poll_state_synchronize_rcu_full() and
> > > get_state_synchronize_rcu_full() APIs, which use separate variables
> > > for expedited and normal states.
> > >
> > > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
> >
> > I switch this into "Closes:" per checkpatch.
> >
> > > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > > Reported-by: cheung wall
> > > Signed-off-by: Uladzislau Rezki (Sony)
> >
> > You seem to forget add Paul's Reviewed-by, so I add it in rcu/next.
> > Would you or Paul double-check the Reviewed-by should be here?
>
> I am good with keeping my Reviewed-by tags.
>
Thank you!
Regards,
Boqun
> Thanx, Paul
>
[...]
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Thu, Feb 27, 2025 at 09:26:40AM -0800, Paul E. McKenney wrote:
> On Thu, Feb 27, 2025 at 09:12:39AM -0800, Boqun Feng wrote:
> > Hi Ulad,
> >
> > I put these three patches into next (and misc.2025.02.27a) for some
> > testing, hopefully it all goes well and they can make it v6.15.
> >
> > A few tag changed below:
> >
> > On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony) wrote:
> > > Switch for using of get_state_synchronize_rcu_full() and
> > > poll_state_synchronize_rcu_full() pair to debug a normal
> > > synchronize_rcu() call.
> > >
> > > Just using "not" full APIs to identify if a grace period is
> > > passed or not might lead to a false-positive kernel splat.
> > >
> > > It can happen, because get_state_synchronize_rcu() compresses
> > > both normal and expedited states into one single unsigned long
> > > value, so a poll_state_synchronize_rcu() can miss GP-completion
> > > when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> > > run.
> > >
> > > To address this, switch to poll_state_synchronize_rcu_full() and
> > > get_state_synchronize_rcu_full() APIs, which use separate variables
> > > for expedited and normal states.
> > >
> > > Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
> >
> > I switch this into "Closes:" per checkpatch.
> >
> > > Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> > > Reported-by: cheung wall
> > > Signed-off-by: Uladzislau Rezki (Sony)
> >
> > You seem to forget add Paul's Reviewed-by, so I add it in rcu/next.
> > Would you or Paul double-check the Reviewed-by should be here?
>
> I am good with keeping my Reviewed-by tags.
>
Thanks Paul!
--
Uladzislau Rezki
Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
Hi Ulad,
I put these three patches into next (and misc.2025.02.27a) for some
testing, hopefully it all goes well and they can make it into v6.15.
A few tags changed below:
On Thu, Feb 27, 2025 at 02:16:13PM +0100, Uladzislau Rezki (Sony) wrote:
> Switch for using of get_state_synchronize_rcu_full() and
> poll_state_synchronize_rcu_full() pair to debug a normal
> synchronize_rcu() call.
>
> Just using "not" full APIs to identify if a grace period is
> passed or not might lead to a false-positive kernel splat.
>
> It can happen, because get_state_synchronize_rcu() compresses
> both normal and expedited states into one single unsigned long
> value, so a poll_state_synchronize_rcu() can miss GP-completion
> when synchronize_rcu()/synchronize_rcu_expedited() concurrently
> run.
>
> To address this, switch to poll_state_synchronize_rcu_full() and
> get_state_synchronize_rcu_full() APIs, which use separate variables
> for expedited and normal states.
>
> Link: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/
I switched this into "Closes:" per checkpatch.
> Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency")
> Reported-by: cheung wall
> Signed-off-by: Uladzislau Rezki (Sony)
You seem to have forgotten to add Paul's Reviewed-by, so I added it in rcu/next.
Would you or Paul double-check that the Reviewed-by should be here?
Regards,
Boqun
> ---
> include/linux/rcupdate_wait.h | 3 +++
> kernel/rcu/tree.c | 8 +++-----
> 2 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/rcupdate_wait.h b/include/linux/rcupdate_wait.h
> index f9bed3d3f78d..4c92d4291cce 100644
> --- a/include/linux/rcupdate_wait.h
> +++ b/include/linux/rcupdate_wait.h
> @@ -16,6 +16,9 @@
> struct rcu_synchronize {
> struct rcu_head head;
> struct completion completion;
> +
> + /* This is for debugging. */
> + struct rcu_gp_oldstate oldstate;
> };
> void wakeme_after_rcu(struct rcu_head *head);
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8625f616c65a..48384fa2eaeb 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1632,12 +1632,10 @@ static void rcu_sr_normal_complete(struct llist_node *node)
> {
> struct rcu_synchronize *rs = container_of(
> (struct rcu_head *) node, struct rcu_synchronize, head);
> - unsigned long oldstate = (unsigned long) rs->head.func;
>
> WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
> - !poll_state_synchronize_rcu(oldstate),
> - "A full grace period is not passed yet: %lu",
> - rcu_seq_diff(get_state_synchronize_rcu(), oldstate));
> + !poll_state_synchronize_rcu_full(&rs->oldstate),
> + "A full grace period is not passed yet!\n");
>
> /* Finally. */
> complete(&rs->completion);
> @@ -3247,7 +3245,7 @@ static void synchronize_rcu_normal(void)
> * snapshot before adding a request.
> */
> if (IS_ENABLED(CONFIG_PROVE_RCU))
> - rs.head.func = (void *) get_state_synchronize_rcu();
> + get_state_synchronize_rcu_full(&rs.oldstate);
>
> rcu_sr_normal_add_req(&rs);
>
> --
> 2.39.5
>

