On Mon, Jun 06, 2022 at 06:47:32AM +1000, David Gwynne wrote:
> On Sun, Jun 05, 2022 at 03:57:39PM +0000, Visa Hankala wrote:
> > On Sun, Jun 05, 2022 at 12:27:32PM +0200, Martin Pieuchot wrote:
> > > On 05/06/22(Sun) 05:20, Visa Hankala wrote:
> > > > Encountered the following panic:
> > > > 
> > > > panic: kernel diagnostic assertion "(p->p_flag & P_TIMEOUT) == 0" 
> > > > failed: file "/usr/src/sys/kern/kern_synch.c", line 373
> > > > Stopped at      db_enter+0x10:  popq    %rbp
> > > >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > > >  423109  57118     55         0x3          0    2  link
> > > >  330695  30276     55         0x3          0    3  link
> > > > * 46366  85501     55      0x1003  0x4080400    1  link
> > > >  188803  85501     55      0x1003  0x4082000    0K link
> > > > db_enter() at db_enter+0x10
> > > > panic(ffffffff81f25d2b) at panic+0xbf
> > > > __assert(ffffffff81f9a186,ffffffff81f372c8,175,ffffffff81f87c6c) at 
> > > > __assert+0x25
> > > > sleep_setup(ffff800022d64bf8,ffff800022d64c98,20,ffffffff81f66ac6,0) at 
> > > > sleep_setup+0x1d8
> > > > cond_wait(ffff800022d64c98,ffffffff81f66ac6) at cond_wait+0x46
> > > > timeout_barrier(ffff8000228a28b0) at timeout_barrier+0x109
> > > > timeout_del_barrier(ffff8000228a28b0) at timeout_del_barrier+0xa2
> > > > sleep_finish(ffff800022d64d90,1) at sleep_finish+0x16d
> > > > tsleep(ffffffff823a5130,120,ffffffff81f0b730,2) at tsleep+0xb2
> > > > sys_nanosleep(ffff8000228a27f0,ffff800022d64ea0,ffff800022d64ef0) at 
> > > > sys_nanosleep+0x12d
> > > > syscall(ffff800022d64f60) at syscall+0x374
> > > > 
> > > > The panic is a regression of sys/kern/kern_timeout.c r1.84. Previously,
> > > > soft-interrupt-driven timeouts could be deleted synchronously without
> > > > blocking. Now, timeout_del_barrier() can sleep regardless of the type
> > > > of the timeout.
> > > > 
> > > > It looks that with small adjustments timeout_del_barrier() can sleep
> > > > in sleep_finish(). The management of run queues is not affected because
> > > > the timeout clearing happens after it. As timeout_del_barrier() does not
> > > > rely on a timeout or signal catching, there should be no risk of
> > > > unbounded recursion or unwanted signal side effects within the sleep
> > > > machinery. In a way, a sleep with a timeout is higher-level than
> > > > one without.
> > > 
> > > I trust you on the analysis.  However this looks very fragile to me.
> > > 
> > > The use of timeout_del_barrier() which can sleep using the global sleep
> > > queue is worrying me.  
> > 
> > I think the queue handling ends in sleep_finish() when SCHED_LOCK()
> > is released. The timeout clearing is done outside of it.
> 
> That's ok.
> 
> > The extra sleeping point inside sleep_finish() is subtle. It should not
> > matter in typical use. But is it permissible with the API? Also, if
> > timeout_del_barrier() sleeps, the thread's priority can change.
> 
> What other options do we have at this point? Spin? Allocate the timeout
> dynamically so sleep_finish doesn't have to wait for it and let the
> handler clean up? How would you stop the timeout handler waking up the
> thread if it's gone back to sleep again for some other reason?

In principle, each thread could have a sleep serial number. If the
serial number was somehow associated with the timeout, the handler
could bail out if the thread has moved on. However, implementing that
association looks tricky. Dynamic allocation could provide a suitable
context struct, but memory allocation in this code would be awful.
Maybe there should be an extended version of timeout_add() that allows
controlled updating of the timeout argument.. Does not sound appealing.

Spinning might be an option in the future. The kernel lock complicates
things, however. The spinning wait would have to release the kernel lock
to avoid possible deadlocking.

> Sleeping here is the least worst option.

So it seems.

> As for timeout_del_barrier, if prio is a worry we can provide an
> advanced version of it that lets you pass the prio in. I'd also
> like to change timeout_barrier so it queues the barrier task at the
> head of the pending lists rather than at the tail.

I think it is not important to preserve the priority here. Later
scheduling events will override it anyway.

> 
> > Note that sleep_finish() already can take an additional nap when
> > signal catching is enabled.
> > 
> > > > Note that endtsleep() can run and set P_TIMEOUT during
> > > > timeout_del_barrier() when the thread is blocked in cond_wait().
> > > > To avoid unnecessary atomic read-modify-write operations, the clearing
> > > > of P_TIMEOUT could be conditional, but maybe that is an unnecessary
> > > > optimization at this point.
> > > 
> > > I agree this optimization seems unnecessary at the moment.
> > > 
> > > > While it should be possible to make the code use timeout_del() instead
> > > > of timeout_del_barrier(), the outcome might not be outright better. For
> > > > example, sleep_setup() and endtsleep() would have to coordinate so that
> > > > a late-running timeout from previous sleep cycle would not disturb the
> > > > new cycle.
> > > 
> > > So that's the price for not having to sleep in sleep_finish(), right?
> > 
> > That is correct. Some synchronization is needed in any case.
> > 
> > > > To test the barrier path reliably, I made the code call
> > > > timeout_del_barrier() twice in a row. The second call is guaranteed
> > > > to sleep. Of course, this is not part of the patch.
> > > 
> > > ok mpi@
> > > 
> > > > Index: kern/kern_synch.c
> > > > ===================================================================
> > > > RCS file: src/sys/kern/kern_synch.c,v
> > > > retrieving revision 1.187
> > > > diff -u -p -r1.187 kern_synch.c
> > > > --- kern/kern_synch.c   13 May 2022 15:32:00 -0000      1.187
> > > > +++ kern/kern_synch.c   5 Jun 2022 05:04:45 -0000
> > > > @@ -370,8 +370,8 @@ sleep_setup(struct sleep_state *sls, con
> > > >         p->p_slppri = prio & PRIMASK;
> > > >         TAILQ_INSERT_TAIL(&slpque[LOOKUP(ident)], p, p_runq);
> > > >  
> > > > -       KASSERT((p->p_flag & P_TIMEOUT) == 0);
> > > >         if (timo) {
> > > > +               KASSERT((p->p_flag & P_TIMEOUT) == 0);
> > > >                 sls->sls_timeout = 1;
> > > >                 timeout_add(&p->p_sleep_to, timo);
> > > >         }
> > > > @@ -432,13 +432,12 @@ sleep_finish(struct sleep_state *sls, in
> > > >  
> > > >         if (sls->sls_timeout) {
> > > >                 if (p->p_flag & P_TIMEOUT) {
> > > > -                       atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
> > > >                         error1 = EWOULDBLOCK;
> > > >                 } else {
> > > > -                       /* This must not sleep. */
> > > > +                       /* This can sleep. It must not use timeouts. */
> > > >                         timeout_del_barrier(&p->p_sleep_to);
> > > > -                       KASSERT((p->p_flag & P_TIMEOUT) == 0);
> > > >                 }
> > > > +               atomic_clearbits_int(&p->p_flag, P_TIMEOUT);
> > > >         }
> > > >  
> > > >         /* Check if thread was woken up because of a unwind or signal */
> > > > 
> > > 
> > 
> 

Reply via email to