On 26/10/20(Mon) 11:57, Scott Cheloha wrote:
> On Mon, Oct 12, 2020 at 11:11:36AM +0200, Martin Pieuchot wrote:
> [...]
> > +/*
> > + * Scan the kqueue, blocking if necessary until the target time is reached.
> > + * If tsp is NULL we block indefinitely. If tsp->tv_sec/tv_nsec are both
> > + * 0 we do not block at all.
> > + */
> > int
> > kqueue_scan(struct kqueue_scan_state *scan, int maxevents,
> > - struct kevent *ulistp, struct timespec *tsp, struct kevent *kev,
> > - struct proc *p, int *retval)
> > + struct kevent *kevp, struct timespec *tsp, struct proc *p, int *errorp)
>
> Is there any reason to change these argument names?
The array is no longer a user-specified list because the interface is
now also used by other kernel consumers.
> > @@ -618,6 +631,8 @@ dopselect(struct proc *p, int nd, fd_set
> > pobits[2] = (fd_set *)&bits[5];
> > }
> >
> > + kqpoll_init(p);
> > +
> > #define getbits(name, x) \
> > if (name && (error = copyin(name, pibits[x], ni))) \
> > goto done;
> > @@ -636,43 +651,63 @@ dopselect(struct proc *p, int nd, fd_set
> > if (sigmask)
> > dosigsuspend(p, *sigmask &~ sigcantmask);
> >
> > -retry:
> > - ncoll = nselcoll;
> > - atomic_setbits_int(&p->p_flag, P_SELECT);
> > - error = selscan(p, pibits[0], pobits[0], nd, ni, retval);
> > - if (error || *retval)
> > + /* Register kqueue events */
> > + if ((error = pselregister(p, pibits[0], nd, ni, &nevents)) != 0)
> > goto done;
> > - if (timeout == NULL || timespecisset(timeout)) {
> > - if (timeout != NULL) {
> > - getnanouptime(&start);
> > - nsecs = MIN(TIMESPEC_TO_NSEC(timeout), MAXTSLP);
> > - } else
> > - nsecs = INFSLP;
> > - s = splhigh();
> > - if ((p->p_flag & P_SELECT) == 0 || nselcoll != ncoll) {
> > - splx(s);
> > - goto retry;
> > - }
> > - atomic_clearbits_int(&p->p_flag, P_SELECT);
> > - error = tsleep_nsec(&selwait, PSOCK | PCATCH, "select", nsecs);
>
> I need to clarify something.
>
> My understanding of the current state of poll/select is that all
> threads wait on the same channel, selwait. Sometimes, a thread will
> wakeup(9) *all* threads waiting on that channel. When this happens,
> most of the sleeping threads will recheck their conditions, see that
> nothing has changed, and go back to sleep.
>
> Right?
>
> Is that spurious wakeup case going away with this diff? That is, when
> a thread is sleeping in poll/select it will only be woken up by
> another thread if the condition for one of the descriptors of interest
> changes.
Your understanding matches mine. Removing the spurious wakeups might
benefit some programs, and I see this as a pro of this
implementation.
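To make the difference concrete, here is a toy userland model, not
kernel code. Every name in it (sketch_waiter, sketch_wake_shared,
sketch_wake_private) is hypothetical, and it only counts who would be
woken: it does not sleep or use wakeup(9).

```c
#include <stddef.h>

/* Hypothetical model of one thread sleeping in select/poll. */
struct sketch_waiter {
	int fd;			/* the descriptor this waiter watches */
	unsigned wakeups;	/* how many times it has been woken */
};

/*
 * Shared-channel model: an event on any descriptor wakes every
 * sleeper, whether or not it watches ready_fd.
 * Returns the number of waiters woken.
 */
static unsigned
sketch_wake_shared(struct sketch_waiter *w, size_t n, int ready_fd)
{
	unsigned woken = 0;
	size_t i;

	(void)ready_fd;		/* the shared channel cannot filter on it */
	for (i = 0; i < n; i++) {
		w[i].wakeups++;
		woken++;
	}
	return woken;
}

/*
 * Per-waiter model: only waiters that registered ready_fd are woken.
 * Returns the number of waiters woken.
 */
static unsigned
sketch_wake_private(struct sketch_waiter *w, size_t n, int ready_fd)
{
	unsigned woken = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (w[i].fd == ready_fd) {
			w[i].wakeups++;
			woken++;
		}
	}
	return woken;
}
```

With four waiters each watching a different descriptor, one event wakes
all four in the shared model and only one in the per-waiter model.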
>
> > - splx(s);
> > - if (timeout != NULL) {
> > - getnanouptime(&stop);
> > - timespecsub(&stop, &start, &elapsed);
> > - timespecsub(timeout, &elapsed, timeout);
> > - if (timeout->tv_sec < 0)
> > - timespecclear(timeout);
> > - }
> > - if (error == 0 || error == EWOULDBLOCK)
> > - goto retry;
> > +
> > + /*
> > + * The poll/select family of syscalls has been designed to
> > + * block when file descriptors are not available, even if
> > + * there's nothing to wait for.
> > + */
> > + if (nevents == 0) {
> > + uint64_t nsecs = INFSLP;
> > +
> > + if (timeout != NULL)
> > + nsecs = MAX(1, MIN(TIMESPEC_TO_NSEC(timeout), MAXTSLP));
> > +
> > + error = tsleep_nsec(&p->p_kq, PSOCK | PCATCH, "kqsel", nsecs);
> > + /* select is not restarted after signals... */
> > + if (error == ERESTART)
> > + error = EINTR;
> > + if (error == EWOULDBLOCK)
> > + error = 0;
> > + goto done;
> > + }
>
> This is still wrong. If you want to isolate this case you can't block
> when the timeout is empty:
Thanks, fixed.
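For illustration, the three timeout cases could be separated roughly
like this. This is a standalone sketch, not the updated diff: the
helper names are hypothetical, and SKETCH_INFSLP/SKETCH_MAXTSLP are
placeholders standing in for the kernel's INFSLP/MAXTSLP. It assumes
the timespec has already been validated as non-negative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Placeholder values for this sketch only. */
#define SKETCH_INFSLP	UINT64_MAX
#define SKETCH_MAXTSLP	(UINT64_MAX - 1)

/* Convert a validated timespec to nanoseconds, saturating on overflow. */
static uint64_t
sketch_ts_to_nsec(const struct timespec *ts)
{
	const uint64_t nsec_per_sec = 1000000000ULL;
	uint64_t sec, nsec;

	assert(ts->tv_sec >= 0 && ts->tv_nsec >= 0 &&
	    ts->tv_nsec < 1000000000L);
	sec = (uint64_t)ts->tv_sec;
	nsec = (uint64_t)ts->tv_nsec;
	if (sec > (UINT64_MAX - nsec) / nsec_per_sec)
		return UINT64_MAX;
	return sec * nsec_per_sec + nsec;
}

/*
 * Decide whether to sleep when there is nothing to wait for.
 * Returns 0 if the caller must not block (empty timeout), otherwise
 * returns 1 and stores the sleep duration in *nsecs.
 */
static int
sketch_sleep_nsecs(const struct timespec *timeout, uint64_t *nsecs)
{
	uint64_t ns;

	if (timeout == NULL) {
		*nsecs = SKETCH_INFSLP;		/* block indefinitely */
		return 1;
	}
	if (timeout->tv_sec == 0 && timeout->tv_nsec == 0)
		return 0;			/* empty timeout: do not block */
	ns = sketch_ts_to_nsec(timeout);
	*nsecs = ns < SKETCH_MAXTSLP ? ns : SKETCH_MAXTSLP;
	return 1;
}
```

A non-empty timeout always converts to at least 1 nanosecond here, so
no extra lower clamp is needed once the empty case is handled first.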
> > Index: sys/proc.h
> > ===================================================================
> > RCS file: /cvs/src/sys/sys/proc.h,v
> > retrieving revision 1.300
> > diff -u -p -r1.300 proc.h
> > --- sys/proc.h 16 Sep 2020 08:01:15 -0000 1.300
> > +++ sys/proc.h 12 Oct 2020 08:56:21 -0000
> > @@ -320,6 +320,7 @@ struct process {
> >
> > struct kcov_dev;
> > struct lock_list_entry;
> > +struct kqueue;
> >
> > struct p_inentry {
> > u_long ie_serial;
> > @@ -382,6 +383,8 @@ struct proc {
> > struct plimit *p_limit; /* [l] read ref. of p_p->ps_limit */
> > struct kcov_dev *p_kd; /* kcov device handle */
> > struct lock_list_entry *p_sleeplocks; /* WITNESS lock tracking */
> > + struct kqueue *p_kq; /* for select/poll */
> > + unsigned long p_kq_serial; /* for select/poll */
>
> arc4random(9) returns a uint32_t. Why do you want an unsigned long here?
To make sure the field has the same size as "void *". See how it is
used in EV_SET().
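A minimal userland sketch of the idea, assuming an ABI where
"unsigned long" and "void *" have the same size (ILP32 or LP64).
struct sketch_kevent and both helpers are hypothetical stand-ins, not
the real struct kevent or EV_SET().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cut-down event record with a pointer-sized udata slot. */
struct sketch_kevent {
	uintptr_t ident;	/* descriptor being watched */
	void *udata;		/* opaque value, here the serial number */
};

/* Store the serial in the pointer-sized udata field. */
static void
sketch_ev_set(struct sketch_kevent *kev, uintptr_t ident,
    unsigned long serial)
{
	/* Assumption of this sketch: no bits are lost in the cast. */
	assert(sizeof(unsigned long) == sizeof(void *));
	kev->ident = ident;
	kev->udata = (void *)serial;
}

/* Check whether an event carries the expected serial number. */
static int
sketch_serial_matches(const struct sketch_kevent *kev,
    unsigned long serial)
{
	return (unsigned long)kev->udata == serial;
}
```

With matching sizes the serial round-trips through udata unchanged, so
it can be compared directly when the event comes back.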