On 26/02/15(Thu) 01:51, Martin Pieuchot wrote:
> On 22/02/15(Sun) 00:54, Martin Pieuchot wrote:
> > On 20/02/15(Fri) 23:15, Stefan Sperling wrote:
> > > In the following configuration I can pretty easily trigger endless
> > > screenfulls of scrolling "ehci_idone: ex=%p is done!" messages,
> > > where %p is to a constant pointer value (same in each message).
> > > 
> > > [ehci host]<--[usb extension cable (hub 1)]<--[usb keyboard (hub 
> > > 2)]<--[usb mouse]
> > > 
> > > So there's a USB extension cable (hub 1) that I plug a keyboard into.
> > > The keyboard in turn has a built-in hub that has a mouse plugged into it.
> > > The problem happens almost every time I pull the keyboard out of hub 1.
> > > 
> > > I've also seen it happen without any intervention on my part (In fact
> > > I was in a different city and couldn't use the box until I got home
> > > several hours later and hit the reset switch... that's no fun)
> > > 
> > > So if a transfer is cancelled (e.g. as a result of pulling the plug), 
> > > then:
> > > 
> > > - usbd_abort_pipe wants to abort a related transfer
> > > - ehci_abort_xfer schedules and waits for ehci_softintr, expecting
> > >   the softintr routine to deal with the cancelled transfer:
> > > 
> > >   /*
> > >    * Step 3: Make sure the soft interrupt routine has run. This
> > >    * should remove any completed items off the queue.
> > >    * The hardware has no reference to completed items (TDs).
> > >    * It's safe to remove them at any time.
> > >    */
> > >   s = splusb();
> > >   sc->sc_softwake = 1;
> > >   usb_schedsoftintr(&sc->sc_bus);
> > >   tsleep(&sc->sc_softwake, PZERO, "ehciab", 0);
> > > 
> > > - ehci_softintr gets scheduled
> > > - ehci_softintr loops over xfers on the sc_intrhead TAILQ and
> > >   invokes ehci_check_intr on each
> > > - ehci_check_intr eventually ends up calling ehci_idone
> > > - ehci_idone does nothing for cancelled transfers... ?!?
> > > 
> > >   if (xfer->status == USBD_CANCELLED ||
> > >       xfer->status == USBD_TIMEOUT) {
> > >           DPRINTF(("ehci_idone: aborted xfer=%p\n", xfer));
> > >           return;
> > >   }
> > > 
> > > - something else happens
> > > 
> > > - ehci_abort_xfer awakes from tsleep and sets ex->isdone, since it
> > >   expects the softinterrupt routine to have dealt with the xfer
> > > 
> > > - something else happens
> > > 
> > > - the host controller sends an INTERR interrupt
> > > - ehci_intr1 schedules ehci_softintr
> > > - ehci_softintr loops over xfers on the sc_intrhead TAILQ and
> > >   invokes ehci_check_intr on each
> > > - the cancelled xfer is still in the intrhead TAILQ and ends up in 
> > > ehci_idone
> > > - ehci_idone looks for the isdone flag which is now set, then it
> > >   complains and does nothing
> > > 
> > > - the host controller sends an INTERR interrupt
> > > ... same story again, we get an endless loop
> > > 
> > > This diff breaks the chain of events and fixes the endless loop for me.
> > > I can't reproduce the problem anymore by pulling the keyboard out.
> > > I don't quite understand how this prevents the flood of INTERR interrupts
> > > but it seems to work.
> > > 
> > > I assume there are nasty tentacles in USB land which I'm unfamiliar with.
> > > Is there any reason this could be a bad idea?
> > 
> > Stefan that's a really good analysis.  I think the diff might not be
> > completely correct though.
> > 
> > So basically you're removing the transfer from the active list.  That's
> > generally done after the USB callback has been executed, in your case in
> > ehci_device_intr_done().
> > 
> > For interrupt transfers (pipe->repeat is 1) the transfer is kept on the
> > list while the descriptors are freed/reallocated.  That should be safe
> > since we should be reusing the sames.  
> > 
> > So I don't know if we are missing a spl protection of if there's an xfer 
> > leak but I'm afraid that with your diff usb_transfer_complete() might
> > not be called for the failing xfer.
> > 
> > That's easy to check, look if the ehcixfer pool counter increase when
> > you detach your device.
> > 
> > I'm afraid I cannot help more as I am currently traveling :)
> 
> Here's an alternative diff that removes xfers from the list of
> currently pending transfers as soon as they are finished or
> cancelled.  This should also prevent the race you analysed.
> 
> With this diff ehci_idone() should no longer be called on a
> CANCELLED or TIMEOUT xfer which mean the 'isdone' check can
> be removed.

I reverted the previous diff because it did not properly abort transfer
that were on the pipe's list but not on the interrupt list.

Diff below should take care of that and add an extra check for
suspend/resume.  I'm looking for test reports and oks.

I tried to keep the diff small, cleanups can be done afterward.

Index: ehci.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/ehci.c,v
retrieving revision 1.176
diff -u -p -r1.176 ehci.c
--- ehci.c      6 Mar 2015 22:53:03 -0000       1.176
+++ ehci.c      9 Mar 2015 16:03:56 -0000
@@ -206,11 +206,7 @@ void               ehci_dump_exfer(struct ehci_xfer *
 #define ehci_add_intr_list(sc, ex) \
        TAILQ_INSERT_TAIL(&(sc)->sc_intrhead, (ex), inext);
 #define ehci_del_intr_list(sc, ex) \
-       do { \
-               TAILQ_REMOVE(&sc->sc_intrhead, (ex), inext); \
-               (ex)->inext.tqe_prev = NULL; \
-       } while (0)
-#define ehci_active_intr_list(ex) ((ex)->inext.tqe_prev != NULL)
+       TAILQ_REMOVE(&sc->sc_intrhead, (ex), inext);
 
 struct usbd_bus_methods ehci_bus_methods = {
        .open_pipe = ehci_open,
@@ -753,6 +749,7 @@ ehci_check_qh_intr(struct ehci_softc *sc
        }
  done:
        DPRINTFN(12, ("ehci_check_intr: ex=%p done\n", ex));
+       ehci_del_intr_list(sc, ex);
        timeout_del(&xfer->timeout_handle);
        usb_rem_task(xfer->pipe->device, &xfer->abort_task);
        ehci_idone(xfer);
@@ -803,6 +800,7 @@ ehci_check_itd_intr(struct ehci_softc *s
        return;
 done:
        DPRINTFN(12, ("ehci_check_itd_intr: ex=%p done\n", ex));
+       ehci_del_intr_list(sc, ex);
        timeout_del(&xfer->timeout_handle);
        usb_rem_task(xfer->pipe->device, &xfer->abort_task);
        ehci_idone(xfer);
@@ -1056,6 +1054,15 @@ ehci_activate(struct device *self, int a
        switch (act) {
        case DVACT_SUSPEND:
                rv = config_activate_children(self, act);
+
+#ifdef DIAGNOSTIC
+               if (!TAILQ_EMPTY(&sc->sc_intrhead)) {
+                       printf("%s: interrupt list not empty\n",
+                           sc->sc_bus.bdev.dv_xname);
+                       return (-1);
+               }
+#endif
+
                sc->sc_bus.use_polling++;
 
                for (i = 1; i <= sc->sc_noport; i++) {
@@ -2718,9 +2725,10 @@ ehci_abort_xfer(struct usbd_xfer *xfer, 
 
        DPRINTF(("ehci_abort_xfer: xfer=%p pipe=%p\n", xfer, epipe));
 
-       if (sc->sc_bus.dying) {
-               /* If we're dying, just do the software part. */
+       if (sc->sc_bus.dying || xfer->status == USBD_NOT_STARTED) {
                s = splusb();
+               if (xfer->status != USBD_NOT_STARTED)
+                       ehci_del_intr_list(sc, ex);
                xfer->status = status;  /* make software ignore it */
                timeout_del(&xfer->timeout_handle);
                usb_rem_task(xfer->device, &xfer->abort_task);
@@ -2759,6 +2767,7 @@ ehci_abort_xfer(struct usbd_xfer *xfer, 
        s = splusb();
        ex->ehci_xfer_flags |= EHCI_XFER_ABORTING;
        xfer->status = status;  /* make software ignore it */
+       ehci_del_intr_list(sc, ex);
        timeout_del(&xfer->timeout_handle);
        usb_rem_task(xfer->device, &xfer->abort_task);
        splx(s);
@@ -2827,8 +2836,10 @@ ehci_abort_isoc_xfer(struct usbd_xfer *x
 
        DPRINTF(("ehci_abort_isoc_xfer: xfer %p pipe %p\n", xfer, xfer->pipe));
 
-       if (sc->sc_bus.dying) {
+       if (sc->sc_bus.dying || xfer->status == USBD_NOT_STARTED) {
                s = splusb();
+               if (xfer->status != USBD_NOT_STARTED)
+                       ehci_del_intr_list(sc, ex);
                xfer->status = status;
                timeout_del(&xfer->timeout_handle);
                usb_rem_task(xfer->device, &xfer->abort_task);
@@ -2852,9 +2863,10 @@ ehci_abort_isoc_xfer(struct usbd_xfer *x
                        tsleep(&ex->ehci_xfer_flags, PZERO, "ehciiaw", 0);
                return;
        }
-       ex->ehci_xfer_flags |= EHCI_XFER_ABORTING;
 
+       ex->ehci_xfer_flags |= EHCI_XFER_ABORTING;
        xfer->status = status;
+       ehci_del_intr_list(sc, ex);
        timeout_del(&xfer->timeout_handle);
        usb_rem_task(xfer->device, &xfer->abort_task);
 
@@ -2998,8 +3010,7 @@ ehci_device_ctrl_done(struct usbd_xfer *
        }
 #endif
 
-       if (xfer->status != USBD_NOMEM && ehci_active_intr_list(ex)) {
-               ehci_del_intr_list(sc, ex);     /* remove from active list */
+       if (xfer->status != USBD_NOMEM) {
                ehci_free_sqtd_chain(sc, ex);
        }
 
@@ -3283,8 +3294,7 @@ ehci_device_bulk_done(struct usbd_xfer *
        DPRINTFN(10,("ehci_bulk_done: xfer=%p, actlen=%d\n",
            xfer, xfer->actlen));
 
-       if (xfer->status != USBD_NOMEM && ehci_active_intr_list(ex)) {
-               ehci_del_intr_list(sc, ex);     /* remove from active list */
+       if (xfer->status != USBD_NOMEM) {
                ehci_free_sqtd_chain(sc, ex);
                usb_syncmem(&xfer->dmabuf, 0, xfer->length,
                    usbd_xfer_isread(xfer) ?
@@ -3480,11 +3490,10 @@ ehci_device_intr_done(struct usbd_xfer *
                        timeout_set(&xfer->timeout_handle, ehci_timeout, xfer);
                        timeout_add_msec(&xfer->timeout_handle, xfer->timeout);
                }
-               splx(s);
-
+               ehci_add_intr_list(sc, ex);
                xfer->status = USBD_IN_PROGRESS;
-       } else if (xfer->status != USBD_NOMEM && ehci_active_intr_list(ex)) {
-               ehci_del_intr_list(sc, ex); /* remove from active list */
+               splx(s);
+       } else if (xfer->status != USBD_NOMEM) {
                ehci_free_sqtd_chain(sc, ex);
                usb_syncmem(&xfer->dmabuf, 0, xfer->length,
                    usbd_xfer_isread(xfer) ?
@@ -3789,8 +3798,7 @@ ehci_device_isoc_done(struct usbd_xfer *
 
        s = splusb();
        epipe->u.isoc.cur_xfers--;
-       if (xfer->status != USBD_NOMEM && ehci_active_intr_list(ex)) {
-               ehci_del_intr_list(sc, ex);
+       if (xfer->status != USBD_NOMEM) {
                ehci_rem_free_itd_chain(sc, ex);
        }
        splx(s);

Reply via email to