On Wed, Apr 01, 2020 at 12:58:23PM +1100, Jonathan Gray wrote:
> On Wed, Mar 18, 2020 at 01:41:06PM +0100, Patrick Wildt wrote:
> > On Wed, Mar 18, 2020 at 11:22:40AM +0100, Patrick Wildt wrote:
> > > Hi,
> > > 
> > > I've spent a few days investigating why USB ethernet adapters are so
> > > horribly slow on my ARMs.  Using dt(4) I realized that it was spending
> > > most of its time in memcpy.  But, why?  As it turns out, all USB data
> > > buffers are mapped COHERENT, which on some/most ARMs means uncached.
> > > Using cached data buffers makes the performance rise from 20 mbit/s to
> > > 200 mbit/s.  Quite a difference.
> > > 
> > > sys/dev/usb/usb_mem.c:
> > >   error = bus_dmamem_map(tag, p->segs, p->nsegs, p->size,
> > >                          &p->kaddr, BUS_DMA_NOWAIT|BUS_DMA_COHERENT);
> > > 
> > > On x86, COHERENT is essentially a no-op.  On ARM, it depends on the SoC.
> > > Some SoCs have cache-coherent USB controllers, some don't.  Mine does
> > > not, so mapping it COHERENT means uncached and thus slow.
> > > 
> > > Why do we do that?  Well, when the code was imported in 99, it was
> > > already there.  Since then we have gained infrastructure for DMA
> > > syncs in the USB stack, which I think are proper.
> > > 
> > > sys/dev/usb/usbdi.c - usbd_transfer() (before transfer)
> > > 
> > >   if (!usbd_xfer_isread(xfer)) {
> > >           if ((xfer->flags & USBD_NO_COPY) == 0)
> > >                   memcpy(KERNADDR(&xfer->dmabuf, 0), xfer->buffer,
> > >                       xfer->length);
> > >           usb_syncmem(&xfer->dmabuf, 0, xfer->length,
> > >               BUS_DMASYNC_PREWRITE);
> > >   } else
> > >           usb_syncmem(&xfer->dmabuf, 0, xfer->length,
> > >               BUS_DMASYNC_PREREAD);
> > >   err = pipe->methods->transfer(xfer);
> > > 
> > > sys/dev/usb/usbdi.c - usb_transfer_complete() (after transfer)
> > > 
> > >   if (xfer->actlen != 0) {
> > >           if (usbd_xfer_isread(xfer)) {
> > >                   usb_syncmem(&xfer->dmabuf, 0, xfer->actlen,
> > >                       BUS_DMASYNC_POSTREAD);
> > >                   if (!(xfer->flags & USBD_NO_COPY))
> > >                           memcpy(xfer->buffer, KERNADDR(&xfer->dmabuf, 0),
> > >                               xfer->actlen);
> > >           } else
> > >                   usb_syncmem(&xfer->dmabuf, 0, xfer->actlen,
> > >                       BUS_DMASYNC_POSTWRITE);
> > >   }
> > > 
> > > We cannot just remove COHERENT, since some drivers, like ehci(4), use
> > > the same backend to allocate their rings.  And I can't vouch for those
> > > drivers' sanity.
> > > 
> > > As a first step, I would like to go ahead with another solution, which
> > > is based on a diff from Marius Strobl, who added those syncs in the
> > > first place.  Essentially it splits the memory handling into cacheable
> > > and non-cacheable blocks.  The USB data transfers and everyone who uses
> > > usbd_alloc_buffer() then use cacheable buffers, while code like ehci(4)
> > > still don't.  This is a bit of a safer approach imho, since we don't
> > > hurt the controller drivers, but speed up the data buffers.
> > > 
> > > Once we have verified that there are no regressions, we can adjust
> > > ehci(4) and the like, add proper syncs, make sure they still work as
> > > well as before, and maybe then back this out again.
> > > 
> > > Keep note that this is all a no-op on X86, but all the other archs will
> > > profit from this.
> > > 
> > > ok?
> > > 
> > > Patrick
> > 
> > Update diff with inverted logic.  kettenis@ argues that we should
> > invert the logic, and those who need COHERENT memory should ask
> > for that explicitly, since for bus_dmamem_map() it also needs to
> > be passed explicitly.  This also points out all those users that
> > use usb_allocmem() internally, where we might want to have a look
> > if COHERENT is actually needed or not, or if it can be refactored
> > in another way.
> 
> These commits broke usb on imx.6 with cubox:
> 
> imxehci0 at simplebus3
> usb0 at imxehci0: USB revision 2.0
> usb0: root hub problem
> imxehci1 at simplebus3
> usb1 at imxehci1: USB revision 2.0
> usb1: root hub problem
> "usbmisc" at simplebus3 not configured

pandaboard with omap4 (another cortex a9) also has broken usb with
the latest snapshot:

omehci0 at simplebus0
usb0 at omehci0: USB revision 2.0
usb0: root hub problem

Reply via email to