Hi, > This patch aims to improve robustness of 'usb' command operation on > the ehci-hcd IP block as it ports to contemporary U-Boot the patch > described and provided in [1] (originally applicable to U-Boot > 2016.05). > > According to the fix author - "rayvt" (from [1]): > > TD stands for "Queue Element Transfer Descriptor (qTD)", which is a > micro-code instruction for the EHCI USB chip. > The "token" field is detailed information, control, and status of the > TD's data transfer. In particular, the rightmost byte is the status > field. 0x80 bit means the TD is active and the data transfer has not > yet completed. 0x08 bit means there was some sort of data transfer > error (XactErr). ^^^^^^^ - this error is NOT handled now in mainline u-boot. Do you have any experience with it?
> > If the drive gets a transfer error, it refuses to do any other I/O > transfer until the error is properly cleared and reset. U-boot did > not do this, so every subsequent disk read would get a timeout error > because the drive would not send any data. The fix is to properly > clear and reset the USB disk when it gets a transfer error. > > Every read operation starts at the maximum block size. When the USB > pendrive is not able to correctly serve this data read request, the > dynamic reduction of IO size is performed. Up to six tries (with > smaller IO block each time) are attempted. > > A related problem is that some drives are slow to come up. Linux > handles this by issuing a spinup command and allowing more time for > the drive to respond. The same idea is applied in this fix. > > On TPC70 (i.MX6Q) once per ~10 times (without this patch): > > Bus usb@2184200: USB EHCI 1.00 > scanning bus usb@2184200 for devices... 2 USB Device(s) found > scanning usb for storage devices... 1 Storage Device(s) found > EHCI timed out on TD - token=0x1f8c80 This is how the error gets evident. The detailed explanation is in link [1]. > > Performance impact: > > With U-Boot -master in mainline: > 16869440 bytes read in 979 ms (16.4 MiB/s) > > With this patch: > 16869440 bytes read in 1219 ms (13.2 MiB/s) > > Links: > [1] - https://forum.doozan.com/read.php?3,35295,35295#msg-35295 > [2] - > https://www.dropbox.com/s/nrkrd1no63viuu8/uboot-bodhi-2016.05-timeoutTD.patch?dl=0 > > Signed-off-by: Lukasz Majewski <lu...@denx.de> > [Unfortunately, the original patch [2] did not contain S-o-B from the > original author - "rayvt"] > --- > > common/usb.c | 10 +++- > common/usb_storage.c | 106 > +++++++++++++++++++++++++++++++++--- drivers/usb/host/ehci-hcd.c | > 55 +++++++++++++++++-- include/usb.h | 1 + > include/usb_defs.h | 1 + > 5 files changed, 160 insertions(+), 13 deletions(-) > > diff --git a/common/usb.c b/common/usb.c > index 349e838f1d..305482b5bb 100644 > --- a/common/usb.c > +++ b/common/usb.c > @@ -925,14 +925,20 @@ static int get_descriptor_len(struct usb_device > *dev, int len, int expect_len) __maybe_unused struct > usb_device_descriptor *desc; ALLOC_CACHE_ALIGN_BUFFER(unsigned char, > tmpbuf, USB_BUFSIZ); int err; > + int retry = 5; > > desc = (struct usb_device_descriptor *)tmpbuf; > > +again: > err = usb_get_descriptor(dev, USB_DT_DEVICE, 0, desc, len); > if (err < expect_len) { > if (err < 0) { > - printf("unable to get device descriptor > (error=%d)\n", > - err); > + printf("unable to get device descriptor > (error=%d) retry: %d\n", > + err, retry); > + mdelay(50); > + if (--retry >= 0) > + /* Some drives are just slow to wake > up. */ > + goto again; This I think is just to "wait" for the inserted device if it is rather slow (or probably its memory got degraded over time). This shouldn't bring any transfer slow down. > return err; > } else { > printf("USB device descriptor short read > (expected %i, got %i)\n", diff --git a/common/usb_storage.c > b/common/usb_storage.c index 097b6729c1..48c8c2ae64 100644 > --- a/common/usb_storage.c > +++ b/common/usb_storage.c > @@ -111,6 +111,18 @@ int usb_stor_get_info(struct usb_device *dev, > struct us_data *us, struct blk_desc *dev_desc); > int usb_storage_probe(struct usb_device *dev, unsigned int ifnum, > struct us_data *ss); > + > +#ifdef CONFIG_USB_EHCI_HCD > + /* > + * The U-Boot EHCI driver can handle any transfer length as > long as > + * there is enough free heap space left, but the SCSI > READ(10) and > + * WRITE(10) commands are limited to 65535 blocks. > + */ > +int usb_max_xfer_blk = 4096; Setting it to magic value - 4096 may be indeed problematic. However, I don't know what is the 'usual' transfer size. > +#else > +int usb_max_xfer_blk = 20; This is also a magic value - probably took from some old U-Boot (when default transfer was set to this value). I do guess that this snippet shall be removed from this patch set. > +#endif > + > #if CONFIG_IS_ENABLED(BLK) > static unsigned long usb_stor_read(struct udevice *dev, lbaint_t > blknr, lbaint_t blkcnt, void *buffer); > @@ -729,6 +741,7 @@ static int usb_stor_BBB_transport(struct scsi_cmd > *srb, struct us_data *us) pipeout = usb_sndbulkpipe(us->pusb_dev, > us->ep_out); /* DATA phase + error handling */ > data_actlen = 0; > + mdelay(10); /* Like linux does. */ > /* no data, go immediately to the STATUS phase */ > if (srb->datalen == 0) > goto st; > @@ -740,6 +753,16 @@ static int usb_stor_BBB_transport(struct > scsi_cmd *srb, struct us_data *us) > result = usb_bulk_msg(us->pusb_dev, pipe, srb->pdata, > srb->datalen, &data_actlen, USB_CNTL_TIMEOUT * 5); > + > + /* special handling of XACTERR in DATA phase */ > + if (result < 0 && (us->pusb_dev->status & USB_ST_XACTERR)) { > + debug("XACTERR in data phase - clr, reset, and > return fail.\n"); > + usb_stor_BBB_clear_endpt_stall(us, dir_in ? > + us->ep_in : > us->ep_out); > + usb_stor_BBB_reset(us); > + return USB_STOR_TRANSPORT_FAILED; > + } > + Here the case of XACTERR is handled. This is missing now in -master. > /* special handling of STALL in DATA phase */ > if ((result < 0) && (us->pusb_dev->status & USB_ST_STALLED)) > { debug("DATA:stall\n"); > @@ -1013,9 +1036,32 @@ static int usb_request_sense(struct scsi_cmd > *srb, struct us_data *ss) return 0; > } > > +/* > + * This spins up the disk and also consumes the time that the > + * disk takes to become active and ready to read data. > + * Some drives (like Western Digital) can take more than 5 seconds. > + * The delay occurs on the 1st data read from the disk. > + * Extending the timeout here works better than handling the timeout > + * as an error on a "real" read. > + */ > +static int usb_spinup(struct scsi_cmd *srb, struct us_data *ss) > +{ > + memset(&srb->cmd[0], 0, 12); > + srb->cmd[0] = SCSI_START_STP; > + srb->cmd[1] = srb->lun << 5; > + srb->cmd[4] = 1; /* Start spinup. */ > + srb->datalen = 0; > + srb->cmdlen = 6; > + ss->pusb_dev->extra_timeout = 9876; > + ss->transport(srb, ss); > + ss->pusb_dev->extra_timeout = 0; > + return 0; > +} > + > static int usb_test_unit_ready(struct scsi_cmd *srb, struct us_data > *ss) { > int retries = 10; > + int gave_extra_time = 0; > > do { > memset(&srb->cmd[0], 0, 12); > @@ -1038,6 +1084,17 @@ static int usb_test_unit_ready(struct scsi_cmd > *srb, struct us_data *ss) if ((srb->sense_buf[2] == 0x02) && > (srb->sense_buf[12] == 0x3a)) > return -1; > + /* > + * If the status is "Not Ready - becoming ready", > give it > + * more time. Linux issues a spinup command (once) > and gives > + * it 100 seconds. > + */ > + if (srb->sense_buf[2] == 0x02 && > + srb->sense_buf[12] == 0x04 && > + gave_extra_time == 0) { > + retries = 100; /* Allow 10 seconds. */ > + gave_extra_time = retries; > + } The above code looks like targeting the HDDs, but some degraded pen drives may need this time as well for proper operation. > mdelay(100); > } while (retries--); > > @@ -1166,22 +1223,57 @@ static unsigned long usb_stor_read(struct > blk_desc *block_dev, lbaint_t blknr, block_dev->devnum, start, blks, > buf_addr); > do { > - /* XXX need some comment here */ > + /* > + * Probably most errors are USB errors, not hard > disk error. > + * Many disks use a USB chip that is flaky when > doing large > + * transfers. The workaround is to dynamically > reduce the > + * transfer size and allow an additional try. > + * This should pick up flaky disks. Linux uses a > quirks table. > + * We'll use observation: > + * Give it 1 try very large, 1 try large, 2 tries > medium and 2 > + * tries small(ish). > + * On a solid fail (actual disk error - which should > be rare), > + * this will give us 6 tries max, and only that many > if the read > + * is quite large. > + * A fail on a very short read obviously doesn't > have a > + * too-large max_blks. Timeout due to spinup being > a case in > + * this point. > + */ > retry = 2; > srb->pdata = (unsigned char *)buf_addr; > - if (blks > ss->max_xfer_blk) > - smallblks = ss->max_xfer_blk; > +retry_it: > + if (blks > usb_max_xfer_blk) > + smallblks = usb_max_xfer_blk; > else > smallblks = (unsigned short) blks; > -retry_it: > - if (smallblks == ss->max_xfer_blk) > + > + if (smallblks == usb_max_xfer_blk) > usb_show_progress(); > + > srb->datalen = block_dev->blksz * smallblks; > srb->pdata = (unsigned char *)buf_addr; > if (usb_read_10(srb, ss, start, smallblks)) { > debug("Read ERROR\n"); > ss->flags &= ~USB_READY; > usb_request_sense(srb, ss); > + > + /* Dynamically reduce the I/O size. */ > + if (smallblks > 2047) { > + usb_max_xfer_blk = 2047; > + ++retry; > + } else if (smallblks > 512) { > + usb_max_xfer_blk = 512; > + ++retry; > + } else if (smallblks > 511) { > + usb_max_xfer_blk = 511; > + ++retry; > + } else if (smallblks > 63) { > + usb_max_xfer_blk = 63; > + retry += 2; > + } > + debug("step down usb_max_xfer_blk to %d\n", > + usb_max_xfer_blk); > + This shall not be needed, as the ehci controller's driver will scatter the transfers up till 240 blocks max. Am I right? The above code was supposed to work before the 240 block limitation was in place? > if (retry--) > goto retry_it; > blkcnt -= blks; > @@ -1196,8 +1288,7 @@ retry_it: > start, smallblks, buf_addr); > > usb_disable_asynch(0); /* asynch transfer allowed */ > - if (blkcnt >= ss->max_xfer_blk) > - debug("\n"); > + > return blkcnt; > } > > @@ -1492,6 +1583,7 @@ int usb_stor_get_info(struct usb_device *dev, > struct us_data *ss, dev_desc->log2blksz = LOG2(dev_desc->blksz); > dev_desc->type = perq; > debug(" address %d\n", dev_desc->target); > + usb_spinup(pccb, ss); As above - this may be required for some "slower" (i.e. degraded) disks. > > return 1; > } > diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c > index 1cc02052f5..c4041f1e9c 100644 > --- a/drivers/usb/host/ehci-hcd.c > +++ b/drivers/usb/host/ehci-hcd.c > @@ -315,7 +315,9 @@ ehci_submit_async(struct usb_device *dev, > unsigned long pipe, void *buffer, int timeout; > int ret = 0; > struct ehci_ctrl *ctrl = ehci_get_ctrl(dev); > + int trynum; > > + debug("\n***** %s ****\n", __func__); > debug("dev=%p, pipe=%lx, buffer=%p, length=%d, req=%p\n", > dev, pipe, buffer, length, req); > if (req != NULL) > @@ -557,6 +559,15 @@ ehci_submit_async(struct usb_device *dev, > unsigned long pipe, void *buffer, ehci_writel(&ctrl->hcor->or_usbsts, > (usbsts & 0x3f)); > /* Enable async. schedule. */ > + trynum = 1; /* No more than 2 tries, in case of > XACTERR. */ > + /* When the 1st try gets xacterr, > + * 2nd try gets xacterr and often babble and/or halted. > + * 3rd try times out. > + * After the 2nd try, the disk has recovered, so we need to > clear and > + * reset the USB port, then return fail so the upper layer > can retry. > + */ > +retry_xacterr: > + vtd = &qtd[qtd_counter - 1]; > cmd = ehci_readl(&ctrl->hcor->or_usbcmd); > if (!(cmd & CMD_ASE)) { > cmd |= CMD_ASE; > @@ -572,8 +583,9 @@ ehci_submit_async(struct usb_device *dev, > unsigned long pipe, void *buffer, > /* Wait for TDs to be processed. */ > ts = get_timer(0); > - vtd = &qtd[qtd_counter - 1]; > + > timeout = USB_TIMEOUT_MS(pipe); > + timeout += dev->extra_timeout; > do { > /* Invalidate dcache */ > invalidate_dcache_range((unsigned > long)&ctrl->qh_list, @@ -588,9 +600,10 @@ ehci_submit_async(struct > usb_device *dev, unsigned long pipe, void *buffer, break; > WATCHDOG_RESET(); > } while (get_timer(ts) < timeout); > - qhtoken = hc32_to_cpu(qh->qh_overlay.qt_token); > > - ctrl->qh_list.qh_link = > cpu_to_hc32(virt_to_phys(&ctrl->qh_list) | QH_LINK_TYPE_QH); > + debug("took %4lu ms of %4d\n", get_timer(ts), timeout); > + ctrl->qh_list.qh_link = > + cpu_to_hc32(virt_to_phys(&ctrl->qh_list) | > QH_LINK_TYPE_QH); flush_dcache_range((unsigned long)&ctrl->qh_list, > ALIGN_END_ADDR(struct QH, &ctrl->qh_list, 1)); > > @@ -605,14 +618,48 @@ ehci_submit_async(struct usb_device *dev, > unsigned long pipe, void *buffer, */ > if (buffer != NULL && length > 0) > invalidate_dcache_range((unsigned long)buffer, > - ALIGN((unsigned long)buffer + length, > ARCH_DMA_MINALIGN)); > + ALIGN((unsigned long)buffer + length, > + ARCH_DMA_MINALIGN)); > > /* Check that the TD processing happened */ > if (QT_TOKEN_GET_STATUS(token) & QT_TOKEN_STATUS_ACTIVE) > printf("EHCI timed out on TD - token=%#x\n", token); > > + /* Disable async schedule. */ > + cmd = ehci_readl(&ctrl->hcor->or_usbcmd); > + cmd &= ~CMD_ASE; > + ehci_writel(&ctrl->hcor->or_usbcmd, cmd); > + > + ret = handshake((uint32_t *)&ctrl->hcor->or_usbsts, STS_ASS, > 0, > + 100 * 1000); > + if (ret < 0) { > + printf("EHCI fail timeout STS_ASS reset\n"); > + goto fail; > + } > + > + qhtoken = hc32_to_cpu(qh->qh_overlay.qt_token); > if (!(QT_TOKEN_GET_STATUS(qhtoken) & > QT_TOKEN_STATUS_ACTIVE)) { debug("TOKEN=%#x\n", qhtoken); > + if (qhtoken & QT_TOKEN_STATUS_XACTERR) { ^^^^^^^^^^^^^^^^^^^^^^^^ - this flag before this patch was also not checked. > + if (--trynum >= 0) { > + /* > + * It is necessary to do this, > otherwise the > + * disk is clagged. > + */ > + debug("reset the TD and redo, > because of XACTERR\n"); > + qhtoken &= ~QT_TOKEN_STATUS_HALTED; > + qhtoken |= QT_TOKEN_STATUS_ACTIVE | > + QT_TOKEN_CERR(2); > + vtd->qt_token = cpu_to_hc32(qhtoken); > + qh->qh_overlay.qt_token = > cpu_to_hc32(qhtoken); > + goto retry_xacterr; > + } > + dev->status = USB_ST_XACTERR; > + dev->act_len = length - > + QT_TOKEN_GET_TOTALBYTES(qhtoken); This may solve the issue - as it resets the controller in the case of transfer error. > + goto fail; > + } > + > switch (QT_TOKEN_GET_STATUS(qhtoken) & > ~(QT_TOKEN_STATUS_SPLITXSTATE | > QT_TOKEN_STATUS_PERR)) { case 0: > diff --git a/include/usb.h b/include/usb.h > index efb67ea33f..41c8c47f89 100644 > --- a/include/usb.h > +++ b/include/usb.h > @@ -140,6 +140,7 @@ struct usb_device { > int act_len; /* transferred bytes */ > int maxchild; /* Number of ports if > hub */ int portnr; /* Port number, 1=first */ > + int extra_timeout; /* Add to timeout in > ehci_submit_async */ #if !CONFIG_IS_ENABLED(DM_USB) > /* parent hub, or NULL if this is the root hub */ > struct usb_device *parent; > diff --git a/include/usb_defs.h b/include/usb_defs.h > index 6dd2c997f9..572f6ab296 100644 > --- a/include/usb_defs.h > +++ b/include/usb_defs.h > @@ -197,6 +197,7 @@ > #define USB_ST_NAK_REC 0x10 /* NAK Received*/ > #define USB_ST_CRC_ERR 0x20 /* CRC/timeout Error */ > #define USB_ST_BIT_ERR 0x40 /* Bitstuff error */ > +#define USB_ST_XACTERR 0x80 /* XACTERR error */ > #define USB_ST_NOT_PROC 0x80000000L /* Not yet > processed */ > Best regards, Lukasz Majewski -- DENX Software Engineering GmbH, Managing Director: Wolfgang Denk HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lu...@denx.de
pgpEBHvDasMPs.pgp
Description: OpenPGP digital signature