On Sun, 4 Aug 2024 03:47:23 Taylor R Campbell wrote: > > Date: Sat, 3 Aug 2024 09:59:21 +1000 > > From: Nathanial Sloss <nathanialsl...@yahoo.com.au>
> Why do we end up in recursion here? Why does the driver need to > decide when to break this recursion? If drivers sometimes _skip_ > scsipi_run_queue, won't they sometimes wind up with queued xfers that > never get issued to the hardware? > > It seems to me that either > > (a) scsipi_run_queue -> scsipi_adapter_request -> driver adapt_request > ought to arrange for scsipi_done to be invoked asynchronously, and > it's a bug for it to call scsipi_done synchronously; or > > (b) if it's really important for driver adapt_request to synchronously > call scsipi_done, then that should _never_ synchronously re-enter > scsipi_run_queue but instead should _always_ defer to an > asynchronous trigger like a softint or something; or > > (c) maybe, if it's really important for _both_ synchronous calls to > happen, then when scsipi_done/scsipi_run_queue detects it is being > recursively called, the inner call should detect this and do > nothing so it can unwind to the outer call to continue processing > the queue. > Would this patch suffice, it kicks the queue after calling scsipi_done_once. I've been running this without issue for a day with heavy (heavy for dse(4)) network traffic - so far so good. Best regards, Nat
--- a/sys/dev/ic/ncr5380sbc.c Fri Aug 02 11:45:52 2024 +0000 +++ b/sys/dev/ic/ncr5380sbc.c Wed Aug 07 09:12:20 2024 +1000 @@ -802,10 +808,13 @@ finish: sc->sc_ncmds--; /* Tell common SCSI code it is done. */ - scsipi_done(xs); + scsipi_done_once(xs); sc->sc_state = NCR_IDLE; /* Now ncr5380_sched() may be called again. */ + + /* Check the queue. */ + scsipi_channel_thaw(&sc->sc_channel, 0); }