On Fri, May 08, 2020 at 04:36:28PM +0200, Mark Kettenis wrote:
> > Date: Fri, 8 May 2020 15:39:26 +0200
> > From: Jan Klemkow <j.klem...@wemelug.de>
> > On Mon, Apr 27, 2020 at 10:42:52AM +0200, Martin Pieuchot wrote:
> > > So the current logic assumes that as long as the ring contains descriptors
> > > em_rxrefill() failures aren't fatal and it is not necessary to schedule a
> > > refill.  Why?
> > 
> > I'm not sure.  I just figured out, what I described below.
> > 
> > > Unless we understand this it is difficult to reason about a fix.  Have
> > > you looked at other drivers, is it a common pattern?
> > 
> > Drivers like iavf(4) and ixl(4) just set the timeout when the alive
> > (if_rxr_inuse returns) counter is 0.  Other drivers like ix(4) and
> > bnx(4) call the timeout, when refill was not possible.
> > 
> > I adjust the patch to the behavior of ix(4).  (see below).
> > 
> > > > In my case, the function em_rxeof is unable to pull the last one or two
> > > > mbufs from the receive ring, because the descriptor status hasn't set
> > > > the "descriptor done" flag (E1000_RXD_STAT_DD).  Thus, the ring has one
> > > > or two remaining mbufs, and the timeout to put new ones in is never set.
> > > > When new mbufs are available later, the em driver stays in that
> > > > situation.
> > > 
> > > Are you saying there's always some descriptor on the ring?  Did you
> > > investigate why?
> > 
> > Yes, with printf-debugging:  I see that we return from em_rxeof()
> > because the status bit E1000_RXD_STAT_DD is not set.  And systat(8)
> > shows one or two remaining mbufs in the ALIVE column.
> > 
> > > > The diff always sets the timeout to put new mbufs in the receive ring,
> > > > when it has lesser mbufs then the "low water mark".
> > > 
> > > This changes the logic explained above.  With it the fatal condition
> > > changes from 0 to `lwm'.  Another possible fix would be to always
> > > schedule a timeout as soon as em_rxrefill() fails.
> > 
> > I adjusted the patch to this behavior, as I mentioned above.
> > 
> > > I'm not advocating for a particular change, I'm suggesting we understand
> > > the current logic and its implications.
> > 
> > The logic looks to me like: The NIC should first fill the last mbuf
> > (descriptor), then we set a timeout to refill new mbufs (descriptors).
> > It seems to me, that one or two mbufs maybe some kind of small to
> > operate correctly?!
> > 
> > But, as far as I understand the documentation of Intel, It should be
> > find to deal with such a low amount of mbufs.  I couldn't find any text
> > that said, the firmware needs a minimum amount of DMA memory.
> 
> Sorry, I was going to reply earlier to this mail thread.
> 
> As far as I know, OpenBSD is the only OS that tries to run with
> partially filled rings.  When we started doing this we quickly found
> out that there was hardware out there that didn't work with only a
> single descriptor on the Rx ring.  This is why we introduced the low
> water mark.

Should I make similar diffs for drivers that check for 0 descriptors as
well?

> I'm not certain if we ever found this documented unambiguously the the
> hardware documentation available to us.  But there certainly are hints
> that indicate that descriptors are read a full cache-line at a time.
> And if the descrioptor size is 16 bytes, you need 4 descriptors to
> fill a 64-byte cache line.

Good to know.

> So what you were proposing in your first diff makes sense.  It maight
> make sense to introduce a function to check the limit though, perhaps
> something like if_rxr_needfill() that returns true if the number of
> descriptors on the ring is below the low water mark.

What do you think of this diff, OK?

Thanks,
Jan

Index: share/man/man9/if_rxr_init.9
===================================================================
RCS file: /cvs/src/share/man/man9/if_rxr_init.9,v
retrieving revision 1.7
diff -u -p -r1.7 if_rxr_init.9
--- share/man/man9/if_rxr_init.9        17 Nov 2017 03:51:32 -0000      1.7
+++ share/man/man9/if_rxr_init.9        9 May 2020 06:40:46 -0000
@@ -35,6 +35,8 @@
 .Fn if_rxr_put "struct if_rxring *rxr" "unsigned int n"
 .Ft void
 .Fn if_rxr_livelocked "struct if_rxring *rxr"
+.Ft int
+.Fn if_rxr_needrefill "struct if_rxring *rxr"
 .Ft unsigned int
 .Fn if_rxr_inuse "struct if_rxring *rxr"
 .Ft unsigned int
@@ -88,6 +90,10 @@ receive descriptor slots to the ring.
 .Fn if_rxr_livelocked
 can signal that that receive ring is generating too much load.
 .Pp
+.Fn if_rxr_needrefill
+signals that that receive ring runs below the low watermark of descriptors and
+the chip may not work properly.
+.Pp
 .Fn if_rxr_inuse
 can be used to determine how many descriptor slots have been allocated
 on the ring.
@@ -139,6 +145,7 @@ to fill them again.
 .Fn if_rxr_get ,
 .Fn if_rxr_put ,
 .Fn if_rxr_livelocked ,
+.Fn if_rxr_needrefill ,
 .Fn if_rxr_inuse ,
 and
 .Fn if_rxr_cwm
@@ -159,6 +166,12 @@ returns the number of receive descriptor
 The number of descriptors may be less than the
 .Fa max
 requested.
+.Pp
+.Fn if_rxr_needrefill
+returns 1 if the number of receive descriptor slots currently in use on the
+ring is below the value of
+.Fa lwm .
+Otherwise, zero is returned.
 .Pp
 .Fn if_rxr_inuse
 returns the number of receive descriptor slots currently in use on the ring.
Index: sys/dev/pci/if_em.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_em.c,v
retrieving revision 1.351
diff -u -p -r1.351 if_em.c
--- sys/dev/pci/if_em.c 26 Apr 2020 20:49:56 -0000      1.351
+++ sys/dev/pci/if_em.c 9 May 2020 09:45:07 -0000
@@ -2833,7 +2833,7 @@ em_rxrefill(void *arg)
 
        if (em_rxfill(que))
                E1000_WRITE_REG(&sc->hw, RDT(que->me), que->rx.sc_rx_desc_head);
-       else if (if_rxr_inuse(&que->rx.sc_rx_ring) == 0)
+       else if (if_rxr_needrefill(&que->rx.sc_rx_ring))
                timeout_add(&que->rx_refill, 1);
 }
 
Index: sys/net/if_var.h
===================================================================
RCS file: /cvs/src/sys/net/if_var.h,v
retrieving revision 1.104
diff -u -p -r1.104 if_var.h
--- sys/net/if_var.h    12 Apr 2020 07:02:43 -0000      1.104
+++ sys/net/if_var.h    9 May 2020 09:29:25 -0000
@@ -387,6 +387,7 @@ void        if_rxr_init(struct if_rxring *, u_i
 u_int  if_rxr_get(struct if_rxring *, u_int);
 
 #define if_rxr_put(_r, _c)     do { (_r)->rxr_alive -= (_c); } while (0)
+#define if_rxr_needrefill(_r)  ((_r)->rxr_alive < (_r)->rxr_lwm)
 #define if_rxr_inuse(_r)       ((_r)->rxr_alive)
 #define if_rxr_cwm(_r)         ((_r)->rxr_cwm)
 

Reply via email to