Hi,

The following diff fixes a deadlock in em(4) when system runs out of
mbuf(9)s.

Tested with current on amd64 with:
em0 at pci0 dev 25 function 0 "Intel I217-LM" rev 0x05: msi, mac 0x1e phy 0xb, 
address d0:50:99:c1:67:94

When the system runs out of mbufs, the receive ring of the em-NIC also
run out of mbufs.  If the driver puts an mbuf from the NICs receive ring
with em_rxeof, it calls em_rxrefill to refill the ring.  If its not
possible to refill the ring and the receive ring is empty, it sets a
timer to retry refilling later till new mbufs are available.

In my case, the function em_rxeof is unable to pull the last one or two
mbufs from the receive ring, because the descriptor status hasn't set
the "descriptor done" flag (E1000_RXD_STAT_DD).  Thus, the ring has one
or two remaining mbufs, and the timeout to put new ones in is never set.
When new mbufs are available later, the em driver stays in that
situation.

The diff always sets the timeout to put new mbufs in the receive ring,
when it has lesser mbufs then the "low water mark".

A side effect is: when the mbufs in use (alive) are below then the "low
water mark", the "current water mark" raises up to value of the "high
water mark".  When new mbufs are available the number of mbufs in use
also raises to that value.  Because, the "current water mark" just sinks
during net livelocks.

I'm sure this path is not the best solution.  Maybe someone else has a
better idea to avoid that situation?!

Bye,
Jan

Index: dev/pci/if_em.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_em.c,v
retrieving revision 1.349
diff -u -p -r1.349 if_em.c
--- dev/pci/if_em.c     24 Mar 2020 09:30:06 -0000      1.349
+++ dev/pci/if_em.c     21 Apr 2020 12:02:28 -0000
@@ -2809,7 +2809,8 @@ em_rxrefill(void *arg)
 
        if (em_rxfill(que))
                E1000_WRITE_REG(&sc->hw, RDT(que->me), que->rx.sc_rx_desc_head);
-       else if (if_rxr_inuse(&que->rx.sc_rx_ring) == 0)
+       else if (if_rxr_inuse(&que->rx.sc_rx_ring) <
+           if_rxr_lwm(&que->rx.sc_rx_ring))
                timeout_add(&que->rx_refill, 1);
 }
 
Index: net/if_var.h
===================================================================
RCS file: /cvs/src/sys/net/if_var.h,v
retrieving revision 1.104
diff -u -p -r1.104 if_var.h
--- net/if_var.h        12 Apr 2020 07:02:43 -0000      1.104
+++ net/if_var.h        21 Apr 2020 11:58:17 -0000
@@ -389,6 +389,7 @@ u_int       if_rxr_get(struct if_rxring *, u_i
 #define if_rxr_put(_r, _c)     do { (_r)->rxr_alive -= (_c); } while (0)
 #define if_rxr_inuse(_r)       ((_r)->rxr_alive)
 #define if_rxr_cwm(_r)         ((_r)->rxr_cwm)
+#define if_rxr_lwm(_r)         ((_r)->rxr_lwm)
 
 int    if_rxr_info_ioctl(struct if_rxrinfo *, u_int, struct if_rxring_info *);
 int    if_rxr_ioctl(struct if_rxrinfo *, const char *, u_int,

Reply via email to