On Sat, 16 Sep 2017, Alexander Leidinger wrote:

Quoting Bruce Evans <b...@optusnet.com.au> (from Sat, 16 Sep 2017 13:46:37 +1000 (EST)):

It gives lesser breakage here:
- with an old PCI em, an error that occurs every few makeworlds over nfs now
  hangs the hardware.  It used to be recovered from after about 10 seconds.
  This only happened once.  I then applied my old fix which ignores the
  error better so as to recover from it immediately.  This seems to work as
  before.

As I also have an em device which switches into non-working state: what's the patch you have for this? I would like to see if your change also helps my device to get back into working shape again.

X Index: em_txrx.c
X ===================================================================
X --- em_txrx.c (revision 323636)
X +++ em_txrx.c (working copy)
X @@ -640,9 +640,20 @@
X
X               /* Make sure bad packets are discarded */
X               if (errors & E1000_RXD_ERR_FRAME_ERR_MASK) {
X +#if 0
X                       adapter->dropped_pkts++;
X -                     /* XXX fixup if common */
X                       return (EBADMSG);
X +#else
X +                     /*
X +                      * XXX the above error handling is worse than none.
X +                      * First it drops 'i' packets before the current
X +                      * one and doesn't count them.  Then it returns an
X +                      * error.  iflib can't really handle this error.
X +                      * It just resets, and this usually drops many more
X +                      * packets (without counting them) and much time.
X +                      */
X +                     printf("lem: frame error: ignored\n");
X +#endif
X               }
X
X               ri->iri_frags[i].irf_flid = 0;

This is for old em.  nfs doesn't seem to notice the dropped packet(s) after
this.

I think the comment "fixup if common" means "this error should actually
be handled if it occurs enough to matter".

I removed the increment of the dropped packet count because with the change
none are dropped directly here.  I think the error applies only to this
packet.  Returning in the old code might drop more than 1 packet, although
debugging code seems to show no more than 1 packet at a time having an error.
I think returning drops good packets after the bad one and also leaves
the state inconsistent, so that it takes almost a reset to recover.
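
To make the reasoning above concrete, here is a toy model in C of the two
policies.  It is only a sketch and not the real em/iflib code: the toy_*
names, the ring layout and the "reset discards everything still pending"
rule are all assumptions made for illustration.  It also assumes that a
frame passed up with the error ignored is rejected harmlessly by the upper
layers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real em/iflib types. */
struct toy_desc {
	int frame_error;	/* nonzero if the hardware flagged a frame error */
};

struct toy_stats {
	unsigned delivered;	/* packets handed up the stack */
	unsigned dropped;	/* packets lost to the simulated reset */
	unsigned resets;	/* simulated reinitialisations */
};

enum toy_policy { TOY_RETURN_EBADMSG, TOY_IGNORE_ERROR };

/* Check one descriptor; this mirrors the shape of the patched test. */
static int
toy_check(const struct toy_desc *d, enum toy_policy policy)
{
	if (d->frame_error && policy == TOY_RETURN_EBADMSG)
		return (EBADMSG);
	/* Error ignored (or absent): let the packet through. */
	return (0);
}

/*
 * Process a ring of n descriptors.  Assumption: when the check fails, the
 * caller's only recovery is a reset that discards the bad packet and
 * everything still pending behind it.
 */
static struct toy_stats
toy_process(const struct toy_desc *ring, size_t n, enum toy_policy policy)
{
	struct toy_stats st = { 0, 0, 0 };
	size_t i;

	assert(ring != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (toy_check(&ring[i], policy) != 0) {
			st.resets++;
			st.dropped += (unsigned)(n - i);
			break;
		}
		st.delivered++;
	}
	return (st);
}
```

With 8 pending descriptors and a frame error on the third, the returning
policy in this model delivers 2 packets and loses 6 to one reset, while the
ignoring policy passes all 8 up and needs no reset.  Real hardware and iflib
behave in more complicated ways, so the numbers only illustrate the argument.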

X @@ -703,8 +714,12 @@
X
X               /* Make sure bad packets are discarded */
X               if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
X +#if 0
X                       adapter->dropped_pkts++;
X                       return EBADMSG;
X +#else
X +                     printf("em: frame error: ignored\n");
X +#endif
X               }
X
X               ri->iri_frags[i].irf_flid = 0;

This is for newer em.  I haven't noticed any problems with that (except it
has 27 usec higher latency).

Bruce
_______________________________________________
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
