** Description changed:

- Today Ubuntu 16.04 LTS Enablement Stacks has moved from the Kernel 4.13
- to the Kernel 4.15.0-24-generic.
+ [Impact]
+ The i40e driver can stall on tx timeouts. This can happen when
+ DCB is enabled on the connected switch. It can also lead to a
+ second problem: when CPU load delays recovery, a new tx timeout
+ can occur before recovery from the previous one has completed,
+ and the driver does not handle this overlap correctly. The result
+ is network delays, packet drops, and application timeouts and
+ hangs. Note that the DCB-related tx timeout is just one of the
+ ways to end up in the second situation.
+ 
+ This issue was seen on a heavily loaded Kafka broker node running
+ the 4.15.0-38-generic kernel on Xenial. 
+ 
+ Symptoms include messages in the kernel log of the form:
+ 
+ ---
+ [4733544.982116] i40e 0000:18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0
+ [4733544.982119] i40e 0000:18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6
+ ---
+ 
+ The test kernel provided in this LP bug, which has these two
+ commits compiled in, has been running successfully for several
+ months, and the problem has not been seen again:
+ 
+ "i40e: Fix for Tx timeouts when interface is brought up if 
+  DCB is enabled"
+ Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee
+ 
+ "i40e: prevent overlapping tx_timeout recover"
+ Commit: d5585b7b6846a6d0f9517afe57be3843150719da
+ 
+ * The first commit is already in Disco and Cosmic.
+ * The second commit is already in Disco.
+ * Bionic needs both patches; Cosmic needs only the second.
+ 
+ [Test Case]
+ * We are considering the case of both issues above occurring.
+ * Seen by reporter on a Kafka broker node with heavy traffic.
+ * Not easy to reproduce as it requires something like the 
+   following example environment and heavy load:
+ 
+   Kernel: 4.15.0-38-generic
+   Network driver: i40e
+         version: 2.1.14-k
+         firmware-version: 6.00 0x800034e6 18.3.6
+   NIC: Intel 40Gb XL710 
+   DCB enabled
+ 
+ 
+ [Regression Potential]
+ Low, as the first commit only affects i40e DCB environments and
+ has been running successfully in production-load testing for
+ several months.
+ 
+ 
+ --- Original Description
+ Today Ubuntu 16.04 LTS Enablement Stacks has moved from the Kernel 4.13
+ to the Kernel 4.15.0-24-generic.
  
  On a "Dell PowerEdge R330" server with a network adapter "Intel Ethernet
  Converged Network Adapter X710-DA2" (driver i40e) the network card no
  longer works and permanently displays these three lines :
  
- 
  [   98.012098] i40e 0000:01:00.0 enp1s0f0: tx_timeout: VSI_seid: 388, Q 8, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
  [   98.012119] i40e 0000:01:00.0 enp1s0f0: tx_timeout recovery level 11, hung_queue 8
  [   98.012125] i40e 0000:01:00.0 enp1s0f0: tx_timeout recovery unsuccessful
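
The tx_timeout messages quoted in this report follow a regular layout, so a host showing the symptoms can be spotted by scanning its kernel log. The following is a minimal sketch in Python, assuming the messages look exactly like the samples quoted here; the helper name summarize_tx_timeouts and the regular expressions are illustrative and are not part of the i40e driver or of any existing tool.

```python
import re

# Patterns modelled on the log samples quoted in this report. They are an
# assumption about the message layout, not a format the driver guarantees.
TIMEOUT_RE = re.compile(
    r"i40e \S+ (?P<iface>\S+): tx_timeout: VSI_seid: \d+, Q (?P<queue>\d+)"
)
RECOVERY_RE = re.compile(
    r"i40e \S+ (?P<iface>\S+): tx_timeout recovery level (?P<level>\d+), "
    r"hung_queue (?P<queue>\d+)"
)
FAILED_RE = re.compile(
    r"i40e \S+ (?P<iface>\S+): tx_timeout recovery unsuccessful"
)


def summarize_tx_timeouts(lines):
    """Summarize i40e tx_timeout messages per interface (hypothetical helper).

    Returns a dict mapping interface name to the number of tx_timeout
    events, the highest recovery level seen, and whether any recovery
    was reported as unsuccessful.
    """
    summary = {}
    patterns = (
        ("timeout", TIMEOUT_RE),
        ("recovery", RECOVERY_RE),
        ("failed", FAILED_RE),
    )
    for line in lines:
        for kind, pattern in patterns:
            match = pattern.search(line)
            if match is None:
                continue
            entry = summary.setdefault(
                match.group("iface"),
                {"timeouts": 0, "max_level": 0, "unsuccessful": False},
            )
            if kind == "timeout":
                entry["timeouts"] += 1
            elif kind == "recovery":
                level = int(match.group("level"))
                entry["max_level"] = max(entry["max_level"], level)
            else:
                entry["unsuccessful"] = True
            break  # each log line matches at most one pattern
    return summary
```

The function accepts any iterable of log lines, for example an open kernel log file or the split output of dmesg. Each message must be on a single line, so logs copied from a line-wrapped email need to be rejoined first.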

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1779756

Title:
  Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu
  18.04)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779756/+subscriptions
