** Description changed:

- Today Ubuntu 16.04 LTS Enablement Stacks has moved from the Kernel 4.13
- to the Kernel 4.15.0-24-generic.
+ [Impact]
+ The i40e driver can get stalled on tx timeouts. This can happen when
+ DCB is enabled on the connected switch. This can also trigger a
+ second situation when a tx timeout occurs before the recovery of
+ a previous timeout has completed due to CPU load, which is not
+ handled correctly. This leads to networking delays, drops and
+ application timeouts and hangs. Note that the first tx timeout
+ cause is just one of the ways to end up in the second situation.
+
+ This issue was seen on a heavily loaded Kafka broker node running
+ the 4.15.0-38-generic kernel on Xenial.
+
+ Symptoms include messages in the kernel log of the form:
+
+ ---
+ [4733544.982116] i40e 0000:18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0
+ [4733544.982119] i40e 0000:18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6
+ ----
+
+ With the test kernel provided in this LP bug which had these
+ two commits compiled in, the problem has not been seen again,
+ and has been running successfully for several months:
+
+ "i40e: Fix for Tx timeouts when interface is brought up if
+  DCB is enabled"
+ Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee
+
+ "i40e: prevent overlapping tx_timeout recover"
+ Commit: d5585b7b6846a6d0f9517afe57be3843150719da
+
+ * The first commit is already in Disco, Cosmic
+ * The second commit is already in Disco
+ * Bionic needs both patches and Cosmic needs the second
+
+ [Test Case]
+ * We are considering the case of both issues above occurring.
+ * Seen by reporter on a Kafka broker node with heavy traffic.
+ * Not easy to reproduce as it requires something like the
+   following example environment and heavy load:
+
+   Kernel: 4.15.0-38-generic
+   Network driver: i40e
+     version: 2.1.14-k
+     firmware-version: 6.00 0x800034e6 18.3.6
+   NIC: Intel 40Gb XL710
+   DCB enabled
+
+
+ [Regression Potential]
+ Low, as the first only impacts i40e DCB environment, and has
+ been running for several months in production-load testing
+ successfully.
+
+
+ --- Original Description
+ Today Ubuntu 16.04 LTS Enablement Stacks has moved from the Kernel 4.13 to the Kernel 4.15.0-24-generic.

  On a "Dell PowerEdge R330" server with a network adapter "Intel Ethernet Converged Network Adapter X710-DA2" (driver i40e) the network card no longer works and permanently displays these three lines :
-
  [ 98.012098] i40e 0000:01:00.0 enp1s0f0: tx_timeout: VSI_seid: 388, Q 8, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
  [ 98.012119] i40e 0000:01:00.0 enp1s0f0: tx_timeout recovery level 11, hung_queue 8
  [ 98.012125] i40e 0000:01:00.0 enp1s0f0: tx_timeout recovery unsuccessful
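
For anyone checking whether a host shows the symptoms quoted above, the following is a minimal sketch that counts i40e tx_timeout events per interface in saved kernel log text. The regular expressions are derived only from the sample lines in this report, so they are an assumption about the message format and may not match other driver versions. The function name summarize_tx_timeouts is hypothetical and not part of any existing tool.

```python
import re

# Patterns inferred from the log samples in this bug report (an assumption,
# not taken from the driver source).
_PREFIX = r"i40e (?P<pci>\S+) (?P<iface>[^\s:]+): "
TX_TIMEOUT_RE = re.compile(
    _PREFIX + r"tx_timeout: VSI_seid: \d+, Q (?P<queue>\d+),"
)
RECOVERY_RE = re.compile(
    _PREFIX + r"tx_timeout recovery level (?P<level>\d+), hung_queue (?P<queue>\d+)"
)
FAILED_RE = re.compile(_PREFIX + r"tx_timeout recovery unsuccessful")


def summarize_tx_timeouts(lines):
    """Return a per-interface summary of i40e tx_timeout messages.

    `lines` is any iterable of kernel log lines (for example an open file).
    """
    summary = {}

    def entry(iface):
        # Create the per-interface record on first sight.
        return summary.setdefault(
            iface,
            {
                "timeouts": 0,
                "hung_queues": set(),
                "max_recovery_level": 0,
                "unsuccessful": False,
            },
        )

    for line in lines:
        m = TX_TIMEOUT_RE.search(line)
        if m:
            e = entry(m["iface"])
            e["timeouts"] += 1
            e["hung_queues"].add(int(m["queue"]))
            continue
        m = RECOVERY_RE.search(line)
        if m:
            e = entry(m["iface"])
            e["max_recovery_level"] = max(e["max_recovery_level"], int(m["level"]))
            continue
        m = FAILED_RE.search(line)
        if m:
            entry(m["iface"])["unsuccessful"] = True
    return summary
```

Passing the saved output of dmesg (or a kern.log file object) to summarize_tx_timeouts gives one record per interface. An interface with "unsuccessful" set matches the original description in this bug, while repeated timeouts with a low recovery level match the [Impact] case.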

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1779756

Title:
  Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu
  18.04)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779756/+subscriptions