** Description changed:

- For the customer OpenStack deployment we deploy infra nodes on Dell R630
- servers. The servers have onboard Broadcom's NetXtreme II BCM57800 NIC
- (quad port: 2x1G ports, 2x10G ports). For each port in UP state, we
- observe 100% CPU load. So in total, we observe 4 CPUs with 100% load.
+ [Impact]
  
- perf report shows function bnx2x_ptp_task taking up much of the CPUs
- time: https://pastebin.canonical.com/p/kfrpd6Pwh5/
+ * The PTP feature in the bnx2x driver is implemented in a way that if the
+ NIC firmware takes some time to perform the timestamping - which is
+ observed as a bad register read in bnx2x_ptp_task() - then the PTP
+ worker function will reschedule itself indefinitely until the value read
+ from the register is meaningful. With that behavior, if a userspace
+ tool requests a badly configured RX filter from bnx2x (or if the NIC
+ firmware has any other timestamping issue), bnx2x_ptp_task() will be
+ rescheduled forever and cause unbounded resource consumption. This
+ manifests as a kworker thread consuming 100% of a CPU.
  
- Also, /var/log/syslog contains the following outputs every few seconds:
  
- [1738143.581721] bnx2x: [bnx2x_start_xmit:3855(eno4)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
- [1738176.727642] bnx2x: [bnx2x_start_xmit:3855(eno1)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
- [1738207.988310] bnx2x: [bnx2x_start_xmit:3855(eno3)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
- [1738240.227333] bnx2x: [bnx2x_start_xmit:3855(eno2)]The device supports only a single outstanding packet to timestamp, this packet will not be timestamped
+ * The dmesg log will show the following message, which indicates that
+ other packets are being skipped by the timestamping routine because a
+ packet is stuck in the timestamping "pipeline":
  
- So, the problem seems to be in a "timestamped" TX packet; the driver
- for some reason (yet to be understood) gets an unexpected value from a
- register and then, in that same function, reschedules itself to retry
- the register read; the read returns a bad value again, and so on indefinitely.
+ "bnx2x: [bnx2x_start_xmit:3862(eno4)]The device supports only a single
+ outstanding packet to timestamp, this packet will not be timestamped"
  
- This is showing in the system as the 100% CPU usage kthreads; the
- message "The device supports only a single outstanding packet to
- timestamp, this packet will not be timestamped" happens because the
- driver can only timestamp a single TX packet at a time, and given it's
- stuck trying, it cannot accept another packet in this "queue".
+ Also, by using ftrace, the user can see that bnx2x_ptp_task() is
+ being called very frequently, and by enabling the bnx2x PTP debugging
+ log (ethtool -s <iface> msglvl 16777216) it's possible to observe the
+ following message flooding the kernel log:
  
- The infinite loop appears to be:
+ "bnx2x: [bnx2x_ptp_task:15242(eno4)]There is no valid Tx timestamp yet"
  
- static void bnx2x_ptp_task(struct work_struct *work)
- {
-     struct bnx2x *bp = container_of(work, struct bnx2x, ptp_task);
-     int port = BP_PORT(bp);
-     u32 val_seq;
-     u64 timestamp, ns;
-     struct skb_shared_hwtstamps shhwtstamps;
  
-     /* Read Tx timestamp registers */
-     val_seq = REG_RD(bp, port ? NIG_REG_P1_TLLH_PTP_BUF_SEQID :
-                      NIG_REG_P0_TLLH_PTP_BUF_SEQID);
-     if (val_seq & 0x10000) {
-         [...]
-     } else {
-         DP(BNX2X_MSG_PTP, "There is no valid Tx timestamp yet\n");
-         /* Reschedule to keep checking for a valid timestamp value */
-         schedule_work(&bp->ptp_task);
-     }
+ * The patch proposed in this SRU request has been accepted upstream and is
+ currently available (as of 2019-07-03) in David Miller's linux-net tree:
+ git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=3c91f25c2f72
+ Besides fixing the issue, it also adds an ethtool statistic that counts
+ PTP errors and reduces message flooding when errors occur.
  
- It appears that val_seq & 0x10000 is never true, so the task constantly
- reschedules itself immediately. Instrumenting the function shows that it
- is being called in excess of 100,000 times per second. The REG_RD call
- does appear to be expensive (as it's a register read from the device)
- and shows high in the perf report, but that by itself doesn't appear to
- be the root cause (i.e., it's not hanging forever in the REG_RD).
  
- The cause appears to be that the driver is not prepared to deal with the
- PTP request never being completed by the hardware. It's unclear why it
- isn't completing, but regardless, the driver should not loop forever
- here.
+ [Test case]
+ 
+ Reproducing the problem is not difficult; we've used chrony in Bionic to
+ trigger the problem. The steps are:
+ 
+ a) Install chrony on Bionic on a system with a working NIC managed by
+ bnx2x;
+ 
+ b) Edit the chrony configuration and add "hwtimestamp *" to the top of
+ its conf file;
+ 
+ c) Restart the chrony service.
+ 
+ Then check dmesg for the "[...]single outstanding packet" message, and
+ check the overall CPU load with a tool like "top" to observe a kthread
+ consuming 100% of a CPU.
+ 
+ 
+ [Regression potential]
+ 
+ The patch scope is restricted to the bnx2x PTP handler, and the patch
+ was validated by the driver maintainer. If a regression does occur, we
+ believe the worst case would be an issue affecting packet timestamping,
+ without affecting the driver's regular xmit path.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1832082

Title:
  bnx2x driver causes 100% CPU load

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1832082/+subscriptions
