Synchronized CPU stalls with 8 queues on Mellanox MLX5 NIC

Lukáš Šišmiš Wed, 10 Jan 2024 05:29:41 -0800

Hi everyone,

for the past few weeks I have been trying to debug why independent application workers have the same access patterns to a Mellanox NIC. The application I am debugging is Suricata, and the debugging tool I am using is primarily Intel VTune.

I am using 8 cores for packet processing, and each core has an independent processing queue. All application cores are on the same NUMA node. Importantly, this only happens on the Mellanox/NVIDIA NIC (currently MT2892 Family - mlx5) and NOT on the X710. Suricata is compiled with DPDK (2 versions tested, replicated on both - master 1dcf69b211 (https://github.com/OISF/suricata/) and a version with interrupt support (commit c822f66b - https://github.com/lukashino/suricata/commits/feat-power-saving-v4/)). I've tried various numbers of descriptors, but the problem remained the same. For packet generation I use the TRex packet generator on an independent server in ASTF mode with the command "start -f astf/http_simple.py -m 6000". The traffic exchanged between the two TRex interfaces is mirrored on a switch to the Suricata interface. That yields roughly 4.6 Gbps of traffic. The traffic is a simple HTTP GET request, but the flows alternate each iteration by incrementing an IP address. RSS then distributes the traffic evenly across all cores. The problem occurs at both 500 Mbps and 20 Gbps transmit speed.

This is a flame graph from one of the runs. I wonder why the CPUs show almost synchronous no-CPU/some-CPU activity in the graph below. The worker cores are denoted with "W#0..." and fall into 2 groups that alternate. The CPU stalls are especially visible in regions of high CPU activity, but they are also present with low CPU activity. Whether CPU activity is high or low is not relevant here, as I am only interested in the pattern of CPU stalls. It suggests contention on some shared resource. But even with a shared resource, I would expect the cores to be blocked at random times rather than paused synchronously. I am debugging the application with interrupts enabled; however, the same pattern occurs when poll mode is enabled. With polling mode active, I filtered out the mlx5 module activity from the VTune result and was still able to see CPU pauses ranging from 0.5 to 1 second across all cores.


DPDK 8 cores, MLX5 NIC

https://imgur.com/a/TrZ9vIy
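
One rough way to put a number on how synchronized the stalls are, independent of reading the timeline by eye, is sketched below. This is only an illustration under assumptions: it presumes that per-worker sample timestamps (in seconds) have somehow been exported from the profiler into plain lists, the worker names "W#01"/"W#02" and the timestamps are made up, and the helper names (find_stalls, intersect, common_stalls) are hypothetical. It looks for gaps of at least 0.5 s between consecutive samples on each worker and then intersects those gaps across workers.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def find_stalls(sample_times: List[float], min_gap: float) -> List[Interval]:
    """Return (start, end) gaps between consecutive samples lasting >= min_gap."""
    times = sorted(sample_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a >= min_gap]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, non-overlapping interval lists."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def common_stalls(per_worker: Dict[str, List[float]], min_gap: float) -> List[Interval]:
    """Time ranges during which every worker is inside a stall."""
    common = None
    for times in per_worker.values():
        stalls = find_stalls(times, min_gap)
        common = stalls if common is None else intersect(common, stalls)
    return common or []


# Hypothetical sample timestamps (seconds); not taken from a real run.
samples = {
    "W#01": [0.0, 0.1, 0.2, 1.0, 1.1],
    "W#02": [0.0, 0.1, 0.25, 0.95, 1.1],
}
shared = common_stalls(samples, min_gap=0.5)
print(shared)
```

If the stalls were caused by random blocking on a shared resource, the intersection across all workers should be mostly empty; long shared intervals would support the impression that the pauses are synchronized.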

I tried to profile Suricata in different scenarios and this pattern of complete CPU stalls doesn't happen elsewhere.


e.g.

AF_PACKET 8 cores, MLX5 NIC, the CPU activity is similar across cores but the cores never pause:


https://imgur.com/a/HIhDVyQ


DPDK 4 cores, MLX5 NIC

https://imgur.com/a/G0JVOXa


DPDK 9 cores, MLX5 NIC

https://imgur.com/a/IdHCruj


DPDK 8 cores, X710 NIC, no CPU stalls on worker cores

https://imgur.com/a/94KLCjE

Testpmd, MLX5, 8 cores: I tried to filter out the majority of the RX NIC functions, and it still seems that the CPUs are continuously active. (It was running in rxonly forwarding mode, with 8 queues and 8 cores.) Though I am a bit skeptical about the CPU activity, as testpmd only receives/discards the traffic.


https://imgur.com/a/UwHZzAr

It seems like the issue is connected with the MLX5 NIC and DPDK, as it works well with AF_PACKET and with a lower/higher number of threads.

Does anybody have an idea why the CPU stalls occur in combination with 8 cores, or what else I could do to mitigate or better evaluate the problem?


Thanks in advance.

Lukas
