Hello,
We're observing an abrupt performance drop from 148 to 107 Mpps @ 64B packets
apparently caused by any rule that jumps out of ingress group 0
when using HWS (async API) instead of SWS (sync API).
Is this a known issue or a temporary limitation?
NIC: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16;
FW: 22.40.1000
OFED: MLNX_OFED_LINUX-24.01-0.3.3.1
DPDK: v24.03-23-g76cef1af8b
TG is custom, traffic is Ethernet / VLAN / IPv4 / TCP SYN @ 148 Mpps.
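For reference, 148 Mpps is essentially 100GbE line rate for 64-byte frames. A minimal sketch of that arithmetic, assuming the standard 8 bytes of preamble/SFD and 12 bytes of inter-frame gap per frame on the wire (the function name is only illustrative):

```python
# Theoretical Ethernet packet rate for back-to-back frames at line rate.
PREAMBLE_SFD_BYTES = 8       # 7-byte preamble + 1-byte start-of-frame delimiter
INTER_FRAME_GAP_BYTES = 12   # minimum gap between consecutive frames


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Packets per second for frames of frame_bytes (FCS included)."""
    wire_bytes = frame_bytes + PREAMBLE_SFD_BYTES + INTER_FRAME_GAP_BYTES
    return link_bps / (wire_bytes * 8)


if __name__ == "__main__":
    pps = line_rate_pps(100e9, 64)
    print(f"{pps / 1e6:.2f} Mpps")  # prints "148.81 Mpps"
```

So the traffic generator is offering the full line rate, and any steering ceiling below that shows up directly as drops.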
Examples below do only the jump and miss all packets in group 1,
but the same is observed when dropping all the packets in group 1.
Software steering:
/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=1 -- -i --rxq=1 --txq=1
flow create 0 ingress group 0 pattern end actions jump group 1 / end
Neohost (from OFED 5.7):
||===========================================================================
||| Packet Rate ||
||---------------------------------------------------------------------------
||| RX Packet Rate || 148,813,590 [Packets/Seconds] ||
||| TX Packet Rate || 0 [Packets/Seconds] ||
||===========================================================================
||| eSwitch ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet || 3.075 [Hops/Packet] ||
||| RX Optimal Hops Per Packet Per Pipe || 1.5375 [Hops/Packet] ||
||| RX Optimal Packet Rate Bottleneck || 279.6695 [MPPS] ||
||| RX Packet Rate Bottleneck || 262.2723 [MPPS] ||
(Full Neohost output is attached.)
Hardware steering:
/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=2 -- -i --rxq=1 --txq=1
port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0
Neohost:
||===========================================================================
||| Packet Rate ||
||---------------------------------------------------------------------------
||| RX Packet Rate || 107,498,115 [Packets/Seconds] ||
||| TX Packet Rate || 0 [Packets/Seconds] ||
||===========================================================================
||| eSwitch ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet || 4.5503 [Hops/Packet] ||
||| RX Optimal Hops Per Packet Per Pipe || 2.2751 [Hops/Packet] ||
||| RX Optimal Packet Rate Bottleneck || 188.9994 [MPPS] ||
||| RX Packet Rate Bottleneck || 182.5796 [MPPS] ||
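AFAIU, performance is not constrained by the complexity of the rules:
with HWS, Neohost estimates the RX packet rate bottleneck at about 182 Mpps,
which is above both the 107 Mpps observed and the 148 Mpps offered.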
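The eSwitch figures can be cross-checked against the raw counters in the full Neohost dumps. The sketch below is my reading of how the numbers relate, not a documented Neohost formula: it assumes hops per packet is the sum of the "RX_PSA0 Steering Pipe 0/1" counters divided by "RX Steering Packets", and that each bottleneck is "Chip Frequency" (taken as MHz) divided by hops per pipe. The helper name is hypothetical.

```python
# Cross-check of Neohost eSwitch figures from raw counters (inferred relation).

def steering_summary(pipe_hops, steering_packets, chip_mhz):
    """Derive hop and bottleneck estimates from per-pipe steering counters."""
    hops_per_packet = sum(pipe_hops) / steering_packets
    # "Optimal" assumes the hops are spread evenly across all pipes.
    optimal_hops_per_pipe = hops_per_packet / len(pipe_hops)
    # The real limit is set by the busiest pipe.
    busiest_pipe_hops = max(pipe_hops) / steering_packets
    return {
        "hops_per_packet": hops_per_packet,
        "optimal_hops_per_pipe": optimal_hops_per_pipe,
        "optimal_bottleneck_mpps": chip_mhz / optimal_hops_per_pipe,
        "bottleneck_mpps": chip_mhz / busiest_pipe_hops,
    }


# Counter values copied from the two Neohost dumps in this message.
SWS = steering_summary((243_977_877, 213_617_683), 148_813_592, 429.9919)
HWS = steering_summary((253_168_409, 235_977_321), 107_498_116, 429.9925)

if __name__ == "__main__":
    for name, summary in (("SWS", SWS), ("HWS", HWS)):
        print(name)
        for key, value in summary.items():
            print(f"  {key}: {value:.4f}")
```

Under this reading the derived values agree with the reported ones to within rounding. HWS spends roughly 1.48 more hops per packet than SWS (4.55 vs 3.08), yet the estimated ceiling in both modes stays above 148 Mpps.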
mlnx_perf -i enp33s0f0np0 -t 1:
rx_steer_missed_packets: 108,743,272
rx_vport_unicast_packets: 108,743,424
rx_vport_unicast_bytes: 6,959,579,136 Bps = 55,676.63 Mbps
tx_packets_phy: 7,537
rx_packets_phy: 150,538,251
tx_bytes_phy: 482,368 Bps = 3.85 Mbps
rx_bytes_phy: 9,634,448,128 Bps = 77,075.58 Mbps
tx_mac_control_phy: 7,536
tx_pause_ctrl_phy: 7,536
rx_discards_phy: 41,794,740
rx_64_bytes_phy: 150,538,352 Bps = 1,204.30 Mbps
rx_buffer_passed_thres_phy: 202
rx_prio0_bytes: 9,634,520,256 Bps = 77,076.16 Mbps
rx_prio0_packets: 108,744,322
rx_prio0_discards: 41,795,050
tx_global_pause: 7,537
tx_global_pause_duration: 1,011,592
"rx_discards_phy" is described as follows [1]:
The number of received packets dropped due to lack of buffers on a
physical port. If this counter is increasing, it implies that the adapter
is congested and cannot absorb the traffic coming from the network.
However, the adapter certainly *is* able to process 148 Mpps,
since it does so with SWS and it can deliver this much to SW (with MPRQ).
[1]:
https://www.kernel.org/doc/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
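The mlnx_perf sample above is internally consistent with drops happening at the physical port, before steering. A small sketch of that check, using only the counter values quoted above (the helper name is hypothetical; the counters are per-second samples that are not read atomically, so a small residual is expected):

```python
# Sanity check: packets received on the wire minus port discards should be
# close to what reaches the vport.
counters = {
    "rx_packets_phy": 150_538_251,
    "rx_discards_phy": 41_794_740,
    "rx_vport_unicast_packets": 108_743_424,
}


def summarize(c):
    """Return packets that passed the port buffer and the discard fraction."""
    passed_port = c["rx_packets_phy"] - c["rx_discards_phy"]
    return {
        "passed_port": passed_port,
        "discard_fraction": c["rx_discards_phy"] / c["rx_packets_phy"],
        "gap_to_vport": passed_port - c["rx_vport_unicast_packets"],
    }


if __name__ == "__main__":
    result = summarize(counters)
    print(f"passed port buffer: {result['passed_port']:,}")
    print(f"discard fraction:   {result['discard_fraction']:.1%}")
    print(f"gap to vport count: {result['gap_to_vport']:,}")
```

About 28% of the offered packets are discarded at the port, and the remainder matches rx_vport_unicast_packets to within a few dozen packets.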
Full Neohost output, hardware steering:
=============================================================================
|| Counter Name || Counter Value
=============================================================================
|| Level 0 MTT Cache Hit || 0
|| Level 0 MTT Cache Miss || 0
|| Level 1 MTT Cache Hit || 0
|| Level 1 MTT Cache Miss || 0
|| Level 0 MPT Cache Hit || 0
|| Level 0 MPT Cache Miss || 0
|| Level 1 MPT Cache Hit || 0
|| Level 1 MPT Cache Miss || 0
|| Indirect Memory Key Access || 0
|| ICM Cache Miss || 38
|| PCIe Internal Back Pressure || 0
|| Outbound Stalled Reads || 0
|| Outbound Stalled Writes || 0
|| PCIe Read Stalled due to No Read Engines || 0
|| PCIe Read Stalled due to No Completion Buffer || 0
|| PCIe Read Stalled due to Ordering || 0
|| RX IPsec Packets || 0
|| Back Pressure from RXD to PSA || 0
|| Chip Frequency || 429.9925
|| Back Pressure from RXB Buffer to RXB FIFO || 0
|| Back Pressure from PSA switch to RXT || 0
|| Back Pressure from PSA switch to RXB || 0
|| Back Pressure from PSA switch to RXD || 0
|| Back Pressure from Internal MMU to RX Descriptor Handling || 107,498,115
|| Receive WQE Cache Hit || 0
|| Receive WQE Cache Miss || 0
|| Back Pressure from PCIe to Packet Scatter || 0
|| RX Steering Packets || 107,498,116
|| RX Steering Packets Fast Path || 0
|| EQ All State Machines Busy || 0
|| CQ All State Machines Busy || 0
|| MSI-X All State Machines Busy || 0
|| CQE Compression Sessions || 0
|| Compressed CQEs || 0
|| Compression Session Closed due to EQE || 0
|| Compression Session Closed due to Timeout || 0
|| Compression Session Closed due to Mismatch || 0
|| Compression Session Closed due to PCIe Idle || 0
|| Compression Session Closed due to S2CQE || 0
|| Compressed CQE Strides || 0
|| Compression Session Closed due to LRO || 0
|| TX Descriptor Handling Stopped due to Limited State || 0
|| TX Descriptor Handling Stopped due to Limited VL || 0
|| TX Descriptor Handling Stopped due to De-schedule || 0
|| TX Descriptor Handling Stopped due to Work Done || 0
|| TX Descriptor Handling Stopped due to E2E Credits || 0
|| Line Transmitted Port 1 || 0
|| Line Transmitted Port 2 || 0
|| Line Transmitted Loop Back || 0
|| RX_PSA0 Steering Pipe 0 || 253,168,409
|| RX_PSA0 Steering Pipe 1 || 235,977,321
|| RX_PSA0 Steering Cache Access Pipe 0 || 224,400,319
|| RX_PSA0 Steering Cache Access Pipe 1 || 208,687,547
|| RX_PSA0 Steering Cache Hit Pipe 0 || 224,400,319
|| RX_PSA0 Steering Cache Hit Pipe 1 || 208,687,547
|| RX_PSA0 Steering Cache Miss Pipe 0 || 0
|| RX_PSA0 Steering Cache Miss Pipe 1 || 0
|| RX_PSA1 Steering Pipe 0 || 253,168,409
|| RX_PSA1 Steering Pipe 1 || 235,977,321
|| RX_PSA1 Steering Cache Access Pipe 0 || 224,400,319
|| RX_PSA1 Steering Cache Access Pipe 1 || 208,687,547
|| RX_PSA1 Steering Cache Hit Pipe 0 || 224,400,319
|| RX_PSA1 Steering Cache Hit Pipe 1 || 208,687,547
|| RX_PSA1 Steering Cache Miss Pipe 0 || 0
|| RX_PSA1 Steering Cache Miss Pipe 1 || 0
|| TX_PSA0 Steering Pipe 0 || 0
|| TX_PSA0 Steering Pipe 1 || 0
|| TX_PSA0 Steering Cache Access Pipe 0 || 0
|| TX_PSA0 Steering Cache Access Pipe 1 || 0
|| TX_PSA0 Steering Cache Hit Pipe 0 || 0
|| TX_PSA0 Steering Cache Hit Pipe 1 || 0
|| TX_PSA0 Steering Cache Miss Pipe 0 || 0
|| TX_PSA0 Steering Cache Miss Pipe 1 || 0
|| TX_PSA1 Steering Pipe 0 || 0
|| TX_PSA1 Steering Pipe 1 || 0
|| TX_PSA1 Steering Cache Access Pipe 0 || 0
|| TX_PSA1 Steering Cache Access Pipe 1 || 0
|| TX_PSA1 Steering Cache Hit Pipe 0 || 0
|| TX_PSA1 Steering Cache Hit Pipe 1 || 0
|| TX_PSA1 Steering Cache Miss Pipe 0 || 0
|| TX_PSA1 Steering Cache Miss Pipe 1 || 0
=============================================================================
||| Performance Analysis || Analysis Value [Units] ||
=============================================================================
||| Bandwidth ||
||---------------------------------------------------------------------------
||| RX BandWidth || 55.039 [Gb/s] ||
||| TX BandWidth || 0 [Gb/s] ||
||===========================================================================
||| Memory ||
||---------------------------------------------------------------------------
||| RX Indirect Memory Keys Rate || 0 [Keys/Packet] ||
||===========================================================================
||| PCIe Bandwidth ||
||---------------------------------------------------------------------------
||| PCIe Inbound Available BW || 251.3851 [Gb/s] ||
||| PCIe Inbound BW Utilization || 0.0027 [%] ||
||| PCIe Inbound Used BW || 0.0069 [Gb/s] ||
||| PCIe Outbound Available BW || 251.3851 [Gb/s] ||
||| PCIe Outbound BW Utilization || 0.0025 [%] ||
||| PCIe Outbound Used BW || 0.0062 [Gb/s] ||
||===========================================================================
||| PCIe Latency ||
||---------------------------------------------------------------------------
||| PCIe Avg Latency || 523 [NS] ||
||| PCIe Max Latency || 548 [NS] ||
||| PCIe Min Latency || 511 [NS] ||
||===========================================================================
||| PCIe Unit Internal Latency ||
||---------------------------------------------------------------------------
||| PCIe Internal Avg Latency || 4 [NS] ||
||| PCIe Internal Max Latency || 4 [NS] ||
||| PCIe Internal Min Latency || 4 [NS] ||
||===========================================================================
||| Packet Rate ||
||---------------------------------------------------------------------------
||| RX Packet Rate || 107,498,115 [Packets/Seconds] ||
||| TX Packet Rate || 0 [Packets/Seconds] ||
||===========================================================================
||| eSwitch ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet || 4.5503 [Hops/Packet] ||
||| RX Optimal Hops Per Packet Per Pipe || 2.2751 [Hops/Packet] ||
||| RX Optimal Packet Rate Bottleneck || 188.9994 [MPPS] ||
||| RX Packet Rate Bottleneck || 182.5796 [MPPS] ||
||| TX Hops Per Packet || 0 [Hops/Packet] ||
||| TX Optimal Hops Per Packet Per Pipe || 0 [Hops/Packet] ||
||| TX Optimal Packet Rate Bottleneck || 0 [MPPS] ||
||| TX Packet Rate Bottleneck || 0 [MPPS] ||
||===========================================================================
Full Neohost output, software steering:
=============================================================================
|| Counter Name || Counter Value
=============================================================================
|| Level 0 MTT Cache Hit || 0
|| Level 0 MTT Cache Miss || 0
|| Level 1 MTT Cache Hit || 0
|| Level 1 MTT Cache Miss || 0
|| Level 0 MPT Cache Hit || 0
|| Level 0 MPT Cache Miss || 0
|| Level 1 MPT Cache Hit || 0
|| Level 1 MPT Cache Miss || 0
|| Indirect Memory Key Access || 0
|| ICM Cache Miss || 38
|| PCIe Internal Back Pressure || 0
|| Outbound Stalled Reads || 0
|| Outbound Stalled Writes || 0
|| PCIe Read Stalled due to No Read Engines || 0
|| PCIe Read Stalled due to No Completion Buffer || 0
|| PCIe Read Stalled due to Ordering || 0
|| RX IPsec Packets || 0
|| Back Pressure from RXD to PSA || 0
|| Chip Frequency || 429.9919
|| Back Pressure from RXB Buffer to RXB FIFO || 0
|| Back Pressure from PSA switch to RXT || 0
|| Back Pressure from PSA switch to RXB || 0
|| Back Pressure from PSA switch to RXD || 0
|| Back Pressure from Internal MMU to RX Descriptor Handling || 148,813,590
|| Receive WQE Cache Hit || 0
|| Receive WQE Cache Miss || 0
|| Back Pressure from PCIe to Packet Scatter || 0
|| RX Steering Packets || 148,813,592
|| RX Steering Packets Fast Path || 0
|| EQ All State Machines Busy || 0
|| CQ All State Machines Busy || 0
|| MSI-X All State Machines Busy || 0
|| CQE Compression Sessions || 0
|| Compressed CQEs || 0
|| Compression Session Closed due to EQE || 0
|| Compression Session Closed due to Timeout || 0
|| Compression Session Closed due to Mismatch || 0
|| Compression Session Closed due to PCIe Idle || 0
|| Compression Session Closed due to S2CQE || 0
|| Compressed CQE Strides || 0
|| Compression Session Closed due to LRO || 0
|| TX Descriptor Handling Stopped due to Limited State || 0
|| TX Descriptor Handling Stopped due to Limited VL || 0
|| TX Descriptor Handling Stopped due to De-schedule || 0
|| TX Descriptor Handling Stopped due to Work Done || 0
|| TX Descriptor Handling Stopped due to E2E Credits || 0
|| Line Transmitted Port 1 || 0
|| Line Transmitted Port 2 || 0
|| Line Transmitted Loop Back || 0
|| RX_PSA0 Steering Pipe 0 || 243,977,877
|| RX_PSA0 Steering Pipe 1 || 213,617,683
|| RX_PSA0 Steering Cache Access Pipe 0 || 203,526,803
|| RX_PSA0 Steering Cache Access Pipe 1 || 177,919,444
|| RX_PSA0 Steering Cache Hit Pipe 0 || 202,742,093
|| RX_PSA0 Steering Cache Hit Pipe 1 || 177,158,314
|| RX_PSA0 Steering Cache Miss Pipe 0 || 161,513
|| RX_PSA0 Steering Cache Miss Pipe 1 || 158,843
|| RX_PSA1 Steering Pipe 0 || 243,977,877
|| RX_PSA1 Steering Pipe 1 || 213,617,683
|| RX_PSA1 Steering Cache Access Pipe 0 || 203,526,803
|| RX_PSA1 Steering Cache Access Pipe 1 || 177,919,444
|| RX_PSA1 Steering Cache Hit Pipe 0 || 202,742,093
|| RX_PSA1 Steering Cache Hit Pipe 1 || 177,158,314
|| RX_PSA1 Steering Cache Miss Pipe 0 || 161,513
|| RX_PSA1 Steering Cache Miss Pipe 1 || 0
|| TX_PSA0 Steering Pipe 0 || 0
|| TX_PSA0 Steering Pipe 1 || 0
|| TX_PSA0 Steering Cache Access Pipe 0 || 0
|| TX_PSA0 Steering Cache Access Pipe 1 || 0
|| TX_PSA0 Steering Cache Hit Pipe 0 || 0
|| TX_PSA0 Steering Cache Hit Pipe 1 || 0
|| TX_PSA0 Steering Cache Miss Pipe 0 || 0
|| TX_PSA0 Steering Cache Miss Pipe 1 || 0
|| TX_PSA1 Steering Pipe 0 || 0
|| TX_PSA1 Steering Pipe 1 || 0
|| TX_PSA1 Steering Cache Access Pipe 0 || 0
|| TX_PSA1 Steering Cache Access Pipe 1 || 0
|| TX_PSA1 Steering Cache Hit Pipe 0 || 0
|| TX_PSA1 Steering Cache Hit Pipe 1 || 0
|| TX_PSA1 Steering Cache Miss Pipe 0 || 0
|| TX_PSA1 Steering Cache Miss Pipe 1 || 0
=============================================================================
||| Performance Analysis || Analysis Value [Units] ||
=============================================================================
||| Bandwidth ||
||---------------------------------------------------------------------------
||| RX BandWidth || 76.1926 [Gb/s] ||
||| TX BandWidth || 0 [Gb/s] ||
||===========================================================================
||| Memory ||
||---------------------------------------------------------------------------
||| RX Indirect Memory Keys Rate || 0 [Keys/Packet] ||
||===========================================================================
||| PCIe Bandwidth ||
||---------------------------------------------------------------------------
||| PCIe Inbound Available BW || 251.385 [Gb/s] ||
||| PCIe Inbound BW Utilization || 0.0027 [%] ||
||| PCIe Inbound Used BW || 0.0069 [Gb/s] ||
||| PCIe Outbound Available BW || 251.385 [Gb/s] ||
||| PCIe Outbound BW Utilization || 0.0025 [%] ||
||| PCIe Outbound Used BW || 0.0062 [Gb/s] ||
||===========================================================================
||| PCIe Latency ||
||---------------------------------------------------------------------------
||| PCIe Avg Latency || 522 [NS] ||
||| PCIe Max Latency || 541 [NS] ||
||| PCIe Min Latency || 511 [NS] ||
||===========================================================================
||| PCIe Unit Internal Latency ||
||---------------------------------------------------------------------------
||| PCIe Internal Avg Latency || 4 [NS] ||
||| PCIe Internal Max Latency || 4 [NS] ||
||| PCIe Internal Min Latency || 4 [NS] ||
||===========================================================================
||| Packet Rate ||
||---------------------------------------------------------------------------
||| RX Packet Rate || 148,813,590 [Packets/Seconds] ||
||| TX Packet Rate || 0 [Packets/Seconds] ||
||===========================================================================
||| eSwitch ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet || 3.075 [Hops/Packet] ||
||| RX Optimal Hops Per Packet Per Pipe || 1.5375 [Hops/Packet] ||
||| RX Optimal Packet Rate Bottleneck || 279.6695 [MPPS] ||
||| RX Packet Rate Bottleneck || 262.2723 [MPPS] ||
||| TX Hops Per Packet || 0 [Hops/Packet] ||
||| TX Optimal Hops Per Packet Per Pipe || 0 [Hops/Packet] ||
||| TX Optimal Packet Rate Bottleneck || 0 [MPPS] ||
||| TX Packet Rate Bottleneck || 0 [MPPS] ||
||===========================================================================