Hello,

We're observing an abrupt performance drop from 148 to 107 Mpps with 64B packets when using HWS (async API) instead of SWS (sync API). It appears to be caused by any rule that jumps out of ingress group 0. Is this a known issue or a temporary limitation?
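For reference, 148 Mpps is the 100GbE line rate for 64B frames. The sketch below shows the arithmetic. It assumes that "64B" includes the 4-byte FCS and that each frame carries the standard 20 bytes of wire overhead (preamble, SFD and inter-frame gap). The SWS and HWS rates are the Neohost RX Packet Rate values from this message. The helper name `max_packet_rate` is illustrative.

```python
# Assumption: "64B" is the Ethernet frame size including the 4-byte FCS.
# Each frame also occupies 20 extra bytes on the wire:
# 7 B preamble + 1 B start-of-frame delimiter + 12 B minimum inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def max_packet_rate(link_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum frames per second on a link for a given frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_per_frame


line_rate = max_packet_rate(100e9, 64)
print(f"100GbE line rate @ 64B: {line_rate / 1e6:.2f} Mpps")

# RX packet rates reported by Neohost for the two steering modes.
for mode, rx_pps in (("SWS", 148_813_590), ("HWS", 107_498_115)):
    print(f"{mode}: {rx_pps / 1e6:.2f} Mpps ({100 * rx_pps / line_rate:.1f}% of line rate)")
```

Under these assumptions SWS runs at line rate, while HWS reaches roughly 72% of it.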
NIC: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16
FW: 22.40.1000
OFED: MLNX_OFED_LINUX-24.01-0.3.3.1
DPDK: v24.03-23-g76cef1af8b

The TG is custom. Traffic is Ethernet / VLAN / IPv4 / TCP SYN @ 148 Mpps.

The examples below only perform the jump and miss all packets in group 1, but the same drop is observed when all packets are dropped in group 1.

Software steering:

/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=1 -- -i --rxq=1 --txq=1
flow create 0 ingress group 0 pattern end actions jump group 1 / end

Neohost (from OFED 5.7):

|| Packet Rate ||
|| RX Packet Rate || 148,813,590 [Packets/Seconds] ||
|| TX Packet Rate || 0 [Packets/Seconds] ||
|| eSwitch ||
|| RX Hops Per Packet || 3.075 [Hops/Packet] ||
|| RX Optimal Hops Per Packet Per Pipe || 1.5375 [Hops/Packet] ||
|| RX Optimal Packet Rate Bottleneck || 279.6695 [MPPS] ||
|| RX Packet Rate Bottleneck || 262.2723 [MPPS] ||

(The full Neohost output is attached.)
Hardware steering:

/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=2 -- -i --rxq=1 --txq=1
port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0

Neohost:

|| Packet Rate ||
|| RX Packet Rate || 107,498,115 [Packets/Seconds] ||
|| TX Packet Rate || 0 [Packets/Seconds] ||
|| eSwitch ||
|| RX Hops Per Packet || 4.5503 [Hops/Packet] ||
|| RX Optimal Hops Per Packet Per Pipe || 2.2751 [Hops/Packet] ||
|| RX Optimal Packet Rate Bottleneck || 188.9994 [MPPS] ||
|| RX Packet Rate Bottleneck || 182.5796 [MPPS] ||

AFAIU, performance is not constrained by the complexity of the rules. Even with HWS, the RX Packet Rate Bottleneck estimated by Neohost (182.6 Mpps) is above the offered 148.8 Mpps.
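The derived eSwitch figures in both Neohost outputs can be reproduced from the raw per-pipe counters in the attached full outputs. The sketch below uses a model inferred from those numbers, not taken from Neohost documentation:

* Hops per packet is the sum of the `RX_PSA0 Steering Pipe 0/1` counters divided by the RX packet rate.
* The optimal per-pipe value is half of the hops per packet.
* Both bottleneck figures assume one hop per clock at the reported Chip Frequency, which is taken to be in MHz.
* The second bottleneck figure is limited by the busier of the two pipes.

`RUNS` and `steering_metrics` are illustrative names.

```python
# Raw per-second counters copied from the two attached Neohost outputs.
# chip_mhz is "Chip Frequency" (assumed to be in MHz); pipe0/pipe1 are
# "RX_PSA0 Steering Pipe 0/1"; rx_pps is "RX Packet Rate".
RUNS = {
    "SWS": {"chip_mhz": 429.9919, "rx_pps": 148_813_590,
            "pipe0": 243_977_877, "pipe1": 213_617_683},
    "HWS": {"chip_mhz": 429.9925, "rx_pps": 107_498_115,
            "pipe0": 253_168_409, "pipe1": 235_977_321},
}


def steering_metrics(chip_mhz: float, rx_pps: int, pipe0: int, pipe1: int):
    """Reproduce Neohost's derived eSwitch RX figures (assumed model)."""
    hops_per_packet = (pipe0 + pipe1) / rx_pps
    optimal_hops_per_pipe = hops_per_packet / 2  # work split evenly over 2 pipes
    # Assumes each pipe handles one steering hop per chip clock cycle.
    optimal_bottleneck_mpps = chip_mhz / optimal_hops_per_pipe
    # Same assumption, but limited by the busier of the two pipes as measured.
    bottleneck_mpps = chip_mhz * rx_pps / max(pipe0, pipe1)
    return hops_per_packet, optimal_hops_per_pipe, optimal_bottleneck_mpps, bottleneck_mpps


for mode, counters in RUNS.items():
    hops, per_pipe, optimal, bottleneck = steering_metrics(**counters)
    print(f"{mode}: {hops:.4f} hops/pkt, {per_pipe:.4f} per pipe, "
          f"optimal {optimal:.2f} Mpps, bottleneck {bottleneck:.2f} Mpps")
```

Under this model the computed values agree with Neohost to within rounding. HWS costs roughly 1.5 extra steering hops per packet (4.55 vs 3.07). The predicted limits (about 183 to 189 Mpps) still leave headroom above 148.8 Mpps, which suggests the extra hops alone should not cap the rate at 107 Mpps.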
mlnx_perf -i enp33s0f0np0 -t 1:

rx_steer_missed_packets: 108,743,272
rx_vport_unicast_packets: 108,743,424
rx_vport_unicast_bytes: 6,959,579,136 Bps = 55,676.63 Mbps
tx_packets_phy: 7,537
rx_packets_phy: 150,538,251
tx_bytes_phy: 482,368 Bps = 3.85 Mbps
rx_bytes_phy: 9,634,448,128 Bps = 77,075.58 Mbps
tx_mac_control_phy: 7,536
tx_pause_ctrl_phy: 7,536
rx_discards_phy: 41,794,740
rx_64_bytes_phy: 150,538,352 Bps = 1,204.30 Mbps
rx_buffer_passed_thres_phy: 202
rx_prio0_bytes: 9,634,520,256 Bps = 77,076.16 Mbps
rx_prio0_packets: 108,744,322
rx_prio0_discards: 41,795,050
tx_global_pause: 7,537
tx_global_pause_duration: 1,011,592

"rx_discards_phy" is described as follows [1]:

  The number of received packets dropped due to lack of buffers on a physical port. If this counter is increasing, it implies that the adapter is congested and cannot absorb the traffic coming from the network.

However, the adapter certainly *is* able to process 148 Mpps. It does so with SWS, and it can deliver that rate to software (with MPRQ).

[1]: https://www.kernel.org/doc/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst
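The mlnx_perf counters above can be checked with some simple accounting. This sketch assumes that `rx_vport_unicast_packets` and `rx_discards_phy` are disjoint, meaning a packet is either dropped for lack of port buffers or handed on to the vport, but not both.

Under that assumption, their sum matches `rx_packets_phy` to within 87 packets per second. The roughly 41.8 Mpps that never reach the vport are therefore accounted for by `rx_discards_phy`, about 27.8% of the offered load. Virtually every packet that does reach the vport is counted in `rx_steer_missed_packets`, as expected when group 1 has no rules.

```python
# Per-second counters from `mlnx_perf -i enp33s0f0np0 -t 1` (HWS run).
rx_packets_phy = 150_538_251
rx_discards_phy = 41_794_740
rx_vport_unicast_packets = 108_743_424
rx_steer_missed_packets = 108_743_272

# Assumption: a packet received on the physical port is either dropped for lack
# of buffers (rx_discards_phy) or handed on to the vport, never both.
accounted = rx_vport_unicast_packets + rx_discards_phy
print(f"vport + discards = {accounted:,}; phy = {rx_packets_phy:,}; "
      f"unaccounted = {rx_packets_phy - accounted:,}")
print(f"dropped for lack of port buffers: {100 * rx_discards_phy / rx_packets_phy:.1f}%")
print(f"steering misses among vport packets: "
      f"{100 * rx_steer_missed_packets / rx_vport_unicast_packets:.2f}%")
```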
Full Neohost output, hardware steering (dv_flow_en=2):

|| Counter Name || Counter Value ||
|| Level 0 MTT Cache Hit || 0 ||
|| Level 0 MTT Cache Miss || 0 ||
|| Level 1 MTT Cache Hit || 0 ||
|| Level 1 MTT Cache Miss || 0 ||
|| Level 0 MPT Cache Hit || 0 ||
|| Level 0 MPT Cache Miss || 0 ||
|| Level 1 MPT Cache Hit || 0 ||
|| Level 1 MPT Cache Miss || 0 ||
|| Indirect Memory Key Access || 0 ||
|| ICM Cache Miss || 38 ||
|| PCIe Internal Back Pressure || 0 ||
|| Outbound Stalled Reads || 0 ||
|| Outbound Stalled Writes || 0 ||
|| PCIe Read Stalled due to No Read Engines || 0 ||
|| PCIe Read Stalled due to No Completion Buffer || 0 ||
|| PCIe Read Stalled due to Ordering || 0 ||
|| RX IPsec Packets || 0 ||
|| Back Pressure from RXD to PSA || 0 ||
|| Chip Frequency || 429.9925 ||
|| Back Pressure from RXB Buffer to RXB FIFO || 0 ||
|| Back Pressure from PSA switch to RXT || 0 ||
|| Back Pressure from PSA switch to RXB || 0 ||
|| Back Pressure from PSA switch to RXD || 0 ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 107,498,115 ||
|| Receive WQE Cache Hit || 0 ||
|| Receive WQE Cache Miss || 0 ||
|| Back Pressure from PCIe to Packet Scatter || 0 ||
|| RX Steering Packets || 107,498,116 ||
|| RX Steering Packets Fast Path || 0 ||
|| EQ All State Machines Busy || 0 ||
|| CQ All State Machines Busy || 0 ||
|| MSI-X All State Machines Busy || 0 ||
|| CQE Compression Sessions || 0 ||
|| Compressed CQEs || 0 ||
|| Compression Session Closed due to EQE || 0 ||
|| Compression Session Closed due to Timeout || 0 ||
|| Compression Session Closed due to Mismatch || 0 ||
|| Compression Session Closed due to PCIe Idle || 0 ||
|| Compression Session Closed due to S2CQE || 0 ||
|| Compressed CQE Strides || 0 ||
|| Compression Session Closed due to LRO || 0 ||
|| TX Descriptor Handling Stopped due to Limited State || 0 ||
|| TX Descriptor Handling Stopped due to Limited VL || 0 ||
|| TX Descriptor Handling Stopped due to De-schedule || 0 ||
|| TX Descriptor Handling Stopped due to Work Done || 0 ||
|| TX Descriptor Handling Stopped due to E2E Credits || 0 ||
|| Line Transmitted Port 1 || 0 ||
|| Line Transmitted Port 2 || 0 ||
|| Line Transmitted Loop Back || 0 ||
|| RX_PSA0 Steering Pipe 0 || 253,168,409 ||
|| RX_PSA0 Steering Pipe 1 || 235,977,321 ||
|| RX_PSA0 Steering Cache Access Pipe 0 || 224,400,319 ||
|| RX_PSA0 Steering Cache Access Pipe 1 || 208,687,547 ||
|| RX_PSA0 Steering Cache Hit Pipe 0 || 224,400,319 ||
|| RX_PSA0 Steering Cache Hit Pipe 1 || 208,687,547 ||
|| RX_PSA0 Steering Cache Miss Pipe 0 || 0 ||
|| RX_PSA0 Steering Cache Miss Pipe 1 || 0 ||
|| RX_PSA1 Steering Pipe 0 || 253,168,409 ||
|| RX_PSA1 Steering Pipe 1 || 235,977,321 ||
|| RX_PSA1 Steering Cache Access Pipe 0 || 224,400,319 ||
|| RX_PSA1 Steering Cache Access Pipe 1 || 208,687,547 ||
|| RX_PSA1 Steering Cache Hit Pipe 0 || 224,400,319 ||
|| RX_PSA1 Steering Cache Hit Pipe 1 || 208,687,547 ||
|| RX_PSA1 Steering Cache Miss Pipe 0 || 0 ||
|| RX_PSA1 Steering Cache Miss Pipe 1 || 0 ||
|| TX_PSA0 Steering Pipe 0 || 0 ||
|| TX_PSA0 Steering Pipe 1 || 0 ||
|| TX_PSA0 Steering Cache Access Pipe 0 || 0 ||
|| TX_PSA0 Steering Cache Access Pipe 1 || 0 ||
|| TX_PSA0 Steering Cache Hit Pipe 0 || 0 ||
|| TX_PSA0 Steering Cache Hit Pipe 1 || 0 ||
|| TX_PSA0 Steering Cache Miss Pipe 0 || 0 ||
|| TX_PSA0 Steering Cache Miss Pipe 1 || 0 ||
|| TX_PSA1 Steering Pipe 0 || 0 ||
|| TX_PSA1 Steering Pipe 1 || 0 ||
|| TX_PSA1 Steering Cache Access Pipe 0 || 0 ||
|| TX_PSA1 Steering Cache Access Pipe 1 || 0 ||
|| TX_PSA1 Steering Cache Hit Pipe 0 || 0 ||
|| TX_PSA1 Steering Cache Hit Pipe 1 || 0 ||
|| TX_PSA1 Steering Cache Miss Pipe 0 || 0 ||
|| TX_PSA1 Steering Cache Miss Pipe 1 || 0 ||

|| Performance Analysis || Analysis Value [Units] ||
|| Bandwidth ||
|| RX BandWidth || 55.039 [Gb/s] ||
|| TX BandWidth || 0 [Gb/s] ||
|| Memory ||
|| RX Indirect Memory Keys Rate || 0 [Keys/Packet] ||
|| PCIe Bandwidth ||
|| PCIe Inbound Available BW || 251.3851 [Gb/s] ||
|| PCIe Inbound BW Utilization || 0.0027 [%] ||
|| PCIe Inbound Used BW || 0.0069 [Gb/s] ||
|| PCIe Outbound Available BW || 251.3851 [Gb/s] ||
|| PCIe Outbound BW Utilization || 0.0025 [%] ||
|| PCIe Outbound Used BW || 0.0062 [Gb/s] ||
|| PCIe Latency ||
|| PCIe Avg Latency || 523 [NS] ||
|| PCIe Max Latency || 548 [NS] ||
|| PCIe Min Latency || 511 [NS] ||
|| PCIe Unit Internal Latency ||
|| PCIe Internal Avg Latency || 4 [NS] ||
|| PCIe Internal Max Latency || 4 [NS] ||
|| PCIe Internal Min Latency || 4 [NS] ||
|| Packet Rate ||
|| RX Packet Rate || 107,498,115 [Packets/Seconds] ||
|| TX Packet Rate || 0 [Packets/Seconds] ||
|| eSwitch ||
|| RX Hops Per Packet || 4.5503 [Hops/Packet] ||
|| RX Optimal Hops Per Packet Per Pipe || 2.2751 [Hops/Packet] ||
|| RX Optimal Packet Rate Bottleneck || 188.9994 [MPPS] ||
|| RX Packet Rate Bottleneck || 182.5796 [MPPS] ||
|| TX Hops Per Packet || 0 [Hops/Packet] ||
|| TX Optimal Hops Per Packet Per Pipe || 0 [Hops/Packet] ||
|| TX Optimal Packet Rate Bottleneck || 0 [MPPS] ||
|| TX Packet Rate Bottleneck || 0 [MPPS] ||
Full Neohost output, software steering (dv_flow_en=1):

|| Counter Name || Counter Value ||
|| Level 0 MTT Cache Hit || 0 ||
|| Level 0 MTT Cache Miss || 0 ||
|| Level 1 MTT Cache Hit || 0 ||
|| Level 1 MTT Cache Miss || 0 ||
|| Level 0 MPT Cache Hit || 0 ||
|| Level 0 MPT Cache Miss || 0 ||
|| Level 1 MPT Cache Hit || 0 ||
|| Level 1 MPT Cache Miss || 0 ||
|| Indirect Memory Key Access || 0 ||
|| ICM Cache Miss || 38 ||
|| PCIe Internal Back Pressure || 0 ||
|| Outbound Stalled Reads || 0 ||
|| Outbound Stalled Writes || 0 ||
|| PCIe Read Stalled due to No Read Engines || 0 ||
|| PCIe Read Stalled due to No Completion Buffer || 0 ||
|| PCIe Read Stalled due to Ordering || 0 ||
|| RX IPsec Packets || 0 ||
|| Back Pressure from RXD to PSA || 0 ||
|| Chip Frequency || 429.9919 ||
|| Back Pressure from RXB Buffer to RXB FIFO || 0 ||
|| Back Pressure from PSA switch to RXT || 0 ||
|| Back Pressure from PSA switch to RXB || 0 ||
|| Back Pressure from PSA switch to RXD || 0 ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 148,813,590 ||
|| Receive WQE Cache Hit || 0 ||
|| Receive WQE Cache Miss || 0 ||
|| Back Pressure from PCIe to Packet Scatter || 0 ||
|| RX Steering Packets || 148,813,592 ||
|| RX Steering Packets Fast Path || 0 ||
|| EQ All State Machines Busy || 0 ||
|| CQ All State Machines Busy || 0 ||
|| MSI-X All State Machines Busy || 0 ||
|| CQE Compression Sessions || 0 ||
|| Compressed CQEs || 0 ||
|| Compression Session Closed due to EQE || 0 ||
|| Compression Session Closed due to Timeout || 0 ||
|| Compression Session Closed due to Mismatch || 0 ||
|| Compression Session Closed due to PCIe Idle || 0 ||
|| Compression Session Closed due to S2CQE || 0 ||
|| Compressed CQE Strides || 0 ||
|| Compression Session Closed due to LRO || 0 ||
|| TX Descriptor Handling Stopped due to Limited State || 0 ||
|| TX Descriptor Handling Stopped due to Limited VL || 0 ||
|| TX Descriptor Handling Stopped due to De-schedule || 0 ||
|| TX Descriptor Handling Stopped due to Work Done || 0 ||
|| TX Descriptor Handling Stopped due to E2E Credits || 0 ||
|| Line Transmitted Port 1 || 0 ||
|| Line Transmitted Port 2 || 0 ||
|| Line Transmitted Loop Back || 0 ||
|| RX_PSA0 Steering Pipe 0 || 243,977,877 ||
|| RX_PSA0 Steering Pipe 1 || 213,617,683 ||
|| RX_PSA0 Steering Cache Access Pipe 0 || 203,526,803 ||
|| RX_PSA0 Steering Cache Access Pipe 1 || 177,919,444 ||
|| RX_PSA0 Steering Cache Hit Pipe 0 || 202,742,093 ||
|| RX_PSA0 Steering Cache Hit Pipe 1 || 177,158,314 ||
|| RX_PSA0 Steering Cache Miss Pipe 0 || 161,513 ||
|| RX_PSA0 Steering Cache Miss Pipe 1 || 158,843 ||
|| RX_PSA1 Steering Pipe 0 || 243,977,877 ||
|| RX_PSA1 Steering Pipe 1 || 213,617,683 ||
|| RX_PSA1 Steering Cache Access Pipe 0 || 203,526,803 ||
|| RX_PSA1 Steering Cache Access Pipe 1 || 177,919,444 ||
|| RX_PSA1 Steering Cache Hit Pipe 0 || 202,742,093 ||
|| RX_PSA1 Steering Cache Hit Pipe 1 || 177,158,314 ||
|| RX_PSA1 Steering Cache Miss Pipe 0 || 161,513 ||
|| RX_PSA1 Steering Cache Miss Pipe 1 || 0 ||
|| TX_PSA0 Steering Pipe 0 || 0 ||
|| TX_PSA0 Steering Pipe 1 || 0 ||
|| TX_PSA0 Steering Cache Access Pipe 0 || 0 ||
|| TX_PSA0 Steering Cache Access Pipe 1 || 0 ||
|| TX_PSA0 Steering Cache Hit Pipe 0 || 0 ||
|| TX_PSA0 Steering Cache Hit Pipe 1 || 0 ||
|| TX_PSA0 Steering Cache Miss Pipe 0 || 0 ||
|| TX_PSA0 Steering Cache Miss Pipe 1 || 0 ||
|| TX_PSA1 Steering Pipe 0 || 0 ||
|| TX_PSA1 Steering Pipe 1 || 0 ||
|| TX_PSA1 Steering Cache Access Pipe 0 || 0 ||
|| TX_PSA1 Steering Cache Access Pipe 1 || 0 ||
|| TX_PSA1 Steering Cache Hit Pipe 0 || 0 ||
|| TX_PSA1 Steering Cache Hit Pipe 1 || 0 ||
|| TX_PSA1 Steering Cache Miss Pipe 0 || 0 ||
|| TX_PSA1 Steering Cache Miss Pipe 1 || 0 ||

|| Performance Analysis || Analysis Value [Units] ||
|| Bandwidth ||
|| RX BandWidth || 76.1926 [Gb/s] ||
|| TX BandWidth || 0 [Gb/s] ||
|| Memory ||
|| RX Indirect Memory Keys Rate || 0 [Keys/Packet] ||
|| PCIe Bandwidth ||
|| PCIe Inbound Available BW || 251.385 [Gb/s] ||
|| PCIe Inbound BW Utilization || 0.0027 [%] ||
|| PCIe Inbound Used BW || 0.0069 [Gb/s] ||
|| PCIe Outbound Available BW || 251.385 [Gb/s] ||
|| PCIe Outbound BW Utilization || 0.0025 [%] ||
|| PCIe Outbound Used BW || 0.0062 [Gb/s] ||
|| PCIe Latency ||
|| PCIe Avg Latency || 522 [NS] ||
|| PCIe Max Latency || 541 [NS] ||
|| PCIe Min Latency || 511 [NS] ||
|| PCIe Unit Internal Latency ||
|| PCIe Internal Avg Latency || 4 [NS] ||
|| PCIe Internal Max Latency || 4 [NS] ||
|| PCIe Internal Min Latency || 4 [NS] ||
|| Packet Rate ||
|| RX Packet Rate || 148,813,590 [Packets/Seconds] ||
|| TX Packet Rate || 0 [Packets/Seconds] ||
|| eSwitch ||
|| RX Hops Per Packet || 3.075 [Hops/Packet] ||
|| RX Optimal Hops Per Packet Per Pipe || 1.5375 [Hops/Packet] ||
|| RX Optimal Packet Rate Bottleneck || 279.6695 [MPPS] ||
|| RX Packet Rate Bottleneck || 262.2723 [MPPS] ||
|| TX Hops Per Packet || 0 [Hops/Packet] ||
|| TX Optimal Hops Per Packet Per Pipe || 0 [Hops/Packet] ||
|| TX Optimal Packet Rate Bottleneck || 0 [MPPS] ||
|| TX Packet Rate Bottleneck || 0 [MPPS] ||