Hello,

We're observing an abrupt performance drop from 148 to 107 Mpps with 64B packets,
apparently caused by any rule that jumps out of ingress group 0,
when using HWS (async API) instead of SWS (sync API).
Is this a known issue or a temporary limitation?

NIC: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56; PCIe 4.0/3.0 x16;
FW: 22.40.1000
OFED: MLNX_OFED_LINUX-24.01-0.3.3.1
DPDK: v24.03-23-g76cef1af8b
TG is custom, traffic is Ethernet / VLAN / IPv4 / TCP SYN @ 148 Mpps.
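
For reference, 148 Mpps is roughly the 64-byte line rate of a 100GbE port. A minimal sketch of that arithmetic, assuming the standard 8-byte preamble/SFD and 12-byte inter-frame gap (these constants are general Ethernet values, not figures from this report):

```python
# Theoretical packet rate of a 100GbE link for minimum-size frames.
# The overhead constants are standard Ethernet values (an assumption here,
# they are not taken from the report).
LINK_BPS = 100e9
PREAMBLE_SFD_BYTES = 8
INTERFRAME_GAP_BYTES = 12


def line_rate_pps(frame_bytes: int, link_bps: float = LINK_BPS) -> float:
    """Packets per second when frames of frame_bytes are sent back to back."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


print(f"{line_rate_pps(64) / 1e6:.2f} Mpps")  # prints "148.81 Mpps"
```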

The examples below only perform the jump, so all packets miss in group 1,
but the same drop is observed when all packets are dropped in group 1.

Software steering:

/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=1 -- -i --rxq=1 --txq=1

flow create 0 ingress group 0 pattern end actions jump group 1 / end

Neohost (from OFED 5.7):

||===========================================================================
|||                               Packet Rate                               ||
||---------------------------------------------------------------------------
||| RX Packet Rate                      || 148,813,590   [Packets/Seconds]  ||
||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
||===========================================================================
|||                                 eSwitch                                 ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                  || 3.075         [Hops/Packet]      ||
||| RX Optimal Hops Per Packet Per Pipe || 1.5375        [Hops/Packet]      ||
||| RX Optimal Packet Rate Bottleneck   || 279.6695      [MPPS]             ||
||| RX Packet Rate Bottleneck           || 262.2723      [MPPS]             ||

(Full Neohost output is attached.)

Hardware steering:

/root/build/app/dpdk-testpmd -a 21:00.0,dv_flow_en=2 -- -i --rxq=1 --txq=1

port stop 0
flow configure 0 queues_number 1 queues_size 128 counters_number 16
port start 0
flow pattern_template 0 create pattern_template_id 1 ingress template end
flow actions_template 0 create ingress actions_template_id 1 template jump group 1 / end mask jump group 0xFFFFFFFF / end
flow template_table 0 create ingress group 0 table_id 1 pattern_template 1 actions_template 1 rules_number 1
flow queue 0 create 0 template_table 1 pattern_template 0 actions_template 0 postpone false pattern end actions jump group 1 / end
flow pull 0 queue 0

Neohost:

||===========================================================================
|||                               Packet Rate                               ||
||---------------------------------------------------------------------------
||| RX Packet Rate                      || 107,498,115   [Packets/Seconds]  ||
||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
||===========================================================================
|||                                 eSwitch                                 ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                  || 4.5503        [Hops/Packet]      ||
||| RX Optimal Hops Per Packet Per Pipe || 2.2751        [Hops/Packet]      ||
||| RX Optimal Packet Rate Bottleneck   || 188.9994      [MPPS]             ||
||| RX Packet Rate Bottleneck           || 182.5796      [MPPS]             ||

AFAIU, performance is not constrained by the complexity of the rules.
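
The eSwitch figures can be cross-checked against the raw counters in the attached Neohost output. The sketch below assumes that hops per packet is the sum of the two RX steering pipe counters divided by "RX Steering Packets", and that the bottleneck is the chip frequency divided by the per-packet load of the busier pipe. These relations are inferred from the numbers and are not documented Neohost behaviour; the `Sample` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Raw counters copied from the attached Neohost output."""
    name: str
    chip_mhz: float   # "Chip Frequency"
    rx_packets: int   # "RX Steering Packets"
    pipe0: int        # "RX_PSA0 Steering Pipe 0"
    pipe1: int        # "RX_PSA0 Steering Pipe 1"

    def hops_per_packet(self) -> float:
        return (self.pipe0 + self.pipe1) / self.rx_packets

    def optimal_bottleneck_mpps(self) -> float:
        # Assumption: hops split evenly across the two pipes.
        return self.chip_mhz / (self.hops_per_packet() / 2)

    def bottleneck_mpps(self) -> float:
        # Assumption: the busier pipe limits the rate.
        return self.chip_mhz / (max(self.pipe0, self.pipe1) / self.rx_packets)


SWS = Sample("SWS", 429.9919, 148_813_592, 243_977_877, 213_617_683)
HWS = Sample("HWS", 429.9925, 107_498_116, 253_168_409, 235_977_321)

for s in (SWS, HWS):
    print(f"{s.name}: {s.hops_per_packet():.3f} hops/packet, "
          f"estimated bottleneck {s.bottleneck_mpps():.1f} Mpps, "
          f"measured {s.rx_packets / 1e6:.1f} Mpps")
```

Under these assumptions the estimated bottleneck in HWS mode (about 182 Mpps) is still above both the measured 107 Mpps and the offered 148 Mpps, which is what the statement above relies on.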

mlnx_perf -i enp33s0f0np0 -t 1:

       rx_steer_missed_packets: 108,743,272
      rx_vport_unicast_packets: 108,743,424
        rx_vport_unicast_bytes: 6,959,579,136 Bps    = 55,676.63 Mbps      
                tx_packets_phy: 7,537
                rx_packets_phy: 150,538,251
                  tx_bytes_phy: 482,368 Bps          = 3.85 Mbps           
                  rx_bytes_phy: 9,634,448,128 Bps    = 77,075.58 Mbps      
            tx_mac_control_phy: 7,536
             tx_pause_ctrl_phy: 7,536
               rx_discards_phy: 41,794,740
               rx_64_bytes_phy: 150,538,352 Bps      = 1,204.30 Mbps       
    rx_buffer_passed_thres_phy: 202
                rx_prio0_bytes: 9,634,520,256 Bps    = 77,076.16 Mbps      
              rx_prio0_packets: 108,744,322
             rx_prio0_discards: 41,795,050
               tx_global_pause: 7,537
      tx_global_pause_duration: 1,011,592

"rx_discards_phy" is described as follows [1]:

    The number of received packets dropped due to lack of buffers on a
    physical port. If this counter is increasing, it implies that the adapter
    is congested and cannot absorb the traffic coming from the network.

However, the adapter certainly *is* able to process 148 Mpps,
since it does so with SWS and it can deliver this much to SW (with MPRQ).
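
The mlnx_perf sample can be checked for consistency with a few lines of arithmetic. This is only a sketch using the counter values quoted above; the variable names are illustrative.

```python
# Counter values copied from the mlnx_perf sample (HWS run).
rx_packets_phy = 150_538_251
rx_discards_phy = 41_794_740
rx_vport_unicast_packets = 108_743_424

# Packets that were not discarded at the physical port.
delivered = rx_packets_phy - rx_discards_phy
discard_share = rx_discards_phy / rx_packets_phy

print(f"not discarded at the port: {delivered:,}")
print(f"rx_vport_unicast_packets:  {rx_vport_unicast_packets:,}")
print(f"share discarded at the port: {discard_share:.1%}")
```

The packets that survive the port buffer account for essentially all of rx_vport_unicast_packets, so the loss appears to happen entirely as rx_discards_phy, before steering.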

[1]: https://www.kernel.org/doc/Documentation/networking/device_drivers/ethernet/mellanox/mlx5/counters.rst

Neohost output, hardware steering (dv_flow_en=2):

==================================================================================
|| Counter Name                                              || Counter Value   ||
==================================================================================
|| Level 0 MTT Cache Hit                                     || 0               ||
|| Level 0 MTT Cache Miss                                    || 0               ||
|| Level 1 MTT Cache Hit                                     || 0               ||
|| Level 1 MTT Cache Miss                                    || 0               ||
|| Level 0 MPT Cache Hit                                     || 0               ||
|| Level 0 MPT Cache Miss                                    || 0               ||
|| Level 1 MPT Cache Hit                                     || 0               ||
|| Level 1 MPT Cache Miss                                    || 0               ||
|| Indirect Memory Key Access                                || 0               ||
|| ICM Cache Miss                                            || 38              ||
|| PCIe Internal Back Pressure                               || 0               ||
|| Outbound Stalled Reads                                    || 0               ||
|| Outbound Stalled Writes                                   || 0               ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||
|| PCIe Read Stalled due to Ordering                         || 0               ||
|| RX IPsec Packets                                          || 0               ||
|| Back Pressure from RXD to PSA                             || 0               ||
|| Chip Frequency                                            || 429.9925        ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||
|| Back Pressure from PSA switch to RXT                      || 0               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 107,498,115     ||
|| Receive WQE Cache Hit                                     || 0               ||
|| Receive WQE Cache Miss                                    || 0               ||
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||
|| RX Steering Packets                                       || 107,498,116     ||
|| RX Steering Packets Fast Path                             || 0               ||
|| EQ All State Machines Busy                                || 0               ||
|| CQ All State Machines Busy                                || 0               ||
|| MSI-X All State Machines Busy                             || 0               ||
|| CQE Compression Sessions                                  || 0               ||
|| Compressed CQEs                                           || 0               ||
|| Compression Session Closed due to EQE                     || 0               ||
|| Compression Session Closed due to Timeout                 || 0               ||
|| Compression Session Closed due to Mismatch                || 0               ||
|| Compression Session Closed due to PCIe Idle               || 0               ||
|| Compression Session Closed due to S2CQE                   || 0               ||
|| Compressed CQE Strides                                    || 0               ||
|| Compression Session Closed due to LRO                     || 0               ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 253,168,409     ||
|| RX_PSA0 Steering Pipe 1                                   || 235,977,321     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 224,400,319     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 208,687,547     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 224,400,319     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 208,687,547     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| RX_PSA1 Steering Pipe 0                                   || 253,168,409     ||
|| RX_PSA1 Steering Pipe 1                                   || 235,977,321     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 224,400,319     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 208,687,547     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 224,400,319     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 208,687,547     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

||===========================================================================
||| Performance Analysis                || Analysis Value [Units]           ||
||===========================================================================
|||                                Bandwidth                                ||
||---------------------------------------------------------------------------
||| RX BandWidth                        || 55.039        [Gb/s]             ||
||| TX BandWidth                        || 0             [Gb/s]             ||
||===========================================================================
|||                                 Memory                                  ||
||---------------------------------------------------------------------------
||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
||===========================================================================
|||                             PCIe Bandwidth                              ||
||---------------------------------------------------------------------------
||| PCIe Inbound Available BW           || 251.3851      [Gb/s]             ||
||| PCIe Inbound BW Utilization         || 0.0027        [%]                ||
||| PCIe Inbound Used BW                || 0.0069        [Gb/s]             ||
||| PCIe Outbound Available BW          || 251.3851      [Gb/s]             ||
||| PCIe Outbound BW Utilization        || 0.0025        [%]                ||
||| PCIe Outbound Used BW               || 0.0062        [Gb/s]             ||
||===========================================================================
|||                              PCIe Latency                               ||
||---------------------------------------------------------------------------
||| PCIe Avg Latency                    || 523           [NS]               ||
||| PCIe Max Latency                    || 548           [NS]               ||
||| PCIe Min Latency                    || 511           [NS]               ||
||===========================================================================
|||                       PCIe Unit Internal Latency                        ||
||---------------------------------------------------------------------------
||| PCIe Internal Avg Latency           || 4             [NS]               ||
||| PCIe Internal Max Latency           || 4             [NS]               ||
||| PCIe Internal Min Latency           || 4             [NS]               ||
||===========================================================================
|||                               Packet Rate                               ||
||---------------------------------------------------------------------------
||| RX Packet Rate                      || 107,498,115   [Packets/Seconds]  ||
||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
||===========================================================================
|||                                 eSwitch                                 ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                  || 4.5503        [Hops/Packet]      ||
||| RX Optimal Hops Per Packet Per Pipe || 2.2751        [Hops/Packet]      ||
||| RX Optimal Packet Rate Bottleneck   || 188.9994      [MPPS]             ||
||| RX Packet Rate Bottleneck           || 182.5796      [MPPS]             ||
||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
||===========================================================================

Neohost output, software steering (dv_flow_en=1):

==================================================================================
|| Counter Name                                              || Counter Value   ||
==================================================================================
|| Level 0 MTT Cache Hit                                     || 0               ||
|| Level 0 MTT Cache Miss                                    || 0               ||
|| Level 1 MTT Cache Hit                                     || 0               ||
|| Level 1 MTT Cache Miss                                    || 0               ||
|| Level 0 MPT Cache Hit                                     || 0               ||
|| Level 0 MPT Cache Miss                                    || 0               ||
|| Level 1 MPT Cache Hit                                     || 0               ||
|| Level 1 MPT Cache Miss                                    || 0               ||
|| Indirect Memory Key Access                                || 0               ||
|| ICM Cache Miss                                            || 38              ||
|| PCIe Internal Back Pressure                               || 0               ||
|| Outbound Stalled Reads                                    || 0               ||
|| Outbound Stalled Writes                                   || 0               ||
|| PCIe Read Stalled due to No Read Engines                  || 0               ||
|| PCIe Read Stalled due to No Completion Buffer             || 0               ||
|| PCIe Read Stalled due to Ordering                         || 0               ||
|| RX IPsec Packets                                          || 0               ||
|| Back Pressure from RXD to PSA                             || 0               ||
|| Chip Frequency                                            || 429.9919        ||
|| Back Pressure from RXB Buffer to RXB FIFO                 || 0               ||
|| Back Pressure from PSA switch to RXT                      || 0               ||
|| Back Pressure from PSA switch to RXB                      || 0               ||
|| Back Pressure from PSA switch to RXD                      || 0               ||
|| Back Pressure from Internal MMU to RX Descriptor Handling || 148,813,590     ||
|| Receive WQE Cache Hit                                     || 0               ||
|| Receive WQE Cache Miss                                    || 0               ||
|| Back Pressure from PCIe to Packet Scatter                 || 0               ||
|| RX Steering Packets                                       || 148,813,592     ||
|| RX Steering Packets Fast Path                             || 0               ||
|| EQ All State Machines Busy                                || 0               ||
|| CQ All State Machines Busy                                || 0               ||
|| MSI-X All State Machines Busy                             || 0               ||
|| CQE Compression Sessions                                  || 0               ||
|| Compressed CQEs                                           || 0               ||
|| Compression Session Closed due to EQE                     || 0               ||
|| Compression Session Closed due to Timeout                 || 0               ||
|| Compression Session Closed due to Mismatch                || 0               ||
|| Compression Session Closed due to PCIe Idle               || 0               ||
|| Compression Session Closed due to S2CQE                   || 0               ||
|| Compressed CQE Strides                                    || 0               ||
|| Compression Session Closed due to LRO                     || 0               ||
|| TX Descriptor Handling Stopped due to Limited State       || 0               ||
|| TX Descriptor Handling Stopped due to Limited VL          || 0               ||
|| TX Descriptor Handling Stopped due to De-schedule         || 0               ||
|| TX Descriptor Handling Stopped due to Work Done           || 0               ||
|| TX Descriptor Handling Stopped due to E2E Credits         || 0               ||
|| Line Transmitted Port 1                                   || 0               ||
|| Line Transmitted Port 2                                   || 0               ||
|| Line Transmitted Loop Back                                || 0               ||
|| RX_PSA0 Steering Pipe 0                                   || 243,977,877     ||
|| RX_PSA0 Steering Pipe 1                                   || 213,617,683     ||
|| RX_PSA0 Steering Cache Access Pipe 0                      || 203,526,803     ||
|| RX_PSA0 Steering Cache Access Pipe 1                      || 177,919,444     ||
|| RX_PSA0 Steering Cache Hit Pipe 0                         || 202,742,093     ||
|| RX_PSA0 Steering Cache Hit Pipe 1                         || 177,158,314     ||
|| RX_PSA0 Steering Cache Miss Pipe 0                        || 161,513         ||
|| RX_PSA0 Steering Cache Miss Pipe 1                        || 158,843         ||
|| RX_PSA1 Steering Pipe 0                                   || 243,977,877     ||
|| RX_PSA1 Steering Pipe 1                                   || 213,617,683     ||
|| RX_PSA1 Steering Cache Access Pipe 0                      || 203,526,803     ||
|| RX_PSA1 Steering Cache Access Pipe 1                      || 177,919,444     ||
|| RX_PSA1 Steering Cache Hit Pipe 0                         || 202,742,093     ||
|| RX_PSA1 Steering Cache Hit Pipe 1                         || 177,158,314     ||
|| RX_PSA1 Steering Cache Miss Pipe 0                        || 161,513         ||
|| RX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA0 Steering Pipe 0                                   || 0               ||
|| TX_PSA0 Steering Pipe 1                                   || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA0 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA0 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA0 Steering Cache Miss Pipe 1                        || 0               ||
|| TX_PSA1 Steering Pipe 0                                   || 0               ||
|| TX_PSA1 Steering Pipe 1                                   || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 0                      || 0               ||
|| TX_PSA1 Steering Cache Access Pipe 1                      || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 0                         || 0               ||
|| TX_PSA1 Steering Cache Hit Pipe 1                         || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 0                        || 0               ||
|| TX_PSA1 Steering Cache Miss Pipe 1                        || 0               ||
==================================================================================

||===========================================================================
||| Performance Analysis                || Analysis Value [Units]           ||
||===========================================================================
|||                                Bandwidth                                ||
||---------------------------------------------------------------------------
||| RX BandWidth                        || 76.1926       [Gb/s]             ||
||| TX BandWidth                        || 0             [Gb/s]             ||
||===========================================================================
|||                                 Memory                                  ||
||---------------------------------------------------------------------------
||| RX Indirect Memory Keys Rate        || 0             [Keys/Packet]      ||
||===========================================================================
|||                             PCIe Bandwidth                              ||
||---------------------------------------------------------------------------
||| PCIe Inbound Available BW           || 251.385       [Gb/s]             ||
||| PCIe Inbound BW Utilization         || 0.0027        [%]                ||
||| PCIe Inbound Used BW                || 0.0069        [Gb/s]             ||
||| PCIe Outbound Available BW          || 251.385       [Gb/s]             ||
||| PCIe Outbound BW Utilization        || 0.0025        [%]                ||
||| PCIe Outbound Used BW               || 0.0062        [Gb/s]             ||
||===========================================================================
|||                              PCIe Latency                               ||
||---------------------------------------------------------------------------
||| PCIe Avg Latency                    || 522           [NS]               ||
||| PCIe Max Latency                    || 541           [NS]               ||
||| PCIe Min Latency                    || 511           [NS]               ||
||===========================================================================
|||                       PCIe Unit Internal Latency                        ||
||---------------------------------------------------------------------------
||| PCIe Internal Avg Latency           || 4             [NS]               ||
||| PCIe Internal Max Latency           || 4             [NS]               ||
||| PCIe Internal Min Latency           || 4             [NS]               ||
||===========================================================================
|||                               Packet Rate                               ||
||---------------------------------------------------------------------------
||| RX Packet Rate                      || 148,813,590   [Packets/Seconds]  ||
||| TX Packet Rate                      || 0             [Packets/Seconds]  ||
||===========================================================================
|||                                 eSwitch                                 ||
||---------------------------------------------------------------------------
||| RX Hops Per Packet                  || 3.075         [Hops/Packet]      ||
||| RX Optimal Hops Per Packet Per Pipe || 1.5375        [Hops/Packet]      ||
||| RX Optimal Packet Rate Bottleneck   || 279.6695      [MPPS]             ||
||| RX Packet Rate Bottleneck           || 262.2723      [MPPS]             ||
||| TX Hops Per Packet                  || 0             [Hops/Packet]      ||
||| TX Optimal Hops Per Packet Per Pipe || 0             [Hops/Packet]      ||
||| TX Optimal Packet Rate Bottleneck   || 0             [MPPS]             ||
||| TX Packet Rate Bottleneck           || 0             [MPPS]             ||
||===========================================================================
