Hi Ivan,
Yes, only the user-space-created rings.
Could you elaborate on your thoughts?

Ed

-----Original Message-----
From: Ivan Malov <[email protected]> 
Sent: Tuesday, July 8, 2025 10:19 AM
To: Lombardo, Ed <[email protected]>
Cc: Stephen Hemminger <[email protected]>; users <[email protected]>
Subject: RE: dpdk Tx falling short

Hi Ed,

On Tue, 8 Jul 2025, Lombardo, Ed wrote:

> Hi Stephen,
> When I replace rte_eth_tx_burst() with a bulk mbuf free, I do not see the Tx 
> ring fill up.  I think this is valuable information.  Also, perf analysis of 
> the tx thread shows common_ring_mp_enqueue and rte_atomic32_cmpset, which I 
> did not expect to see since I created all the Tx rings as SP and SC (and the 
> worker and ack rings as well, essentially all 16 rings).
>
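> A minimal sketch of that experiment, with pkts/n as hypothetical names for 
> the dequeued burst (rte_pktmbuf_free_bulk() is available since DPDK 19.11):
>
>   /* Diagnostic: free the burst instead of transmitting it. */
>   rte_pktmbuf_free_bulk(pkts, n);   /* in place of rte_eth_tx_burst() */
>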
> Perf report snippet:
> +   57.25%  DPDK_TX_1  test  [.] common_ring_mp_enqueue
> +   25.51%  DPDK_TX_1  test  [.] rte_atomic32_cmpset
> +    9.13%  DPDK_TX_1  test  [.] i40e_xmit_pkts
> +    6.50%  DPDK_TX_1  test  [.] rte_pause
>      0.21%  DPDK_TX_1  test  [.] rte_mempool_ops_enqueue_bulk.isra.0
>      0.20%  DPDK_TX_1  test  [.] dpdk_tx_thread
>
> The traffic load is a constant 10 Gbps of 84-byte packets with no idle 
> periods.  The burst size of 512 is the desired number of mbufs per dequeue, 
> but the tx thread will transmit whatever it can get from the Tx ring.
>
> I think resolving why the perf analysis shows an MP ring, when every ring 
> was created as SP / SC, should resolve this issue.

'common_ring_mp_enqueue' is the enqueue method of the 'ring' mempool variant, 
that is, a mempool backed internally by an RTE ring. When you say the ring has 
been created as SP / SC, you seem to be referring to the regular RTE rings 
created by your application logic, not the internal ring of the mempool. Am I 
missing something?
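
For illustration, a minimal sketch of selecting the pool's internal ring 
behaviour at creation time (pool name and sizes are hypothetical):

  #include <rte_mbuf.h>
  #include <rte_lcore.h>
  #include <rte_debug.h>

  #define NB_MBUFS   8192   /* hypothetical sizing */
  #define CACHE_SIZE 256

  /* rte_pktmbuf_pool_create() uses the default mempool ops, typically
   * "ring_mp_mc", so any put that misses the per-lcore cache goes through
   * common_ring_mp_enqueue.  To back the pool with an SP/SC ring instead
   * (only safe when a single lcore allocates and a single lcore frees): */
  struct rte_mempool *mp = rte_pktmbuf_pool_create_by_ops(
          "mbuf_pool", NB_MBUFS, CACHE_SIZE,
          0, RTE_MBUF_DEFAULT_BUF_SIZE,
          (int)rte_socket_id(), "ring_sp_sc");
  if (mp == NULL)
          rte_panic("mbuf pool creation failed\n");

Alternatively, keeping the default MP/MC ops but enlarging the per-lcore cache 
reduces how often the Tx thread's frees fall through to the backing ring.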

Thank you.

>
> Thanks,
> ed
>
> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
> Sent: Tuesday, July 8, 2025 9:47 AM
> To: Lombardo, Ed <[email protected]>
> Cc: Ivan Malov <[email protected]>; users <[email protected]>
> Subject: Re: dpdk Tx falling short
>
> On Tue, 8 Jul 2025 04:10:05 +0000
> "Lombardo, Ed" <[email protected]> wrote:
>
>> Hi Stephen,
>> I ensured that every pipeline stage that enqueues or dequeues mbufs uses 
>> the burst version; perf showed the repercussions of doing single-mbuf 
>> dequeues and enqueues.
>> The receive stage uses rte_eth_rx_burst() and the Tx stage uses 
>> rte_eth_tx_burst().  The dequeue burst size used in the tx_thread is 512 
>> mbufs.
>
> You might try buffering like rte_eth_tx_buffer does.
> You would need to add an additional mechanism to ensure the buffer gets 
> flushed when you detect an idle period.
>
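
For reference, a minimal sketch of the rte_eth_tx_buffer() approach Stephen 
describes, assuming a hypothetical port_id, queue 0 and a 512-packet buffer:

  #include <rte_ethdev.h>
  #include <rte_malloc.h>

  #define TX_BUF_PKTS 512   /* hypothetical buffer size */

  /* One-time setup: allocate and initialise the Tx buffer. */
  struct rte_eth_dev_tx_buffer *txb = rte_zmalloc_socket("tx_buf",
          RTE_ETH_TX_BUFFER_SIZE(TX_BUF_PKTS), 0,
          rte_eth_dev_socket_id(port_id));
  rte_eth_tx_buffer_init(txb, TX_BUF_PKTS);

  /* Per packet: buffer it; once TX_BUF_PKTS packets have accumulated, the
   * whole buffer is handed to rte_eth_tx_burst() internally. */
  rte_eth_tx_buffer(port_id, 0, txb, pkt);

  /* The extra mechanism Stephen mentions: flush on a detected idle period
   * so buffered packets are not stranded. */
  rte_eth_tx_buffer_flush(port_id, 0, txb);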
