I am writing a bridge program. Originally, I based my implementation on skeleton/basicfwd.c. I next wanted to support multiple cores, so I used l2fwd.c as a simple model for tying queues to cores. However, l2fwd.c uses rte_eth_tx_buffer. Not understanding enough about DPDK, I switched over to rte_eth_tx_buffer because I wrongly thought that it had to be used with multiple cores.
I have changed my code back to using rte_eth_tx_burst, and that has solved my problem. However, on very unbalanced traffic, using rte_eth_tx_buffer caused an 80% performance degradation. That seems rather extreme for such a small change, so I am asking whether anyone understands why. And given this degradation, I'm surprised that l2fwd uses rte_eth_tx_buffer instead of rte_eth_tx_burst.

-Bev

________________________________________
From: Manish Kumar <[email protected]>
Sent: Monday, July 13, 2020 2:32 AM
To: Suraj R Gupta
Cc: Bev SCHWARTZ; [email protected]
Subject: [External] Re: [dpdk-users] Significant performance degradation when using tx buffers rather than rte_eth_tx_burst

I agree with Suraj on the same.
@Bev: Were you trying to use the rte_eth_tx_buffer function just as an experiment? As per your email, you already got good performance with the rte_eth_tx_burst function.

Regards
Manish

On Wed, Jul 8, 2020 at 1:42 PM Suraj R Gupta <[email protected]> wrote:

Hi Bev,
If my understanding is right, rte_eth_tx_burst transmits the specified number of output packets immediately, while rte_eth_tx_buffer buffers each packet for the port's queue; the packets are transmitted only when the buffer is full or rte_eth_tx_buffer_flush is called.
Since you are buffering packets one by one and then calling flush, this may have contributed to the delay.

Thanks and Regards
Suraj R Gupta

On Wed, Jul 8, 2020 at 10:53 PM Bev SCHWARTZ <[email protected]> wrote:

> I am writing a bridge using DPDK, where I have traffic read from one port
> transmitted to the other. Here is the core of the program, based on
> basicfwd.c.
>
> while (!force_quit) {
>   nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
>   for (i = 0; i < nb_rx; i++) {
>     /* inspect packet */
>   }
>   nb_tx = rte_eth_tx_burst(tx_port, tx_queue, bufs, nb_rx);
>   for (i = nb_tx; i < nb_rx; i++) {
>     rte_pktmbuf_free(bufs[i]);
>   }
> }
>
> (A bunch of error checking and such left out for brevity.)
>
> This worked great, I got bandwidth equivalent to using a Linux Bridge.
>
> I then tried using tx buffers instead. (Initialization code left out for
> brevity.) Here is the new loop.
>
> while (!force_quit) {
>   nb_rx = rte_eth_rx_burst(rx_port, rx_queue, bufs, BURST_SIZE);
>   for (i = 0; i < nb_rx; i++) {
>     /* inspect packet */
>     rte_eth_tx_buffer(tx_port, tx_queue, tx_buffer, bufs[i]);
>   }
>   rte_eth_tx_buffer_flush(tx_port, tx_queue, tx_buffer);
> }
>
> (Once again, error checking left out for brevity.)
>
> I am running this on 8 cores, each core has its own loop. (tx_buffer is
> created for each core.)
>
> If I have well balanced traffic across the cores, then my performance goes
> down, about 5% or so. If I have unbalanced traffic such as all traffic
> coming from a single flow, my performance goes down 80% from about 10 Gbps
> to 2 Gbps.
>
> I want to stress that the ONLY thing that changed in this code is changing
> how I transmit packets. Everything else is the same.
>
> Any idea why this would cause such a degradation in bit rate?
>
> -Bev

--
Thanks and Regards
Suraj R Gupta

--
Thanks
Manish Kumar
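
For anyone reading this thread later, the buffering rule Suraj describes (a buffered packet goes out only when the buffer reaches its configured size or when flush is called) can be sketched as a toy model in plain C. This is not DPDK code and does not link against it: toy_tx_buffer, toy_tx_buffer_init, toy_tx_buffer_add and toy_tx_buffer_flush are hypothetical names made up for illustration, and a "burst" here only increments counters. It shows when a transmission would be triggered; it does not by itself explain the 80% drop reported above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a TX buffer. All names are hypothetical, not DPDK API. */
struct toy_tx_buffer {
    size_t size;    /* flush threshold (packets) */
    size_t length;  /* packets currently waiting in the buffer */
    size_t bursts;  /* number of simulated burst transmissions so far */
    size_t sent;    /* total packets handed to the simulated NIC */
};

void toy_tx_buffer_init(struct toy_tx_buffer *b, size_t size)
{
    assert(size > 0);
    b->size = size;
    b->length = 0;
    b->bursts = 0;
    b->sent = 0;
}

/* Send everything that is waiting as one simulated burst.
 * Returns the number of packets sent (0 if the buffer was empty). */
size_t toy_tx_buffer_flush(struct toy_tx_buffer *b)
{
    size_t n = b->length;

    if (n == 0)
        return 0;
    b->bursts++;
    b->sent += n;
    b->length = 0;
    return n;
}

/* Queue one packet. Nothing is sent until the buffer reaches its size,
 * at which point the whole buffer is flushed as one burst. */
size_t toy_tx_buffer_add(struct toy_tx_buffer *b)
{
    b->length++;
    if (b->length < b->size)
        return 0;
    return toy_tx_buffer_flush(b);
}
```

In this model, with a size of 32, adding 5 packets sends nothing until toy_tx_buffer_flush is called, which then sends all 5 as a single burst. That matches the loop in the original post, where each rte_eth_rx_burst batch is buffered packet by packet and then flushed once.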
