> So, "daemon" and "server" may try using the same queue sometimes, correct?
> Synchronizing all access to the single queue should work in this case.

That is correct.

> BTW, rte_eth_tx_burst() returning >0 does not mean the packets have been sent.
> It only means they have been enqueued for sending.
> At some point the NIC will complete sending,
> only then the PMD can free the mbuf (or decrement its reference count).
> For most PMDs, this happens on a subsequent call to rte_eth_tx_burst().
> Which PMD and HW is it?

Here is the output of 'dpdk-devbind.py --status':

Network devices using DPDK-compatible driver
============================================
0000:65:00.1 'Ethernet Controller 10G X550T 1563' drv=vfio-pci unused=uio_pci_generic

> Have you tried to print as many stats as possible when rte_eth_tx_burst()
> can't consume all packets (rte_eth_stats_get(), rte_eth_xstats_get())?

In setting this up, I discovered that this error only occurs when the primary process on the other host exits (due to an error) or is not initially running (the NIC is "down" in this case?). It happens consistently when I only launch the processes on one of the two machines. ***But*** counterintuitively, it looks like packets are successfully "sent" by the daemon until the other process begins to run. In case it is useful, I summarize the stats for this case below.

Note that I am also seeing another error. Sometimes, rather than tx failing, my app detects incorrect/corrupted mbuf contents and exits immediately. It appears that mbufs are being re-allocated when they should not be. I thought I had finally solved this (see my earlier threads) but with multi-core concurrency this problem has returned. It is very possible that this error is somewhere in my own library code, as it looks like the accompanying non-DPDK structures are also being corrupted (probably first).

For background, I maintain a hash table of header structs to track individual mbufs. The sequence numbers in the headers should match those contained in the mbuf's payload.
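For illustration, the check is roughly of the following shape. This is a simplified sketch and not the actual library code: the names (hdr_entry, bucket_of, payload_seq, check_entry), the table size, the sequence-number bound, and the assumption that the sequence number sits in the first 4 bytes of the payload are all hypothetical, and a plain byte buffer stands in for the mbuf data.

```c
#include <stdint.h>
#include <string.h>

#define NUM_BUCKETS 64u      /* hypothetical hash table size */
#define SEQ_MAX     100000u  /* hypothetical upper bound on valid sequence numbers */

/* Hypothetical tracking header, one per in-flight mbuf. */
struct hdr_entry {
    uint32_t seq;            /* sequence number recorded when the mbuf was queued */
    const uint8_t *payload;  /* stand-in for the mbuf's payload pointer */
};

/* Hypothetical hash: the bucket is derived from the sequence number alone. */
static uint32_t bucket_of(uint32_t seq)
{
    return seq % NUM_BUCKETS;
}

/* Read the sequence number assumed to sit in the first 4 payload bytes. */
static uint32_t payload_seq(const uint8_t *payload)
{
    uint32_t seq;
    memcpy(&seq, payload, sizeof(seq)); /* memcpy avoids unaligned reads */
    return seq;
}

/*
 * Check one entry found in the given bucket.
 * Returns 0 if header and payload agree, otherwise a negative code:
 *   -1  header sequence number is out of range (header corrupted)
 *   -2  header sequence number does not hash to this bucket
 *   -3  payload sequence number does not hash to this bucket
 *   -4  payload and header hash to this bucket but disagree
 */
static int check_entry(const struct hdr_entry *e, uint32_t bucket)
{
    if (e->seq > SEQ_MAX)
        return -1;
    if (bucket_of(e->seq) != bucket)
        return -2;

    uint32_t pseq = payload_seq(e->payload);
    if (bucket_of(pseq) != bucket)
        return -3;
    if (pseq != e->seq)
        return -4;
    return 0;
}
```

Separate return codes make it possible to tell whether the header or the payload went bad first. With several cores touching the table, a check of this kind would presumably need to run under the same synchronization as the table updates, otherwise it can itself observe a half-updated entry.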
This check is failing after a few hundred successful data messages have been exchanged between the hosts. The sequence number in the mbuf shows that it is in the wrong hash bucket, and the sequence number in the header is a large corrupted value which is out of range for my sequence numbers (and also not matching the bucket).

Back to the issue of failed tx bursts: Here are the stats I observed after a packet failed to send from the daemon (after only launching the primary+secondary processes on one of the machines). This failure occurred after the daemon had successfully "sent" hundreds of handshake packets (to nowhere, presumably?), and the failure occurred as soon as the second process had finished initialization:

ipackets:0, opackets:0, ibytes:0, obytes:0, ierrors:0, oerrors:0
Got 146 xstats
Port:0, tx_q0_packets:1138
Port:0, tx_q0_bytes:125180
Port:0, mac_local_errors:2
Port:0, out_pkts_untagged:5

(All other stats had a value of 0 and are omitted).

I will continue investigating the corruption bug in the (likely) case that it is in my library code. In the meantime please let me know if I am using DPDK incorrectly.

Thank you again!
-Alan