Is there anything in the usage I described in my previous email that might explain this problem? Is there anything else wrong with what I'm doing, driver-wise?
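
In case it helps, here is a minimal sketch of the refcount usage I described, boiled down from my real code (app_send_new/app_resend/app_on_ack and app_track_unacked are simplified stand-ins for my actual functions, not the real thing):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Placeholder for my unacked-message bookkeeping. */
static void app_track_unacked(struct rte_mbuf *m) { (void)m; }

/* First send of a new message (error handling trimmed). */
static int
app_send_new(struct rte_mempool *pool, uint16_t port, uint16_t queue)
{
        struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
        if (m == NULL)
                return -1;

        /* ... fill in headers and payload via rte_pktmbuf_mtod() ... */

        /* Hold an extra reference so the free after TX completion
         * cannot recycle the mbuf before the remote ACK arrives. */
        rte_mbuf_refcnt_update(m, 1);            /* refcnt: 1 -> 2 */

        if (rte_eth_tx_burst(port, queue, &m, 1) != 1) {
                rte_mbuf_refcnt_update(m, -1);   /* undo the hold */
                rte_pktmbuf_free(m);
                return -1;
        }
        app_track_unacked(m);
        return 0;
}

/* Retry of an unacked message. I do NOT bump the refcnt again here;
 * when I bumped on every resend the counts just kept climbing. */
static void
app_resend(uint16_t port, uint16_t queue, struct rte_mbuf *m)
{
        rte_eth_tx_burst(port, queue, &m, 1);
}

/* Remote ACK received and local readers done: drop my reference. */
static void
app_on_ack(struct rte_mbuf *m)
{
        rte_pktmbuf_free(m);    /* decrements refcnt; frees at zero */
}
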
On Sun, Nov 10, 2024 at 12:31 PM Alan Beadle <ab.bea...@gmail.com> wrote:
>
> I'm using the vfio-pci module with Intel X550-T2 NICs. I believe this
> means it will use the ixgbe driver? To be honest, I am a bit confused
> about the use of drivers in DPDK. I am using the first setup that I
> got working to send and receive packets. Additional tips would be
> greatly appreciated. After loading the vfio-pci module I run
> dpdk-devbind.py --bind vfio-pci 65:00.1 and then I just use the
> standard DPDK API calls in my app. I was meaning to revisit this once
> my app was more complete.
>
> On Sun, Nov 10, 2024 at 12:12 PM Stephen Hemminger
> <step...@networkplumber.org> wrote:
> >
> > On Sun, 10 Nov 2024 11:23:29 -0500
> > Alan Beadle <ab.bea...@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > I am using DPDK to send two-way traffic between a pair of machines.
> > > My application has local readers, remote acknowledgments, and
> > > automatic retries when a packet is lost. For these reasons I am
> > > using rte_mbuf_refcnt_update() to prevent the NIC from freeing the
> > > packet and recycling the mbuf before my local readers are done and
> > > the remote reader has acknowledged the message. I was advised to do
> > > this in an earlier thread on this mailing list.
> > >
> > > However, this does not seem to be working. After running my app for
> > > a while and exchanging about 1000 messages this way, my queue of
> > > unacknowledged mbufs gets corrupted. The mbufs attached to my queue
> > > seem to contain data for newer messages than what is supposed to be
> > > in them, and in some cases contain a totally different type of
> > > packet (an acknack, for instance). Obviously this results in retries
> > > of those messages failing to send the correct data, and my
> > > application gets stuck.
> > >
> > > I have ensured that the refcount is not reaching 0. Each new mbuf
> > > immediately has its refcnt incremented by 1. I was concerned that
> > > retries might need the refcnt bumped again, but if I bump the
> > > refcount every time I resend a specific mbuf to the NIC, the
> > > refcounts just keep getting higher. So it looks like re-bumping it
> > > on a resend is not necessary.
> > >
> > > I have ruled out other possible explanations. The mbufs are being
> > > reused by rte_pktmbuf_alloc(). I even tried playing with the EAL
> > > settings related to the number of mbuf descriptors and saw my
> > > changes directly correlate with how long it takes for this problem
> > > to occur. How do I really prevent the driver from reusing packets
> > > that I still might need to resend?
> > >
> > > Thanks in advance,
> > > -Alan
> >
> > Which driver? It could be a driver bug.
> >
> > Also, you should be able to trace the mbuf functions, either with
> > rte_trace or with another facility.
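
P.S. Regarding the rte_trace suggestion: if I am reading the trace documentation correctly, I should be able to enable trace points at startup with the EAL --trace option, along these lines (the lib.mempool.* pattern is my guess at what would capture mbuf alloc/free activity; I haven't verified the exact trace point names):

./my_app -l 0-1 -a 65:00.1 --trace=lib.mempool.* --trace-dir=/tmp/dpdk-trace

and then inspect the generated CTF output with babeltrace. I will try that next.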