On Sun, 10 Nov 2024 11:23:29 -0500 Alan Beadle <ab.bea...@gmail.com> wrote:
> Hi everyone,
>
> I am using DPDK to send two-way traffic between a pair of machines. My
> application has local readers, remote acknowledgments, as well as
> automatic retries when a packet is lost. For these reasons I am using
> rte_mbuf_refcnt_update() to prevent the NIC from freeing the packet
> and recycling the mbuf before my local readers are done and the remote
> reader has acknowledged the message. I was advised to do this in an
> earlier thread on this mailing list.
>
> However, this does not seem to be working. After running my app for
> a while and exchanging about 1000 messages in this way, my queue of
> unacknowledged mbufs is getting corrupted. The mbufs attached to my
> queue seem to contain data for newer messages than what is supposed to
> be in them, and in some cases contain a totally different type of
> packet (an acknack for instance). Obviously this results in retries of
> those messages failing to send the correct data and my application
> gets stuck.
>
> I have ensured that the refcount is not reaching 0. Each new mbuf
> immediately has the refcnt incremented by 1. I was concerned that
> retries might need the refcnt bumped again, but if I bump the refcount
> every time I resend a specific mbuf to the NIC, the refcounts just
> keep getting higher. So it looks like re-bumping it on a resend is not
> necessary.
>
> I have ruled out other possible explanations. The mbufs are being
> reused by rte_pktmbuf_alloc(). I even tried playing with the EAL
> settings related to the number of mbuf descriptors and saw my changes
> directly correlate with how long it takes this problem to occur. How
> do I really prevent the driver from reusing packets that I still might
> need to resend?
>
> Thanks in advance,
> -Alan

Which driver are you using? This could be a driver bug.

Also, you should be able to trace the mbuf functions, either with
rte_trace or with some other facility.
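
For reference while tracing, here is a toy model of the reference
accounting described in the quoted message. It is only a sketch and not
DPDK code: struct toy_mbuf and every toy_* function are hypothetical
stand-ins, and it assumes the driver drops exactly one reference per
completed transmission, which is what a free through rte_pktmbuf_free()
does. Under that assumption, a buffer that holds one extra reference
survives one transmission but not a second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for an mbuf. This is NOT DPDK's struct rte_mbuf. */
struct toy_mbuf {
    uint16_t refcnt;  /* outstanding references */
    bool in_pool;     /* true once recycled; contents may be overwritten */
};

/* Models allocation: a fresh buffer starts with a single reference. */
static void toy_alloc(struct toy_mbuf *m)
{
    m->refcnt = 1;
    m->in_pool = false;
}

/* Models taking one extra reference, like rte_mbuf_refcnt_update(m, 1). */
static void toy_hold(struct toy_mbuf *m)
{
    assert(!m->in_pool);
    m->refcnt++;
}

/* Models a free: drop one reference, recycle the buffer at zero. */
static void toy_free(struct toy_mbuf *m)
{
    assert(!m->in_pool && m->refcnt > 0);
    if (--m->refcnt == 0)
        m->in_pool = true;
}

/* Assumption: each completed transmission ends with the driver
 * freeing the buffer once. */
static void toy_transmit(struct toy_mbuf *m)
{
    toy_free(m);
}

/* Returns true if a buffer with `holds` extra references is still
 * owned by the application after `sends` transmissions. */
static bool toy_survives(unsigned holds, unsigned sends)
{
    struct toy_mbuf m;
    unsigned i;

    toy_alloc(&m);
    for (i = 0; i < holds; i++)
        toy_hold(&m);
    for (i = 0; i < sends; i++) {
        if (m.in_pool)
            return false;  /* already recycled before this resend */
        toy_transmit(&m);
    }
    return !m.in_pool;
}
```

In this model toy_survives(1, 1) is true and toy_survives(1, 2) is
false, while toy_survives(2, 2) is true. Whether a real driver matches
the model, and when it actually performs the free, is the kind of thing
the mbuf trace should show.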