Hi everyone,

I am using DPDK to send two-way traffic between a pair of machines. My application has local readers, remote acknowledgments, and automatic retries when a packet is lost. For these reasons I am using rte_mbuf_refcnt_update() to prevent the NIC from freeing the packet and recycling the mbuf before my local readers are done and the remote reader has acknowledged the message. I was advised to do this in an earlier thread on this mailing list.
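To make the intended lifecycle concrete, here is a minimal sketch of the hold pattern described above. It is a toy model, not real DPDK code: `struct toy_mbuf` and the `toy_*` helpers are hypothetical stand-ins for `struct rte_mbuf`, rte_pktmbuf_alloc(), rte_mbuf_refcnt_update(), and rte_pktmbuf_free(). It assumes the driver releases exactly one reference per completed transmit, which may not hold for every PMD.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for struct rte_mbuf; only the fields needed here are modeled. */
struct toy_mbuf {
    uint16_t refcnt;  /* outstanding references to the buffer */
    int in_pool;      /* 1 once the buffer is back in the pool and reusable */
};

/* Models rte_pktmbuf_alloc(): a fresh mbuf starts with one reference. */
static void toy_alloc(struct toy_mbuf *m)
{
    m->refcnt = 1;
    m->in_pool = 0;
}

/* Models rte_mbuf_refcnt_update(m, 1): the application takes its own reference. */
static void toy_hold(struct toy_mbuf *m)
{
    assert(!m->in_pool);  /* holding an already recycled buffer is a bug */
    m->refcnt++;
}

/* Models rte_pktmbuf_free() as called by the driver after a transmit
 * (assumption: one call per completed transmit). Drops one reference
 * and recycles the buffer only when no references remain. */
static void toy_free(struct toy_mbuf *m)
{
    assert(m->refcnt > 0);  /* freeing with no references left is a bug */
    if (--m->refcnt == 0)
        m->in_pool = 1;
}

/* Allocate, take one application hold, then let the modeled driver
 * complete `tx_done` transmits. Returns the resulting mbuf state. */
static struct toy_mbuf toy_lifecycle(unsigned tx_done)
{
    struct toy_mbuf m;

    toy_alloc(&m);
    toy_hold(&m);
    for (unsigned i = 0; i < tx_done; i++)
        toy_free(&m);
    return m;
}
```

In this model, `toy_lifecycle(1)` leaves the count at 1 with the buffer still out of the pool, while `toy_lifecycle(2)` brings it to 0 and marks the buffer reusable. Whether a real driver follows the same accounting on a resend is part of what is being asked here.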
However, this does not seem to be working. After running my app for a while and exchanging about 1000 messages in this way, my queue of unacknowledged mbufs becomes corrupted. The mbufs attached to my queue seem to contain data for newer messages than what is supposed to be in them, and in some cases contain a totally different type of packet (an acknack, for instance). As a result, retries of those messages fail to send the correct data and my application gets stuck.

I have ensured that the refcount is not reaching 0. Each new mbuf immediately has its refcnt incremented by 1. I was concerned that retries might need the refcnt bumped again, but if I bump the refcount every time I resend a specific mbuf to the NIC, the refcounts just keep getting higher. So it looks like re-bumping it on a resend is not necessary.

I have ruled out other possible explanations. The mbufs are being reused by rte_pktmbuf_alloc(). I even tried playing with the EAL settings related to the number of mbuf descriptors and saw my changes directly correlate with how long it takes for this problem to occur.

How do I really prevent the driver from reusing packets that I still might need to resend?

Thanks in advance,
-Alan