> Note that I am also seeing another error. Sometimes, rather than tx
> failing, my app detects incorrect/corrupted mbuf contents and exits
> immediately. It appears that mbufs are being re-allocated when they
> should not be. I thought I had finally solved this (see my earlier
> threads) but with multi-core concurrency this problem has returned. It
> is very possible that this error is somewhere in my own library code,
> as it looks like the accompanying non-DPDK structures are also being
> corrupted (probably first).
>
> For background, I maintain a hash table of header structs to track
> individual mbufs. The sequence numbers in the headers should match
> those contained in the mbuf's payload. This check is failing after a
> few hundred successful data messages have been exchanged between the
> hosts. The sequence number in the mbuf shows that it is in the wrong
> hash bucket, and the sequence number in the header is a large
> corrupted value which is out of range for my sequence numbers (and
> also not matching the bucket).
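For readers following along, the consistency check described in the quoted message might look roughly like the sketch below. It is not the actual library code: `struct msg_hdr`, `check_header()`, `NUM_BUCKETS`, `MAX_SEQ`, the `seq % NUM_BUCKETS` bucket function, and the assumption that the payload begins with a 32-bit host-order sequence number are all hypothetical. In real DPDK code the payload pointer would typically come from `rte_pktmbuf_mtod()`; a plain byte buffer stands in for it here so the sketch is self-contained. Returning a distinct code per failed invariant makes it easier to tell a corrupted header (out-of-range value) apart from a reused mbuf (payload no longer matching the header).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_BUCKETS 1024u   /* hypothetical hash table size */
#define MAX_SEQ     65536u  /* hypothetical upper bound on valid sequence numbers */

/* Hypothetical per-mbuf tracking header kept in the hash table. */
struct msg_hdr {
    uint32_t seq;            /* sequence number recorded when the mbuf was tracked */
    const uint8_t *payload;  /* in DPDK code: rte_pktmbuf_mtod(m, const uint8_t *) */
};

/* Which invariant failed, so corruption and reuse can be told apart. */
enum hdr_check {
    HDR_OK = 0,
    HDR_SEQ_OUT_OF_RANGE,   /* header value itself is garbage */
    HDR_WRONG_BUCKET,       /* header is filed under the wrong bucket */
    HDR_PAYLOAD_MISMATCH,   /* mbuf contents no longer match the header */
};

/* Assumption: payload starts with a 32-bit sequence number in host byte order. */
static uint32_t payload_seq(const uint8_t *payload)
{
    uint32_t seq;
    memcpy(&seq, payload, sizeof(seq)); /* memcpy avoids unaligned loads */
    return seq;
}

/* Validate one header found in hash bucket `bucket`. */
static enum hdr_check check_header(const struct msg_hdr *h, uint32_t bucket)
{
    if (h->seq >= MAX_SEQ)
        return HDR_SEQ_OUT_OF_RANGE;
    if (h->seq % NUM_BUCKETS != bucket)
        return HDR_WRONG_BUCKET;
    if (payload_seq(h->payload) != h->seq)
        return HDR_PAYLOAD_MISMATCH;
    return HDR_OK;
}
```

Running a check like this on every lookup, and logging which code fired first, could help narrow down whether the header or the mbuf is corrupted first.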
There is definitely something going wrong with the mbuf allocator. Each run produces such different errors that it is difficult to add instrumentation for any specific one, but one frequent error is that a newly allocated mbuf already has a refcnt of 2 and contains data that I am still using elsewhere. Immediately after each call to rte_pktmbuf_alloc() (which is protected by a lock), I call rte_mbuf_refcnt_read() and check that the result is 1. Sometimes it is 2. This should never happen, and I believe it proves that DPDK is not working as expected here for some reason.

-Alan