> Note that I am also seeing another error. Sometimes, rather than tx
> failing, my app detects incorrect/corrupted mbuf contents and exits
> immediately. It appears that mbufs are being re-allocated when they
> should not be. I thought I had finally solved this (see my earlier
> threads) but with multi-core concurrency this problem has returned. It
> is very possible that this error is somewhere in my own library code,
> as it looks like the accompanying non-DPDK structures are also being
> corrupted (probably first).
>
> For background, I maintain a hash table of header structs to track
> individual mbufs. The sequence numbers in the headers should match
> those contained in the mbuf's payload. This check is failing after a
> few hundred successful data messages have been exchanged between the
> hosts. The sequence number in the mbuf shows that it is in the wrong
> hash bucket, and the sequence number in the header is a large
> corrupted value which is out of range for my sequence numbers (and
> also not matching the bucket).
>

There is definitely something going wrong with the mbuf allocator.
Each run produces such different errors that it is difficult to add
instrumentation for any specific one, but one frequent error is that a
newly allocated mbuf already has a refcnt of 2 and contains data that
I am still using elsewhere. After each call to rte_pktmbuf_alloc()
(with locks around it) I immediately call rte_mbuf_refcnt_read() and
check that it returns 1. Sometimes it returns 2. This should never
occur, and I believe it proves that DPDK is not working as expected
here for some reason.
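
For illustration, the post-allocation check described above can be
modeled without DPDK. The sketch below is a toy model, not DPDK code:
toy_mbuf, toy_pool, toy_alloc(), toy_free() and checked_alloc() are
hypothetical stand-ins (toy_alloc() plays the role of
rte_pktmbuf_alloc(), and reading m->refcnt plays the role of
rte_mbuf_refcnt_read()). It assumes a LIFO free list whose free
buffers rest at refcnt 1. It only shows how the check fires when a
stale pointer to a freed buffer bumps the refcnt; it makes no claim
about what is actually happening in the application or inside DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an mbuf: just a refcnt and a sequence number. */
struct toy_mbuf {
    uint16_t refcnt;
    uint32_t seq;
};

#define POOL_SIZE 4

/* Hypothetical stand-in for a mempool: a fixed array plus a LIFO free list. */
struct toy_pool {
    struct toy_mbuf bufs[POOL_SIZE];
    struct toy_mbuf *free_list[POOL_SIZE];
    size_t n_free;
};

static void toy_pool_init(struct toy_pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->bufs[i].refcnt = 1;  /* assumption: free buffers rest at refcnt 1 */
        p->bufs[i].seq = 0;
        p->free_list[i] = &p->bufs[i];
    }
    p->n_free = POOL_SIZE;
}

/* Pop the most recently freed buffer; the refcnt is not touched here. */
static struct toy_mbuf *toy_alloc(struct toy_pool *p)
{
    if (p->n_free == 0)
        return NULL;
    return p->free_list[--p->n_free];
}

/* Drop one reference; on the last one, reset refcnt to 1 and return to pool. */
static void toy_free(struct toy_pool *p, struct toy_mbuf *m)
{
    if (--m->refcnt == 0) {
        m->refcnt = 1;
        p->free_list[p->n_free++] = m;
    }
}

/* The check from the text: a freshly allocated buffer must have refcnt 1. */
static struct toy_mbuf *checked_alloc(struct toy_pool *p, unsigned *violations)
{
    struct toy_mbuf *m = toy_alloc(p);
    if (m != NULL && m->refcnt != 1) {
        (*violations)++;
        fprintf(stderr, "alloc returned %p with refcnt %u\n",
                (void *)m, (unsigned)m->refcnt);
    }
    return m;
}

/* Free a buffer, bump its refcnt through a stale pointer, then reallocate.
 * Returns the number of violations seen by checked_alloc(). */
static unsigned demo_stale_reference(void)
{
    struct toy_pool pool;
    unsigned violations = 0;

    toy_pool_init(&pool);
    struct toy_mbuf *first = checked_alloc(&pool, &violations);
    assert(first != NULL);

    struct toy_mbuf *stale = first; /* some other code keeps this pointer */
    toy_free(&pool, first);         /* buffer goes back to the pool */
    stale->refcnt++;                /* use after free: refcnt becomes 2 */

    struct toy_mbuf *second = checked_alloc(&pool, &violations);
    assert(second == stale);        /* LIFO pool hands the same buffer back */
    return violations;
}
```

Calling demo_stale_reference() from a main() reports one violation:
the second allocation returns the buffer that the stale pointer still
refers to, with a refcnt of 2.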

-Alan
