> -----Original Message-----
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of
> vuon...@viettel.com.vn
> Sent: Monday, July 17, 2017 3:04 AM
> Cc: users@dpdk.org; d...@dpdk.org
> Subject: [dpdk-dev] Rx can't receive any more packets after receiving 1.5
> billion packets.
> 
> Hi DPDK team,
> Sorry when I send this email to both of group users and dev. But I have
> big problem: Rx core on my application can not receive anymore packet
> after I did the stress test to it (~1 day Rx core received ~ 1.5 billion
> packet). Rx core still alive but didn't receive any packet and didn't
> generate any log. Below is my system configuration:
> - OS: CentOS 7
> - Kernel: 3.10.0-514.16.1.el7.x86_64
> - Hugepages: 32 GB (16384 x 2 MB pages)
> - NIC: Intel 82599
> - DPDK version: 16.11
> - Architecture: Rx (lcore 1) receives packets and enqueues them to a ring;
> a worker (lcore 2) dequeues the packets from the ring and frees them
> (using rte_pktmbuf_free()).
> - Mempool creation:
>       rte_pktmbuf_pool_create(
>               "rx_pool",                 /* name */
>               8192,                      /* number of elements in the mbuf pool */
>               256,                       /* size of the per-core object cache */
>               0,                         /* size of the application private area
>                                             between the rte_mbuf struct and the
>                                             data buffer */
>               RTE_MBUF_DEFAULT_BUF_SIZE, /* size of the data buffer in each mbuf
>                                             (2048 + 128) */
>               0                          /* socket id */
>       );
> If I change the number of elements in the mbuf pool from 8192 to 512, Rx
> hits the same problem after a shorter time (~30 s).
> 
> Please tell me if you need more information. I am looking forward to
> hearing from you.
> 
> 
> Many thanks,
> Vuong Le

Hi Vuong,

This is likely a buffer leak. You probably have a path in your code where a 
buffer is not freed; that buffer is then "lost", as the application can no 
longer use it since it is never returned to the pool. The pool of free 
buffers therefore shrinks over time until it eventually becomes empty, at 
which point no more packets can be received.

You might want to periodically monitor the number of free buffers in your 
pool. If a leak is the root cause, you should see this number steadily 
decrease until it hits flat zero; otherwise, you should see it oscillate 
around an equilibrium point.
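
For example, here is a minimal monitoring sketch, assuming you keep a
pointer to the "rx_pool" mempool created above. In DPDK 16.11,
rte_mempool_avail_count() and rte_mempool_in_use_count() replace the older
rte_mempool_count()/rte_mempool_free_count():

	#include <rte_mempool.h>
	#include <rte_log.h>

	/* Call periodically, e.g. from the main lcore once per second. */
	static void
	log_pool_usage(const struct rte_mempool *rx_pool)
	{
		unsigned int avail  = rte_mempool_avail_count(rx_pool);
		unsigned int in_use = rte_mempool_in_use_count(rx_pool);

		/* With a leak, "avail" trends down to zero over time;
		 * otherwise it oscillates around an equilibrium. */
		RTE_LOG(INFO, USER1, "rx_pool: %u mbufs free, %u in use\n",
			avail, in_use);
	}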

Since it takes a relatively large number of packets to trigger this issue, 
the code path with the problem is probably not executed very frequently: it 
might be a control plane packet that is not freed, an ARP request/reply 
packet, etc.
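
As an illustration only (hypothetical code, not taken from your
application), the two classic leak patterns on the worker side look like
this; is_arp() and worker_handle() are local helpers here, not DPDK APIs:

	#include <errno.h>
	#include <rte_byteorder.h>
	#include <rte_ether.h>
	#include <rte_mbuf.h>
	#include <rte_ring.h>

	/* Identify ARP frames by their Ethernet type (DPDK 16.11 names). */
	static int
	is_arp(const struct rte_mbuf *m)
	{
		const struct ether_hdr *eth =
			rte_pktmbuf_mtod(m, const struct ether_hdr *);
		return eth->ether_type == rte_cpu_to_be_16(ETHER_TYPE_ARP);
	}

	static void
	worker_handle(struct rte_mbuf *m, struct rte_ring *tx_ring)
	{
		if (is_arp(m)) {
			/* ... application-specific ARP handling ... */
			rte_pktmbuf_free(m); /* omit this and every ARP frame leaks */
			return;
		}
		if (rte_ring_enqueue(tx_ring, m) == -ENOBUFS)
			rte_pktmbuf_free(m); /* ring full: free it or it leaks too */
	}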

Regards,
Cristian
