On 07/17/2017 05:31 PM, cristian.dumitre...@intel.com wrote:

-----Original Message-----
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of
vuon...@viettel.com.vn
Sent: Monday, July 17, 2017 3:04 AM
Cc: users@dpdk.org; d...@dpdk.org
Subject: [dpdk-dev] Rx Can't receive anymore packet after received 1.5
billion packet.

Hi DPDK team,
Sorry for sending this email to both the users and dev lists, but I have a
big problem: the Rx core in my application can no longer receive packets
after a stress test (in about one day the Rx core received ~1.5 billion
packets). The Rx core is still alive, but it does not receive any packets and
does not generate any log. Below is my system configuration:
- OS: CentOS 7
- Kernel: 3.10.0-514.16.1.el7.x86_64
- Hugepages: 32 GB (16384 pages of 2 MB)
- NIC card: Intel 82599
- DPDK version: 16.11
- Architecture: Rx (lcore 1) receives packets and enqueues them to a ring;
  Worker (lcore 2) dequeues packets from the ring and frees them with
  rte_pktmbuf_free().
- Mempool create:
      rte_pktmbuf_pool_create(
          "rx_pool",                 /* name */
          8192,                      /* number of elements in the mbuf pool */
          256,                       /* size of per-core object cache */
          0,                         /* size of application private area between rte_mbuf struct and data buffer */
          RTE_MBUF_DEFAULT_BUF_SIZE, /* size of data buffer in each mbuf (2048 + 128) */
          0                          /* socket id */
      );
If I change the number of elements in the mbuf pool from 8192 to 512, Rx
has the same problem after a much shorter time (~30 s).

Please tell me if you need more information. I am looking forward to
hearing from you.


Many thanks,
Vuong Le
Hi Vuong,

This is likely to be a buffer leak. Your code may have a path where a buffer
is not freed, so that buffer gets "lost": it is never returned to the pool,
and the application cannot use it any more. The pool of free buffers therefore
shrinks over time until it eventually becomes empty, at which point no more
packets can be received.
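A minimal sketch of this failure mode, using a toy counter-based pool rather than the DPDK mempool API (toy_pool, toy_alloc, toy_free and simulate_rx are hypothetical names for illustration only): one rarely taken path forgets to free its buffer, and reception stops once the pool is drained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a fixed-size packet buffer pool (not the DPDK mempool
 * API): only the count of free buffers is modelled. */
struct toy_pool {
    size_t total;   /* buffers created at startup */
    size_t free;    /* buffers currently available for receive */
};

/* Take one buffer; returns false when the pool is empty. */
static bool toy_alloc(struct toy_pool *p)
{
    if (p->free == 0)
        return false;
    p->free--;
    return true;
}

/* Return one buffer to the pool. */
static void toy_free(struct toy_pool *p)
{
    assert(p->free < p->total);   /* catches double frees in the model */
    p->free++;
}

/* Receive up to n_pkts packets. Every rare_every-th packet takes a
 * hypothetical rarely executed path (think ARP or control traffic); when
 * leaky is true, that path forgets to free its buffer. Returns the number
 * of packets received before the pool ran dry. */
static size_t simulate_rx(struct toy_pool *p, size_t n_pkts,
                          size_t rare_every, bool leaky)
{
    size_t received = 0;
    for (size_t i = 1; i <= n_pkts; i++) {
        if (!toy_alloc(p))
            break;                      /* no buffer left: Rx goes silent */
        received++;
        bool rare_path = (i % rare_every) == 0;
        if (rare_path && leaky)
            continue;                   /* BUG: buffer never returned */
        toy_free(p);                    /* normal path: buffer recycled */
    }
    return received;
}
```

With 8 buffers and a leak on every 10th packet, reception stops after 80 packets; with the leak removed, all 1000 packets are received and the pool ends full. A bigger pool only delays the same outcome.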

You might want to periodically monitor the number of free buffers in your
pool. If this is the root cause, you should see this number decrease steadily
until it stays flat at zero; otherwise, you should see the number of free
buffers oscillating around an equilibrium point.
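The monitoring idea can be sketched as a small classifier over periodic samples. In a DPDK 16.11 application the samples would plausibly come from rte_mempool_avail_count() on the Rx pool; the function below, its name, and its heuristic (the later half of the window never climbs back to the earlier half's minimum) are assumptions for illustration, not part of DPDK.

```c
#include <stddef.h>

enum pool_trend { TREND_UNKNOWN, TREND_LEAKING, TREND_EXHAUSTED, TREND_STABLE };

/* Classify free-buffer samples taken at a fixed interval, oldest first.
 * Heuristic (an assumption, tune the window for your traffic): a leak never
 * gives buffers back, so every sample in the second half of the window
 * stays below the lowest sample of the first half. An equilibrium
 * oscillates, so the two halves overlap. */
static enum pool_trend classify_pool_trend(const unsigned int *samples,
                                           size_t n)
{
    if (n < 4)
        return TREND_UNKNOWN;           /* too few samples to judge */

    size_t half = n / 2;
    unsigned int first_min = samples[0];
    for (size_t i = 1; i < half; i++)
        if (samples[i] < first_min)
            first_min = samples[i];

    unsigned int second_max = samples[half];
    for (size_t i = half + 1; i < n; i++)
        if (samples[i] > second_max)
            second_max = samples[i];

    if (second_max == 0)
        return TREND_EXHAUSTED;         /* flat at zero: Rx is starved */
    return second_max < first_min ? TREND_LEAKING : TREND_STABLE;
}
```

The window should span several oscillation periods of normal traffic; otherwise a temporary dip can look like a leak.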

Since it takes a relatively large number of packets to hit this issue, the
code path with the problem is likely not executed very frequently: it might
be a control-plane packet that is not freed, an ARP request/reply packet,
etc.
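To narrow down which rarely executed path leaks, one option is to count buffers entering and leaving each packet class. The sketch below assumes hypothetical classes (data, ARP, control) and hypothetical hook names, and it is single-threaded; in the two-lcore pipeline you would keep per-lcore counters and sum them when reading.

```c
/* Hypothetical packet classes; a real application would derive them from
 * its own parsing (for example EtherType 0x0806 for ARP). */
enum pkt_class { PKT_DATA, PKT_ARP, PKT_CONTROL, PKT_NCLASSES };

struct path_stats {
    unsigned long received[PKT_NCLASSES]; /* buffers that entered each path */
    unsigned long released[PKT_NCLASSES]; /* buffers freed by each path */
};

/* Call when a received buffer is classified into path c. */
static void on_receive(struct path_stats *s, enum pkt_class c)
{
    s->received[c]++;
}

/* Call next to every free of a buffer that belongs to path c. */
static void on_release(struct path_stats *s, enum pkt_class c)
{
    s->released[c]++;
}

/* Return the first class holding more than `slack` unreleased buffers
 * (slack covers packets legitimately in flight), or PKT_NCLASSES if
 * every path balances. */
static enum pkt_class find_leaking_path(const struct path_stats *s,
                                        unsigned long slack)
{
    for (int c = 0; c < PKT_NCLASSES; c++)
        if (s->received[c] > s->released[c] + slack)
            return (enum pkt_class)c;
    return PKT_NCLASSES;
}
```

A path whose unreleased count keeps growing while the others stay within the slack is the one to audit.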

Regards,
Cristian
Hi Cristian,
Thanks for your response; I am trying your idea. But let me show you another case I tested before. I changed the architecture of my application as follows:
- Architecture: Rx (lcore 1) receives packets and enqueues them to the ring; then Rx (lcore 1) itself dequeues the packets from the ring and frees them immediately.
(The old architecture is as described above.)
With the new architecture, Rx is still receiving packets after 2 days and everything looks good. Unfortunately, my application must run with the old architecture.

Any ideas for me?


Many thanks,
Vuong Le
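One candidate explanation consistent with Cristian's analysis, offered as an assumption rather than a confirmed diagnosis: in the single-lcore variant the ring is drained right after each enqueue, so it never fills, whereas in the two-lcore variant the worker can fall behind. rte_ring_enqueue_burst() returns the number of objects actually enqueued, and any mbufs it did not accept are still owned by the Rx core; unless Rx frees them with rte_pktmbuf_free(), they leak only while the ring is full, which would match a rarely executed path. The toy model below (not the DPDK API; all names and the ring capacity are hypothetical) contrasts the two behaviors.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_CAP 8   /* hypothetical tiny ring so the full case is easy to hit */

struct toy_ring {
    size_t count;    /* buffers currently queued for the worker */
};

struct toy_pool {
    size_t free;     /* buffers available to the Rx path */
};

/* Mimics the return convention of rte_ring_enqueue_burst(): enqueue as many
 * of n buffers as fit and return how many were accepted. */
static size_t toy_ring_enqueue_burst(struct toy_ring *r, size_t n)
{
    size_t room = RING_CAP - r->count;
    size_t accepted = n < room ? n : room;
    r->count += accepted;
    return accepted;
}

/* One Rx iteration: take n buffers from the pool, offer them to the ring,
 * and, if free_leftovers is true, return whatever the ring refused. */
static void rx_iteration(struct toy_pool *p, struct toy_ring *r, size_t n,
                         bool free_leftovers)
{
    if (n > p->free)
        n = p->free;                 /* cannot receive more than the pool holds */
    p->free -= n;
    size_t sent = toy_ring_enqueue_burst(r, n);
    if (free_leftovers)
        p->free += n - sent;         /* the fix: refused buffers go back to the pool */
    /* otherwise n - sent buffers are lost, as in the suspected leak */
}

/* One worker iteration: dequeue up to n buffers and free them. */
static void worker_iteration(struct toy_pool *p, struct toy_ring *r, size_t n)
{
    size_t got = n < r->count ? n : r->count;
    r->count -= got;
    p->free += got;
}
```

With a worker that drains 2 buffers per iteration against bursts of 4 and a 64-buffer pool, after 100 iterations the leaky variant has only 8 buffers left in the pool and ring combined, while the fixed variant still accounts for all 64.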
