Dear Andrew,

I tested your patch and my problem still exists, but the service status has changed and there is no longer any information about the deadlock problem. Do you have any idea how I can provide you with more information?

root@MYRB:~# service vpp status
* vpp.service - vector packet processing engine
   Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset: enabled)
   Active: inactive (dead)

May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded plugin: udp_ping_test_plugin.so
May 29 09:27:06 MYRB /usr/bin/vpp[30805]: load_one_vat_plugin:67: Loaded plugin: stn_test_plugin.so
May 29 09:27:06 MYRB vpp[30805]: /usr/bin/vpp[30805]: dpdk: EAL init args: -c 1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:08:00.0 -w 0000:08:00.1 -w 0000:08
May 29 09:27:06 MYRB /usr/bin/vpp[30805]: dpdk: EAL init args: -c 1ff -n 4 --huge-dir /run/vpp/hugepages --file-prefix vpp -w 0000:08:00.0 -w 0000:08:00.1 -w 0000:08:00.2 -w 000
May 29 09:27:07 MYRB vnet[30805]: dpdk_ipsec_process:1012: not enough DPDK crypto resources, default to OpenSSL
May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received signal SIGCONT, PC 0x7fa535dfbac0
May 29 09:27:13 MYRB vnet[30805]: received SIGTERM, exiting...
May 29 09:27:13 MYRB systemd[1]: Stopping vector packet processing engine...
May 29 09:27:13 MYRB vnet[30805]: unix_signal_handler:124: received signal SIGTERM, PC 0x7fa534121867
May 29 09:27:13 MYRB systemd[1]: Stopped vector packet processing engine.

________________________________
From: Andrew 👽 Yourtchenko <ayour...@gmail.com>
Sent: Monday, May 28, 2018 5:58 PM
To: Rubina Bianchi
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Rx stuck to 0 after a while

Dear Rubina,

Thanks for catching and reporting this!

I suspect what might be happening is that my recent change of using two unidirectional sessions in bihash vs. the single one triggered a race, whereby as the owning worker is deleting the session, the non-owning worker is trying to update it. That would logically explain the "BUG: .." line (since you are neither changing the interfaces nor moving the traffic around, the 5-tuples should not collide), as well as the later stop.

To take care of this issue, I think I will split the deletion of the session into two stages:

1) deactivation of the bihash entries that steer the traffic
2) freeing up the per-worker session structure

and have a little pause time in between these two so that the workers-in-progress can finish updating the structures.

The gerrit below is the first cut:

https://gerrit.fd.io/r/#/c/12770/

It passes make test right now, but I have not kicked its tires much yet; I will do so tomorrow.

You can try this change out in your test setup as well and tell me how it feels.

--a

On 5/28/18, Rubina Bianchi <r_bian...@outlook.com> wrote:
> Hi
>
> I run vpp v18.07-rc0~237-g525c9d0f with only 2 interface in stateful acl
> (permit+reflect) and generated sfr traffic using trex v2.27. My rx will
> become 0 after a short while, about 300 sec in my machine. Here is vpp
> status:
>
> root@MYRB:~# service vpp status
> * vpp.service - vector packet processing engine
> Loaded: loaded (/lib/systemd/system/vpp.service; disabled; vendor preset:
> enabled)
> Active: failed (Result: signal) since Mon 2018-05-28 11:35:03 +0130; 37s
> ago
> Process: 32838 ExecStopPost=/bin/rm -f /dev/shm/db /dev/shm/global_vm
> /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
> Process: 31754 ExecStart=/usr/bin/vpp -c /etc/vpp/startup.conf
> (code=killed, signal=ABRT)
> Process: 31750 ExecStartPre=/sbin/modprobe uio_pci_generic (code=exited,
> status=0/SUCCESS)
> Process: 31747 ExecStartPre=/bin/rm -f /dev/shm/db /dev/shm/global_vm
> /dev/shm/vpe-api (code=exited, status=0/SUCCESS)
> Main PID: 31754 (code=killed, signal=ABRT)
>
> May 28 16:32:47 MYRB vnet[31754]: acl_fa_node_fn:210: BUG: session
> LSB16(sw_if_index) and 5-tuple collision!
> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal
> SIGCONT, PC 0x7f1fb591cac0
> May 28 16:35:02 MYRB vnet[31754]: received SIGTERM, exiting...
> May 28 16:35:02 MYRB systemd[1]: Stopping vector packet processing
> engine...
> May 28 16:35:02 MYRB vnet[31754]: unix_signal_handler:124: received signal
> SIGTERM, PC 0x7f1fb3c40867
> May 28 16:35:03 MYRB vpp[31754]: vlib_worker_thread_barrier_sync_int: worker
> thread deadlock
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Main process exited,
> code=killed, status=6/ABRT
> May 28 16:35:03 MYRB systemd[1]: Stopped vector packet processing engine.
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Unit entered failed state.
> May 28 16:35:03 MYRB systemd[1]: vpp.service: Failed with result 'signal'.
>
> I attach my vpp configs to this email. I also run this test with the same
> config and added 4 interface instead of two. But in this case nothing
> happened to vpp and it was functional for a long time.
>
> Thanks,
> RB
>
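
For readers following the thread, the two-stage session deletion proposed in Andrew's reply above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the code from the gerrit change or from the VPP ACL plugin: `SessionTable`, `deactivate`, `reap`, and `GRACE_PERIOD` are hypothetical names, a plain dict stands in for the bihash, and time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (src, dst, sport, dport, proto); hypothetical key layout for illustration
FiveTuple = Tuple[str, str, int, int, int]

# Hypothetical pause (seconds) between stage 1 and stage 2
GRACE_PERIOD = 0.001


@dataclass
class Session:
    owner_worker: int
    packets: int = 0


@dataclass
class SessionTable:
    # `lookup` stands in for the bihash, `pool` for per-worker session storage
    lookup: Dict[FiveTuple, int] = field(default_factory=dict)
    pool: Dict[int, Session] = field(default_factory=dict)
    pending_free: List[Tuple[float, int]] = field(default_factory=list)
    next_id: int = 0

    def add(self, key: FiveTuple, reverse_key: FiveTuple, owner_worker: int) -> int:
        sid = self.next_id
        self.next_id += 1
        self.pool[sid] = Session(owner_worker)
        # Two unidirectional entries steer both directions to one session
        self.lookup[key] = sid
        self.lookup[reverse_key] = sid
        return sid

    def find(self, key: FiveTuple) -> Optional[int]:
        return self.lookup.get(key)

    def deactivate(self, sid: int, now: float) -> None:
        # Stage 1: remove the steering entries; the session itself stays
        # allocated so a worker that already holds `sid` can finish its update
        for key in [k for k, v in self.lookup.items() if v == sid]:
            del self.lookup[key]
        self.pending_free.append((now + GRACE_PERIOD, sid))

    def reap(self, now: float) -> int:
        # Stage 2: free only the sessions whose grace period has elapsed
        due = [sid for deadline, sid in self.pending_free if deadline <= now]
        self.pending_free = [(d, s) for d, s in self.pending_free if d > now]
        for sid in due:
            del self.pool[sid]
        return len(due)
```

In this model, a lookup after `deactivate` finds nothing, so no new packet is steered to the session, while an update through an already-obtained session id still touches valid storage until `reap` runs after the pause. How the real change schedules that pause across worker threads is not described in the thread and is not modelled here.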