See my previous email about 18.01 support. Much has changed in the code base in the last year. You would be well-advised to move to 18.10 at your earliest convenience.
To start working out what's wrong: grab one of the worker threads in gdb and see what it's doing. Possibilities include waiting for the bihash writer lock, I suppose. I can't say that I've ever seen this kind of failure.

HTH... Dave

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of siddarth rai
Sent: Tuesday, November 20, 2018 3:38 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP crashes out of vlib_worker_thread_barrier_sync_int while workers stuck in clib_bihash_add_del

Hi,

I am using VPP version v18.01.1-100~g3a6948c. VPP sometimes crashes out of vlib_worker_thread_barrier_sync_int when running under load. Here is the backtrace:

(gdb) bt
#0  0x00002b9e5d45d207 in raise () from /lib64/libc.so.6
#1  0x00002b9e5d45e8f8 in abort () from /lib64/libc.so.6
#2  0x0000000000405f23 in os_panic () at /bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vpp/vnet/main.c:268
#3  0x00002b9e5b55c7ea in vlib_worker_thread_barrier_sync_int (vm=0x2b9e5b78c260 <vlib_global_main>) at /bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlib/threads.c:1488
#4  0x00002b9e5b2d6e2a in vl_msg_api_handler_with_vm_node (am=0x2b9e5b5084a0 <api_main>, the_msg=0x304d0b2c, vm=0x2b9e5b78c260 <vlib_global_main>, node=<optimized out>) at /bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlibapi/api_shared.c:506
#5  0x00002b9e5b2e645c in memclnt_process (vm=0x2b9e5b78c260 <vlib_global_main>, node=0x2b9e5e008000, f=<optimized out>) at /bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlibmemory/memory_vlib.c:987
#6  0x00002b9e5b5386e6 in vlib_process_bootstrap (_a=<optimized out>) at /bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlib/main.c:1231
#7  0x00002b9e5c8a48b8 in clib_calljmp () at /bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vppinfra/longjmp.S:110
#8  0x00002b9e60327e30 in ?? ()
#9  0x00002b9e5b539a59 in vlib_process_startup (f=0x0, p=0x2b9e5e008000, vm=0x2b9e5b78c260 <vlib_global_main>) at /bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlib/main.c:1253
#10 dispatch_process (vm=0x2b9e5b78c260 <vlib_global_main>, p=0x2b9e5e008000, last_time_stamp=3140570395200949, f=0x0) at /bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlib/main.c:1296
#11 0x0000000000000000 in ?? ()

Some of the worker threads seem to be stuck in the bihash add_del operation (part of our implementation):

(gdb) info thr
  Id  Target Id                          Frame
  7   Thread 0x2ba2f991a700 (LWP 69610)  0x00002ba0e08cd184 in clib_bihash_add_del_40_8 (h=0x2b9e6050d030, add_v=0x2b9ea5ed8cf8, is_add=<optimized out>) at /spare/include/vppinfra/bihash_template.c:338
  ...
  5   Thread 0x2ba2f9317700 (LWP 69607)  0x00002ba0e08cd184 in clib_bihash_add_del_40_8 (h=0x2b9e6050cc10, add_v=0x2b9e7ca45620, is_add=<optimized out>) at /spare/srai/include/vppinfra/bihash_template.c:338
  ...

Is it possible for worker threads to be stuck at this place for some reason? Any help would be appreciated.

Thanks,
Siddarth
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#11337): https://lists.fd.io/g/vpp-dev/message/11337
-=-=-=-=-=-=-=-=-=-=-=-