See my previous email about 18.01 support. Much has changed in the codebase in 
the last year; you would be well advised to move to 18.10 at your earliest 
convenience.

To start working out what's wrong, grab one of the worker threads in gdb and 
see what it's doing. One possibility is that it's waiting for the bihash 
writer lock, though I can't say I've ever seen this kind of failure.
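
For example (the PID and thread numbers here are illustrative; adjust to your 
setup):

  $ gdb -p $(pidof vpp)
  (gdb) info threads          # list all threads; find the workers
  (gdb) thread 5              # switch to a suspect worker
  (gdb) bt                    # see exactly where it is parked
  (gdb) x/i $pc               # is it spinning on a lock instruction?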

HTH... Dave

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of siddarth rai
Sent: Tuesday, November 20, 2018 3:38 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP crashes out of vlib_worker_thread_barrier_sync_int while 
workers stuck in clib_bihash_add_del

Hi,
I am using VPP version v18.01.1-100~g3a6948c.

VPP sometimes crashes out of vlib_worker_thread_barrier_sync_int when running 
under load.

Here is the backtrace:
(gdb) bt
#0  0x00002b9e5d45d207 in raise () from /lib64/libc.so.6
#1  0x00002b9e5d45e8f8 in abort () from /lib64/libc.so.6
#2  0x0000000000405f23 in os_panic () at 
/bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vpp/vnet/main.c:268
#3  0x00002b9e5b55c7ea in vlib_worker_thread_barrier_sync_int 
(vm=0x2b9e5b78c260 <vlib_global_main>)
    at 
/bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlib/threads.c:1488
#4  0x00002b9e5b2d6e2a in vl_msg_api_handler_with_vm_node (am=0x2b9e5b5084a0 
<api_main>, the_msg=0x304d0b2c, vm=0x2b9e5b78c260 <vlib_global_main>, 
node=<optimized out>)
    at 
/bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlibapi/api_shared.c:506
#5  0x00002b9e5b2e645c in memclnt_process (vm=0x2b9e5b78c260 
<vlib_global_main>, node=0x2b9e5e008000, f=<optimized out>)
    at 
/bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlibmemory/memory_vlib.c:987
#6  0x00002b9e5b5386e6 in vlib_process_bootstrap (_a=<optimized out>) at 
/bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlib/main.c:1231
#7  0x00002b9e5c8a48b8 in clib_calljmp () at 
/bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vppinfra/longjmp.S:110
#8  0x00002b9e60327e30 in ?? ()
#9  0x00002b9e5b539a59 in vlib_process_startup (f=0x0, p=0x2b9e5e008000, 
vm=0x2b9e5b78c260 <vlib_global_main>)
    at 
/bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlib/main.c:1253
#10 dispatch_process (vm=0x2b9e5b78c260 <vlib_global_main>, p=0x2b9e5e008000, 
last_time_stamp=3140570395200949, f=0x0)
    at 
/bfs-build//2018-11-16-0505/third-party/vpp/vpp_1801/build-data/../src/vlib/main.c:1296
#11 0x0000000000000000 in ?? ()
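
As far as I can tell from reading threads.c, the panic fires when the barrier 
sync times out waiting for the workers to check in. Roughly, as a minimal 
standalone sketch (a paraphrase of the pattern as I understand it, not the 
actual VPP source; the names and the timeout value here are mine):

  #include <stdatomic.h>
  #include <stdlib.h>
  #include <time.h>

  static double now_sec (void)
  {
    struct timespec ts;
    clock_gettime (CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
  }

  /* Main thread: raise the flag, then spin until every worker checks in.
     If a worker never arrives (e.g. it is stuck spinning on a lock of its
     own), the deadline expires and the process aborts (frames #0-#3 above). */
  void barrier_sync_sketch (int n_workers,
                            atomic_int *wait_at_barrier,
                            atomic_int *workers_at_barrier)
  {
    double deadline = now_sec () + 1.0;  /* assumed timeout; see threads.c */
    atomic_store (wait_at_barrier, 1);   /* ask workers to park at the barrier */
    while (atomic_load (workers_at_barrier) < n_workers)
      if (now_sec () > deadline)
        abort ();                        /* os_panic () in the real code */
  }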


Some of the worker threads seem to be stuck in a bihash add_del operation 
(part of our implementation):

(gdb) info thr
  Id   Target Id         Frame
  7    Thread 0x2ba2f991a700 (LWP 69610) 0x00002ba0e08cd184 in 
clib_bihash_add_del_40_8 (h=0x2b9e6050d030, add_v=0x2b9ea5ed8cf8,
    is_add=<optimized out>) at /spare/include/vppinfra/bihash_template.c:338
...
  5    Thread 0x2ba2f9317700 (LWP 69607) 0x00002ba0e08cd184 in 
clib_bihash_add_del_40_8 (h=0x2b9e6050cc10, add_v=0x2b9e7ca45620,
    is_add=<optimized out>) at 
/spare/srai/include/vppinfra/bihash_template.c:338
...
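
If I read it right, the line the workers are sitting on is the writer-lock 
acquisition: in this version each bihash table appears to have a single 
writer lock that serializes all add/del operations. A minimal sketch of the 
pattern (my paraphrase, not the actual template code):

  #include <stdatomic.h>

  typedef struct
  {
    atomic_flag writer_lock;  /* one lock serializes all writers */
  } bihash_sketch_t;

  void bihash_add_del_sketch (bihash_sketch_t *h)
  {
    /* A thread that loses this race simply spins; in gdb it shows up
       parked on one instruction inside clib_bihash_add_del_*, exactly
       as in the "info thr" output above. */
    while (atomic_flag_test_and_set (&h->writer_lock))
      ;  /* busy-wait until the current writer releases the lock */

    /* ... add or delete the (key, value) pair here ... */

    atomic_flag_clear (&h->writer_lock);
  }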

Is it possible for worker threads to get stuck at this point for some reason? 
Any help would be appreciated.


Thanks,
Siddarth
