Thanks Dave. Yes I do see some changes made in vlibmemory INFRA in latest
master. Earlier I had hard time back porting some fixes
for NTP and fragmentation. I will plan for rebasing to stable version.

On Thu, 3 Dec 2020 at 6:38 PM, <v...@barachs.net> wrote:

> Looks like a corrupt binary API segment heap to me. Signal 7 in
> mspace_malloc(...) is the root cause. The thread hangs due to recursion on
> the mspace lock trying to print / syslog from the signal handler.
>
>
>
> It is abnormal to allocate memory in vl_msg_api_alloc[_as_if_client] in
> the first place.
>
>
>
> As has been communicated multiple times, 19.08 is no longer supported.
>
>
>
> HTH... Dave
>
>
>
> #13 0x00007ffa9f72adf5 in unix_signal_handler (signum=7,
> si=0x7ffa6f6e50f0, uc=0x7ffa6f6e4fc0)
>     at /development/libvpp/src/vlib/unix/main.c:127
> #14 <signal handler called>
> #15 0x00007ffa9f417c03 in mspace_malloc (msp=0x130046010, bytes=77) at
> /development/libvpp/src/vppinfra/dlmalloc.c:4437
> #16 0x00007ffa9f416f6f in mspace_get_aligned (msp=0x130046010,
> n_user_data_bytes=77, align=1, align_offset=0)
>     at /development/libvpp/src/vppinfra/dlmalloc.c:4186
> #17 0x00007ffaa0c7d04f in clib_mem_alloc_aligned_at_offset (size=73,
> align=1, align_offset=0, os_out_of_memory_on_failure=1)
>     at /development/libvpp/src/vppinfra/mem.h:139
> #18 0x00007ffaa0c7d0a2 in clib_mem_alloc (size=73) at
> /development/libvpp/src/vppinfra/mem.h:155
> #19 0x00007ffaa0c7da0a in vl_msg_api_alloc_internal (nbytes=73, pool=0,
> may_return_null=0)
>     at /development/libvpp/src/vlibmemory/memory_shared.c:177
> #20 0x00007ffaa0c7db6f in vl_msg_api_alloc_as_if_client (nbytes=57) at
> /development/libvpp/src/vlibmemory/memory_shared.c:236
>
>
>
> *From:* vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> *On Behalf Of *Rajith
> PR via lists.fd.io
> *Sent:* Thursday, December 3, 2020 5:55 AM
> *To:* vpp-dev <vpp-dev@lists.fd.io>
> *Subject:* [vpp-dev]: Worker Thread Deadlock Detected from
> vl_api_clnt_node
>
>
>
> Hi All,
>
>
>
> We have hit a VPP Worker Thread Deadlock issue. And from the call stacks
> it looks like the main thread is waiting for workers to come back to their
> main loop( ie has taken the barrier lock) and one of the two workers is on
> spin lock to make an rpc to the main thread.
>
> I believe this lock is held by the main thread.
>
>
>
> We are using *19.08 version* and complete bt is pasted below. Also, Can
> someone explain the purpose of* vl_api_clnt_node*.
>
> 414 
> <http://rajith/lxr/http/source/src/vlibmemory/vlib_api.c?sn=vpp-19-08#L414> 
> */* *INDENT-OFF* */*
>
> 415 
> <http://rajith/lxr/http/source/src/vlibmemory/vlib_api.c?sn=vpp-19-08#L415> 
> VLIB_REGISTER_NODE 
> <http://rajith/lxr/http/ident?sn=vpp-19-08;i=VLIB_REGISTER_NODE> 
> (vl_api_clnt_node 
> <http://rajith/lxr/http/ident?sn=vpp-19-08;i=vl_api_clnt_node>) =
>
> 416 
> <http://rajith/lxr/http/source/src/vlibmemory/vlib_api.c?sn=vpp-19-08#L416> {
>
> 417 
> <http://rajith/lxr/http/source/src/vlibmemory/vlib_api.c?sn=vpp-19-08#L417>   
> .function <http://rajith/lxr/http/ident?sn=vpp-19-08;i=function> = 
> vl_api_clnt_process 
> <http://rajith/lxr/http/ident?sn=vpp-19-08;i=vl_api_clnt_process>,
>
> 418 
> <http://rajith/lxr/http/source/src/vlibmemory/vlib_api.c?sn=vpp-19-08#L418>   
> .type <http://rajith/lxr/http/ident?sn=vpp-19-08;i=type> = 
> VLIB_NODE_TYPE_PROCESS,
>
> 419 
> <http://rajith/lxr/http/source/src/vlibmemory/vlib_api.c?sn=vpp-19-08#L419>   
> .name <http://rajith/lxr/http/ident?sn=vpp-19-08;i=name> = 
> *"api-rx-from-ring"*,
>
> 420 
> <http://rajith/lxr/http/source/src/vlibmemory/vlib_api.c?sn=vpp-19-08#L420>   
> .state <http://rajith/lxr/http/ident?sn=vpp-19-08;i=state> = 
> VLIB_NODE_STATE_DISABLED,
>
> 421 
> <http://rajith/lxr/http/source/src/vlibmemory/vlib_api.c?sn=vpp-19-08#L421> };
>
> 422 
> <http://rajith/lxr/http/source/src/vlibmemory/vlib_api.c?sn=vpp-19-08#L422> 
> */* *INDENT-ON* */*
>
> *Complete Backtrace:*
>
> Thread 3 (Thread 0x7ffa511c9700 (LWP 448)):
> ---Type <return> to continue, or q <return> to quit---
> #0  0x00007ffa9f6bc276 in vlib_worker_thread_barrier_check () at 
> /development/libvpp/src/vlib/threads.h:430
> #1  0x00007ffa9f6c3f19 in vlib_main_or_worker_loop (vm=0x7ffa8797adc0, 
> is_main=0) at /development/libvpp/src/vlib/main.c:1757
> #2  0x00007ffa9f6c4fbd in vlib_worker_loop (vm=0x7ffa8797adc0) at 
> /development/libvpp/src/vlib/main.c:1988
> #3  0x00007ffa9f703ff1 in vlib_worker_thread_fn (arg=0x7ffa6ccc8640) at 
> /development/libvpp/src/vlib/threads.c:1803
> #4  0x00007ffa9f383560 in clib_calljmp () from 
> /usr/local/lib/libvppinfra.so.1.0.1
> #5  0x00007ffa511c8ec0 in ?? ()
> #6  0x00007ffa9f6fe588 in vlib_worker_thread_bootstrap_fn 
> (arg=0x7ffa6ccc8640) at /development/libvpp/src/vlib/threads.c:573
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
> Thread 2 (Thread 0x7ffa519ca700 (LWP 447)):
> #0  0x00007ffaaae87ef7 in sched_yield () at 
> ../sysdeps/unix/syscall-template.S:78
> #1  0x00007ffa9f40fb49 in spin_acquire_lock (sl=0x130046384) at 
> /development/libvpp/src/vppinfra/dlmalloc.c:466
> #2  0x00007ffa9f4173a4 in mspace_malloc (msp=0x130046010, bytes=66) at 
> /development/libvpp/src/vppinfra/dlmalloc.c:4347
> #3  0x00007ffa9f4170de in mspace_get_aligned (msp=0x130046010, 
> n_user_data_bytes=66, align=16, align_offset=8)
>     at /development/libvpp/src/vppinfra/dlmalloc.c:4233
> #4  0x00007ffa9f4036fc in clib_mem_alloc_aligned_at_offset (size=46, align=8, 
> align_offset=8, os_out_of_memory_on_failure=1)
>     at /development/libvpp/src/vppinfra/mem.h:139
> #5  0x00007ffa9f403947 in vec_resize_allocate_memory (v=0x0, 
> length_increment=38, data_bytes=46, header_bytes=8, data_align=8)
>     at /development/libvpp/src/vppinfra/vec.c:59
> #6  0x00007ffa9f370aa5 in _vec_resize_inline (v=0x0, length_increment=38, 
> data_bytes=38, header_bytes=0, data_align=1)
>     at /development/libvpp/src/vppinfra/vec.h:147
> #7  0x00007ffa9f371a93 in do_percent (_s=0x7ffa6f6e4b58, fmt=0x7ffa9f4244e8 
> "%s:%d (%s) assertion `%s' fails", va=0x7ffa6f6e4c08)
>     at /development/libvpp/src/vppinfra/format.c:341
> #8  0x00007ffa9f371edb in va_format (s=0x0, fmt=0x7ffa9f4244e8 "%s:%d (%s) 
> assertion `%s' fails", va=0x7ffa6f6e4c08)
>     at /development/libvpp/src/vppinfra/format.c:404
> #9  0x00007ffa9f3629d0 in _clib_error (how_to_die=2, function_name=0x0, 
> line_number=0, fmt=0x7ffa9f4244e8 "%s:%d (%s) assertion `%s' fails")
>     at /development/libvpp/src/vppinfra/error.c:127
> #10 0x00007ffa9f370a3c in _vec_resize_inline (v=0x7ffa6dd00470, 
> length_increment=16, data_bytes=16, header_bytes=0, data_align=1)
>     at /development/libvpp/src/vppinfra/vec.h:136
> #11 0x00007ffa9f371ea2 in va_format (s=0x7ffa6dd00470 "received signal 
> SIGABRT, PC 0x7ffaaadc2fb7",
>     fmt=0x7ffa9f74923b "received signal %U, PC %U", va=0x7ffa6f6e4e08) at 
> /development/libvpp/src/vppinfra/format.c:403
> #12 0x00007ffa9f372048 in format (s=0x7ffa6dd00470 "received signal SIGABRT, 
> PC 0x7ffaaadc2fb7",
>     fmt=0x7ffa9f74923b "received signal %U, PC %U") at 
> /development/libvpp/src/vppinfra/format.c:428
> #13 0x00007ffa9f72adf5 in unix_signal_handler (signum=7, si=0x7ffa6f6e50f0, 
> uc=0x7ffa6f6e4fc0)
>     at /development/libvpp/src/vlib/unix/main.c:127
> #14 <signal handler called>
> #15 0x00007ffa9f417c03 in mspace_malloc (msp=0x130046010, bytes=77) at 
> /development/libvpp/src/vppinfra/dlmalloc.c:4437
> ---Type <return> to continue, or q <return> to quit---
> #16 0x00007ffa9f416f6f in mspace_get_aligned (msp=0x130046010, 
> n_user_data_bytes=77, align=1, align_offset=0)
>     at /development/libvpp/src/vppinfra/dlmalloc.c:4186
> #17 0x00007ffaa0c7d04f in clib_mem_alloc_aligned_at_offset (size=73, align=1, 
> align_offset=0, os_out_of_memory_on_failure=1)
>     at /development/libvpp/src/vppinfra/mem.h:139
> #18 0x00007ffaa0c7d0a2 in clib_mem_alloc (size=73) at 
> /development/libvpp/src/vppinfra/mem.h:155
> #19 0x00007ffaa0c7da0a in vl_msg_api_alloc_internal (nbytes=73, pool=0, 
> may_return_null=0)
>     at /development/libvpp/src/vlibmemory/memory_shared.c:177
> #20 0x00007ffaa0c7db6f in vl_msg_api_alloc_as_if_client (nbytes=57) at 
> /development/libvpp/src/vlibmemory/memory_shared.c:236
> #21 0x00007ffaa0c994fd in vl_api_rpc_call_main_thread_inline 
> (fp=0x7ffa9ffee83c <ip6_neighbor_set_unset_rpc_callback>,
>     data=0x7ffa6f6e58c0 "\001", data_length=32, force_rpc=0 '\000') at 
> /development/libvpp/src/vlibmemory/vlib_api.c:578
> #22 0x00007ffaa0c99673 in vl_api_rpc_call_main_thread (fp=0x7ffa9ffee83c 
> <ip6_neighbor_set_unset_rpc_callback>, data=0x7ffa6f6e58c0 "\001",
>     data_length=32) at /development/libvpp/src/vlibmemory/vlib_api.c:602
> #23 0x00007ffa9ffebb32 in set_unset_ip6_neighbor_rpc (vm=0x7ffa8797a640, 
> sw_if_index=53, a=0x7ffa6f6e5c18, mac=0x7ffa6f6e5c34, is_add=1,
>     flags=IP_NEIGHBOR_FLAG_NONE) at 
> /development/libvpp/src/vnet/ip/ip6_neighbor.c:441
> #24 0x00007ffa9ffed406 in vnet_set_ip6_ethernet_neighbor (vm=0x7ffa8797a640, 
> sw_if_index=53, a=0x7ffa6f6e5c18, mac=0x7ffa6f6e5c34,
>     flags=IP_NEIGHBOR_FLAG_NONE) at 
> /development/libvpp/src/vnet/ip/ip6_neighbor.c:809
> #25 0x00007ffa9fff2226 in icmp6_router_advertisement (vm=0x7ffa8797a640, 
> node=0x7ffa890f65c0, frame=0x7ffa88d14380)
>     at /development/libvpp/src/vnet/ip/ip6_neighbor.c:2298
> #26 0x00007ffa9f6c2011 in dispatch_node (vm=0x7ffa8797a640, 
> node=0x7ffa890f65c0, type=VLIB_NODE_TYPE_INTERNAL,
>     dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7ffa88d14380, 
> last_time_stamp=203681032322150)
>     at /development/libvpp/src/vlib/main.c:1213
> #27 0x00007ffa9f6c27d2 in dispatch_pending_node (vm=0x7ffa8797a640, 
> pending_frame_index=27, last_time_stamp=203681032322150)
>     at /development/libvpp/src/vlib/main.c:1381
> #28 0x00007ffa9f6c446c in vlib_main_or_worker_loop (vm=0x7ffa8797a640, 
> is_main=0) at /development/libvpp/src/vlib/main.c:1839
> #29 0x00007ffa9f6c4fbd in vlib_worker_loop (vm=0x7ffa8797a640) at 
> /development/libvpp/src/vlib/main.c:1988
> #30 0x00007ffa9f703ff1 in vlib_worker_thread_fn (arg=0x7ffa6ccc8540) at 
> /development/libvpp/src/vlib/threads.c:1803
> #31 0x00007ffa9f383560 in clib_calljmp () from 
> /usr/local/lib/libvppinfra.so.1.0.1
>
> #32 0x00007ffa519c9ec0 in ?? ()
> #33 0x00007ffa9f6fe588 in vlib_worker_thread_bootstrap_fn 
> (arg=0x7ffa6ccc8540) at /development/libvpp/src/vlib/threads.c:573
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
> Thread 1 (Thread 0x7ffaabd62d00 (LWP 257)):
> #0  0x00007ffaaae68492 in __GI___waitpid (pid=17980, 
> stat_loc=stat_loc@entry=0x7ffa6d60fe18, options=options@entry=0)
>     at ../sysdeps/unix/sysv/linux/waitpid.c:30
> #1  0x00007ffaaadd3177 in do_system (line=<optimized out>) at 
> ../sysdeps/posix/system.c:149
> #2  0x00007ffaab8f5457 in bd_signal_handler_cb (signo=6) at 
> /development/librtbrickinfra/bd/src/bd.c:770
> #3  0x00007ffa9f72abf7 in rtb_bd_signal_handler (signo=6) at 
> /development/libvpp/src/vlib/unix/main.c:80
> ---Type <return> to continue, or q <return> to quit---
> #4  0x00007ffa9f72af92 in unix_signal_handler (signum=6, si=0x7ffa6d6106f0, 
> uc=0x7ffa6d6105c0)
>     at /development/libvpp/src/vlib/unix/main.c:180
> #5  <signal handler called>
> #6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #7  0x00007ffaaadc4921 in __GI_abort () at abort.c:79
> #8  0x00007ffaa0ed8086 in os_panic () at 
> /development/libvpp/src/vpp/vnet/main.c:559
> #9  0x00007ffa9f703342 in vlib_worker_thread_barrier_sync_int 
> (vm=0x7ffa9f964e80 <vlib_global_main>,
>     func_name=0x7ffaa105c2e0 <__FUNCTION__.40056> "vl_msg_api_barrier_sync") 
> at /development/libvpp/src/vlib/threads.c:1489
> #10 0x00007ffaa0ed80ed in vl_msg_api_barrier_sync () at 
> /development/libvpp/src/vpp/vnet/main.c:599
> #11 0x00007ffaa0c7bd52 in vl_mem_api_handle_rpc (vm=0x7ffa9f964e80 
> <vlib_global_main>, node=0x7ffa6d608000)
>     at /development/libvpp/src/vlibmemory/memory_api.c:757
> #12 0x00007ffaa0c981f5 in vl_api_clnt_process (vm=0x7ffa9f964e80 
> <vlib_global_main>, node=0x7ffa6d608000, f=0x0)
>     at /development/libvpp/src/vlibmemory/vlib_api.c:326
> #13 0x00007ffa9f6c2a57 in vlib_process_bootstrap (_a=140713942780864) at 
> /development/libvpp/src/vlib/main.c:1476
> #14 0x00007ffa9f383560 in clib_calljmp () from 
> /usr/local/lib/libvppinfra.so.1.0.1
> #15 0x00007ffa84930b90 in ?? ()
> #16 0x00007ffa9f6c2b5c in vlib_process_startup (vm=0xffffffffffffffff, 
> p=0xdf00000000, f=0x7ffa6d608000)
>     at /development/libvpp/src/vlib/main.c:1498
> #17 0x0000000000000000 in ?? ()
> (gdb)
>
>
>
> Thanks,
>
> Rajith
>
>
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#18251): https://lists.fd.io/g/vpp-dev/message/18251
Mute This Topic: https://lists.fd.io/mt/78681829/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to