Hi,
I've seen a recent VPP build hang a few times: the API and CLI become
unresponsive and forwarding stops (at least, I think it does), while the
worker threads keep consuming CPU.
Attaching GDB shows the main thread doing the following:
(gdb) bt
#0 0x00007f5f6f8f271b in sched_yield () at ../sysdeps/unix/syscall-template.S:78
#1 0x00007f5f6fb3df8b in spin_acquire_lock (sl=<optimized out>) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:468
#2 mspace_malloc (msp=0x130048040, bytes=72) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:4351
#3 0x00007f5f6fb66f81 in mspace_memalign (msp=0x130048040, alignment=<optimized out>, bytes=72) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:4667
#4 clib_mem_heap_alloc_inline (heap=<optimized out>, size=72, align=<optimized out>, os_out_of_memory_on_failure=1) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:608
#5 clib_mem_heap_alloc_aligned (heap=<optimized out>, size=72, align=8) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:664
#6 0x00007f5f6fba5157 in _vec_alloc_internal (n_elts=64, attr=<optimized out>) at /home/pim/src/vpp/src/vppinfra/vec.c:35
#7 0x00007f5f6fb848c8 in _vec_resize (vp=<optimized out>, n_add=64, hdr_sz=0, align=8, elt_sz=<optimized out>) at /home/pim/src/vpp/src/vppinfra/vec.h:256
#8 serialize_vector_write (m=<optimized out>, s=0x7f5f0dbfebc0) at /home/pim/src/vpp/src/vppinfra/serialize.c:908
#9 0x00007f5f6fb843c1 in serialize_write_not_inline (m=0x7f5f0dbfeb60, s=<optimized out>, n_bytes_to_write=4, flags=<optimized out>) at /home/pim/src/vpp/src/vppinfra/serialize.c:734
#10 0x00007f5f6fe5a053 in serialize_stream_read_write (header=0x7f5f0dbfeb60, s=<optimized out>, n_bytes=4, flags=2) at /home/pim/src/vpp/src/vppinfra/serialize.h:140
#11 serialize_get (m=0x7f5f0dbfeb60, n_bytes=4) at /home/pim/src/vpp/src/vppinfra/serialize.h:180
#12 serialize_integer (m=0x7f5f0dbfeb60, x=<optimized out>, n_bytes=4) at /home/pim/src/vpp/src/vppinfra/serialize.h:187
#13 vl_api_serialize_message_table (am=0x7f5f6fe66258 <api_global_main>, vector=<optimized out>) at /home/pim/src/vpp/src/vlibapi/api_shared.c:210
#14 0x00007f5f6fe5a715 in vl_msg_api_trace_save (am=0x130048040, which=<optimized out>, fp=0x13f0690, is_json=27 '\033') at /home/pim/src/vpp/src/vlibapi/api_shared.c:410
#15 0x00007f5f6fe5c0ea in vl_msg_api_post_mortem_dump () at /home/pim/src/vpp/src/vlibapi/api_shared.c:880
#16 0x00000000004068c6 in os_panic () at /home/pim/src/vpp/src/vpp/vnet/main.c:415
#17 0x00007f5f6fb3feed in mspace_free (msp=0x130048040, mem=<optimized out>) at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:2954
#18 0x00007f5f6fb6bf8c in clib_mem_heap_free (heap=0x0, p=<optimized out>) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:768
#19 clib_mem_free (p=<optimized out>) at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:774
#20 0x00007f5f2fa32b40 in ?? ()
#21 0x00007f5f3302f848 in ?? ()
#22 0x0000000000000000 in ?? ()
When I kill VPP, an api_post_mortem file is sometimes written (although most
of the time it is empty), but subsequently trying to dump one makes VPP
crash. The files, followed by the backtrace of that crash:
-rw------- 1 ipng ipng 35437 Jan 8 19:08 api_post_mortem.76724
-rw------- 1 ipng ipng 35368 Jan 8 19:08 api_post_mortem.76842
-rw------- 1 ipng ipng 0 Jan 8 19:08 api_post_mortem.76978
-rw------- 1 ipng ipng 0 Jan 8 19:08 api_post_mortem.84008
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff7fada5f in vl_msg_print_trace (msg=0x7fff9db73bd8 "", ctx=0x7fff53b62ca0) at /home/pim/src/vpp/src/vlibmemory/vlib_api_cli.c:693
#2 0x00007ffff66a55bb in vl_msg_traverse_trace (tp=0x7fff9b4e7998, fn=0x7ffff7fad790 <vl_msg_print_trace>, ctx=0x7fff53b62ca0) at /home/pim/src/vpp/src/vlibapi/api_shared.c:321
#3 0x00007ffff7fab854 in api_trace_command_fn (vm=0x7fff96000700, input=0x7fff53b62f30, cmd=<optimized out>) at /home/pim/src/vpp/src/vlibmemory/vlib_api_cli.c:727
#4 0x00007ffff647fdad in vlib_cli_dispatch_sub_commands (vm=0x7fff96000700, cm=<optimized out>, input=0x7fff53b62f30, parent_command_index=<optimized out>) at /home/pim/src/vpp/src/vlib/cli.c:650
#5 0x00007ffff647fb91 in vlib_cli_dispatch_sub_commands (vm=0x7fff96000700, cm=<optimized out>, input=0x7fff53b62f30, parent_command_index=<optimized out>) at /home/pim/src/vpp/src/vlib/cli.c:607
#6 0x00007ffff647f0cd in vlib_cli_input (vm=0x7fff96000700, input=0x7fff53b62f30, function=<optimized out>, function_arg=<optimized out>) at /home/pim/src/vpp/src/vlib/cli.c:753
#7 0x00007ffff64fd5c7 in unix_cli_process_input (cm=<optimized out>, cli_file_index=0) at /home/pim/src/vpp/src/vlib/unix/cli.c:2616
#8 unix_cli_process (vm=<optimized out>, rt=0x7fff9b69bdc0, f=<optimized out>) at /home/pim/src/vpp/src/vlib/unix/cli.c:2745
#9 0x00007ffff64a7837 in vlib_process_bootstrap (_a=<optimized out>) at /home/pim/src/vpp/src/vlib/main.c:1221
#10 0x00007ffff63f9d94 in clib_calljmp () at /home/pim/src/vpp/src/vppinfra/longjmp.S:123
#11 0x00007fff94700b00 in ?? ()
#12 0x00007ffff649f3d0 in vlib_process_startup (vm=0x7fff96000700, p=0x7fff9b69bdc0, f=0x0) at /home/pim/src/vpp/src/vlib/main.c:1246
#13 dispatch_process (vm=0x7fff96000700, p=0x7fff9b69bdc0, f=0x0, last_time_stamp=<optimized out>) at /home/pim/src/vpp/src/vlib/main.c:1302
#14 0x0000000000000000 in ?? ()
Has anybody else seen API calls seemingly hang the VPP instance? Is there
an alternative way to extract the information from the api_post_mortem.*
files? Or are there any other clues that would help narrow down the issue?
It's a rare issue: I run a dozen or so instances with 6+ months of uptime,
and one of them had this hang/crash a few times in a row this week.
--
Pim van Pelt <[email protected]>
PBVP1-RIPE - http://www.ipng.nl/