Hello,

We are seeing a VPP crash when our dataplane agent disconnects from and
reconnects to the VPP API socket during upgrades.  To reproduce the crash
more quickly, we put the dataplane agent process into a constant restart
loop in which it creates a new client session with the API, subscribes to
l2fib events, unsubscribes, and then closes the API socket.

I have created a Gerrit change with a suspected fix (
https://gerrit.fd.io/r/c/vpp/+/44561) that checks that the vl_input_queue is
initialized (non-NULL) before passing it to vl_mem_api_can_send.  I will also
provide an example client to reproduce this crash.

Thanks

Catching the fault in gdb:

Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
vl_mem_api_can_send (q=0x0) at /root/vpp/src/vlibapi/memory_shared.c:781
781     /root/vpp/src/vlibapi/memory_shared.c: No such file or directory.
(gdb) bt full
#0  vl_mem_api_can_send (q=0x0) at /root/vpp/src/vlibapi/memory_shared.c:781
No locals.
#1  0x00007fa5ff04bbf4 in vl_api_can_send_msg (rp=0x7fa544fc4440) at /root/vpp/src/vlibmemory/api.h:39
No locals.
#2  l2fib_scan (vm=vm@entry=0x7fa540000ac0, start_time=<optimized out>, event_only=event_only@entry=1 '\001') at /root/vpp/src/vnet/l2/l2_fib.c:1262
        bd_learn_counts = 0x7fa545039dd8
        last_start = <optimized out>
        accum_t = <optimized out>
        delta_t = <optimized out>
        evt_idx = <optimized out>
        learn_count = <optimized out>
        lm = <optimized out>
        client = 1
        cl_idx = 33554560
        mp = 0x1301c3970
        reg = 0x7fa544fc4440
        fm = <optimized out>
        i = <optimized out>
        j = <optimized out>
        k = <optimized out>
        bd_index = 3
        h = <optimized out>
#3  0x00007fa5ff0475c1 in l2fib_mac_age_scanner_process (vm=0x7fa540000ac0, rt=<optimized out>, f=<optimized out>) at /root/vpp/src/vnet/l2/l2_fib.c:1334
        scan = <optimized out>
        SCAN_MAC_AGE = SCAN_MAC_AGE
        SCAN_MAC_EVENT = SCAN_MAC_EVENT
        SCAN_DISABLE = SCAN_DISABLE
        event_data = 0x7fa544f5fb18
        enabled = <optimized out>
        next_age_scan_time = <optimized out>
        event_type = 140347804209272
        start_time = 6.5254760102106957e-06
        fm = <optimized out>
        lm = <optimized out>
#4  0x00007fa5fedc6eb7 in vlib_process_bootstrap (_a=<optimized out>) at /root/vpp/src/vlib/main.c:1162
        a = <optimized out>
        vm = 0x7fa54503d078
        p = 0x7fa5408180c0
        f = 0x3
        node = 0x7fa5408180c0
        n = <optimized out>
#5  0x00007fa5fed46d88 in clib_calljmp () at /root/vpp/src/vppinfra/longjmp.S:123
No locals.
#6  0x00007fa5fcc15dc0 in ?? ()
No symbol table info available.
#7  0x00007fa5fedc2901 in vlib_process_startup (vm=0x7fa540000ac0, p=0x7fa5408180c0, f=0x0) at /root/vpp/src/vlib/main.c:1187
        a = <error reading variable a (Cannot access memory at address 0x7fa5fa8fc020)>
        r = <optimized out>
#8  dispatch_process (vm=0x7fa540000ac0, p=0x7fa5408180c0, f=0x0, last_time_stamp=<error reading variable: Cannot access memory at address 0x7fa5fa8fc010>) at /root/vpp/src/vlib/main.c:1259
        nm = 0x7fa540000c18
        node_runtime = 0x7fa5408180c0
        node = 0x7fa540817f60
        t = <error reading variable t (Cannot access memory at address 0x7fa5fa8fc010)>
        old_process_index = 4294967295
        n_vectors = <optimized out>
        is_suspend = <optimized out>
Backtrace stopped: Cannot access memory at address 0x7fa5fa8fc068

With CLIB_DEBUG enabled:

vpp[42]: received signal SIGSEGV, PC 0x7f1e42a455c0, faulting address 0x60
vpp[42]: Code: 8b 4f 60 31 c0 3b 4f 64 0f 9c c0 c3 0f 1f 40 00 41 56 53 50
vpp[42]: #0 0x00007f1e42a455c0 vl_mem_api_can_send + 0x0
vpp[42]:    from /lib/x86_64-linux-gnu/libvlibapi.so.26.02
vpp[42]: #1 0x00007f1e42bf6bf4 l2fib_init + 0x31c4
vpp[42]:    from /lib/x86_64-linux-gnu/libvnet.so.26.02
vpp[42]: #2 0x00007f1e42bf25c1 get_mac_table + 0x171
vpp[42]:    from /lib/x86_64-linux-gnu/libvnet.so.26.02
vpp[42]: #3 0x00007f1e42971eb7 vlib_exit_with_status + 0xb37
vpp[42]:    from /lib/x86_64-linux-gnu/libvlib.so.26.02
vpp[42]: #4 0x00007f1e428f1d88 clib_calljmp + 0x18
vpp[42]:    from /lib/x86_64-linux-gnu/libvppinfra.so.26.02
[1]+ Aborted (core dumped) /usr/bin/vpp -c "${RUNTIME_DIR}/vpp.conf"
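
The faulting address 0x60 looks consistent with the NULL queue seen in gdb. On x86-64 the first logged code bytes, 8b 4f 60, decode to a 32-bit load from [rdi+0x60], and rdi carries the first argument, so a NULL argument yields an access at address 0x60. The arithmetic can be sketched as follows. The layout is hypothetical: mock_queue_layout_t is not the real VPP structure, the 0x60-byte header is padding chosen only to match the logged offsets, and the field names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0x60 opaque header bytes, then the two 32-bit
   counters that the logged instructions read at offsets 0x60 and 0x64. */
typedef struct
{
  unsigned char header[0x60];
  int32_t cursize; /* assumed field name */
  int32_t maxsize; /* assumed field name */
} mock_queue_layout_t;

/* Address touched when reading a field at `field_offset` bytes into a
   structure located at `base`. */
static uintptr_t
mock_fault_address (uintptr_t base, size_t field_offset)
{
  return base + field_offset;
}
```

Calling mock_fault_address (0, offsetof (mock_queue_layout_t, cursize)) gives 0x60, the address reported in the log above.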