Re: [vpp-dev] VPP LCP: IS-IS does not work

2023-02-01 Thread Stanislav Zaikin
Hi Gennady,

Could you execute the following commands in gdb and show the output?
f 9
info locals
p lipi
p *lip

On Tue, 31 Jan 2023 at 11:16, Gennady Abramov wrote:

> Hello Stanislav,
>
> IS-IS itself is working, thank you!
> tn3# show isis neighbor
> Area myisis:
>  System Id  Interface  L  State  Holdtime  SNPA
>  tn1        Ten0.1914  3  Up     28        2020.2020.2020
>
> Unfortunately, both lcp lcp-auto-subint and lcp lcp-sync still look
> broken. Note that I applied your patches to version 22.10 because the master
> branch was not stable enough; if needed, I can also test on master.
> 1. LCP auto-subint:
> DBGvpp# set interface state TenGigabitEthernet1c/0/1 up
> DBGvpp# lcp lcp-auto-subint on
> DBGvpp# lcp lcp-
> lcp-auto-subint  lcp-sync
> DBGvpp# lcp lcp-sync on
> DBGvpp# lcp create 1 host-if Ten0
> DBGvpp# show lcp
> lcp default netns ''
> lcp lcp-auto-subint on
> lcp lcp-sync on
> lcp del-static-on-link-down off
> lcp del-dynamic-on-link-down off
> itf-pair: [0] TenGigabitEthernet1c/0/1 tap1 Ten0 1304 type tap
> DBGvpp#
> Then VPP crashes:
>
> Jan 31 10:05:57 tn3 vnet[1233293]:
> /home/abramov/vpp-p3-lcp/src/vnet/interface_funcs.h:60
> (vnet_get_sw_interface) assertion `! pool_is_free
> (vnm->interface_main.sw_interfaces, _e)' fails
> Jan 31 10:05:57 tn3 systemd-udevd[1233343]: ethtool: autonegotiation is
> unset or enabled, the speed and duplex are not writable.
> Jan 31 10:05:57 tn3 vnet[1233293]: received signal SIGABRT, PC
> 0x7f81e45b800b
> Jan 31 10:05:57 tn3 systemd-udevd[1233343]: Using default interface naming
> scheme 'v245'.
> Jan 31 10:05:57 tn3 vnet[1233293]: #0  0x7f81e4ab1c92
> unix_signal_handler + 0x1f2
> Jan 31 10:05:57 tn3 vnet[1233293]: #1  0x7f81e49af420 0x7f81e49af420
> Jan 31 10:05:57 tn3 vnet[1233293]: #2  0x7f81e45b800b gsignal + 0xcb
> Jan 31 10:05:57 tn3 vnet[1233293]: #3  0x7f81e4597859 abort + 0x12b
> Jan 31 10:05:57 tn3 vnet[1233293]: #4  0x004072f3 0x4072f3
> Jan 31 10:05:57 tn3 vnet[1233293]: #5  0x7f81e48e9109 debugger + 0x9
> Jan 31 10:05:57 tn3 vnet[1233293]: #6  0x7f81e48e8eca _clib_error +
> 0x2da
> Jan 31 10:05:57 tn3 vnet[1233293]: #7  0x7f81e4c94f68
> vnet_get_sw_interface + 0xa8
> Jan 31 10:05:57 tn3 vnet[1233293]: #8  0x7f81e4c94f9b
> vnet_get_sup_sw_interface + 0x1b
> Jan 31 10:05:57 tn3 vnet[1233293]: #9  0x7f81e4c9500b
> vnet_get_sup_hw_interface + 0x1b
> Jan 31 10:05:57 tn3 vnet[1233293]: #10 0x7f81e4c98bca
> vnet_create_sub_interface + 0x5a
> Jan 31 10:05:57 tn3 vnet[1233293]: #11 0x7f819ce8db6a
> lcp_router_link_add + 0x5ea
> Jan 31 10:05:57 tn3 vnet[1233293]: #12 0x7f819ce98fc3 nl_link_add +
> 0xd3
> Jan 31 10:05:57 tn3 vnet[1233293]: #13 0x7f819ce986a0
> nl_route_dispatch + 0xe0
> Jan 31 10:05:57 tn3 vnet[1233293]: #14 0x7f819cf29f52 0x7f819cf29f52
>
>
> Thread 1 "vpp_main" received signal SIGABRT, Aborted.
> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> 50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x76b0c859 in __GI_abort () at abort.c:79
> #2  0x004072f3 in os_panic () at
> /home/abramov/vpp-p3-lcp/src/vpp/vnet/main.c:417
> #3  0x76e5e109 in debugger () at
> /home/abramov/vpp-p3-lcp/src/vppinfra/error.c:84
> #4  0x76e5deca in _clib_error (how_to_die=2, function_name=0x0,
> line_number=0, fmt=0x77cc8208 "%s:%d (%s) assertion `%s' fails") at
> /home/abramov/vpp-p3-lcp/src/vppinfra/error.c:143
> #5  0x7720af68 in vnet_get_sw_interface (vnm=0x77f696e8
> , sw_if_index=1650550633) at
> /home/abramov/vpp-p3-lcp/src/vnet/interface_funcs.h:60
> #6  0x7720af9b in vnet_get_sup_sw_interface (vnm=0x77f696e8
> , sw_if_index=1650550633) at
> /home/abramov/vpp-p3-lcp/src/vnet/interface_funcs.h:83
> #7  0x7720b00b in vnet_get_sup_hw_interface (vnm=0x77f696e8
> , sw_if_index=1650550633) at
> /home/abramov/vpp-p3-lcp/src/vnet/interface_funcs.h:94
> #8  0x7720ebca in vnet_create_sub_interface
> (sw_if_index=1650550633, id=1914, flags=18, inner_vlan_id=0,
> outer_vlan_id=1914, sub_sw_if_index=0x7fffac7ad980) at
> /home/abramov/vpp-p3-lcp/src/vnet/ethernet/interface.c:1063
> #9  0x7fffaf404b6a in lcp_router_link_add (rl=0x5a3450,
> ctx=0x7fffbb9b18c8) at
> /home/abramov/vpp-p3-lcp/src/plugins/linux-cp/lcp_router.c:423
> #10 0x7fffaf40ffc3 in nl_link_add (rl=0x5a3450, arg=0x7fffbb9b18c8) at
> /home/abramov/vpp-p3-lcp/src/plugins/linux-cp/lcp_nl.c:280
> #11 0x7fffaf40f6a0 in nl_route_dispatch (obj=0x5a3450,
> arg=0x7fffbb9b18c8) at
> /home/abramov/vpp-p3-lcp/src/plugins/linux-cp/lcp_nl.c:323
> #12 0x7fffaf4a0f52 in ?? () from /lib/x86_64-linux-gnu/libnl-3.so.200
> #13 0x7fffaf441990 in ?? () from
> /lib/x86_64-linux-gnu/libnl-route-3.so.200
> #14 0x7fffaf49db52 in nl_cache_parse () from
> /lib/x86_64-linux-gnu/libnl-3.so.2

[vpp-dev] sigsegv and its handler

2023-02-01 Thread Stanislav Zaikin
Hello folks,

I've been experiencing rare crashes (about one crash every 3 months). It looks
like the heap is somehow corrupted. Sometimes the trace shows very
unexpected nodes (like ip6-map-t, although I don't configure any IPv6 MAP),
and sometimes it's just a crash inside ip4-rewrite-node.

After a closer look, I found that the last 2 crashes occurred in the same way:
1. vnet_feature_arc_start_w_cfg_index or vnet_feature_arc_start call
2. vnet_get_config_data call
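
To illustrate why a bad config index at this point can fault at an unrelated-looking address, here is a minimal sketch in C. It is a simplified model and not the VPP source: the toy_* names, the flat-array layout, and the bounds check are all assumptions made for illustration. The idea is that the per-feature config data is read from a word array at an index carried with the packet, so a corrupted index turns into a wild read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a feature-arc config store:
   a flat array of 32-bit words laid out as [data words..., next_index]. */
typedef struct
{
  const uint32_t *words; /* config storage */
  size_t n_words;        /* number of valid words */
} toy_config_heap_t;

/* Checked lookup: returns a pointer to the feature's config data and
   advances *config_index, or returns NULL (leaving *config_index
   untouched) when the index does not fit inside the array. */
static const void *
toy_get_config_data (const toy_config_heap_t *h, uint32_t *config_index,
                     uint32_t *next_index, uint32_t n_data_bytes)
{
  assert (h != NULL && config_index != NULL && next_index != NULL);

  uint32_t i = *config_index;
  /* Round the data size up to whole 32-bit words. */
  size_t n = ((size_t) n_data_bytes + 3) / 4;

  /* An unchecked version would read words[i + n] directly; with a
     corrupted index that read lands at an arbitrary address. */
  if ((size_t) i + n >= h->n_words)
    return NULL;

  *next_index = h->words[i + n];         /* last word is the next index */
  *config_index = (uint32_t) (i + n + 1); /* advance to the next feature */
  return &h->words[i];
}
```

In a debug-only build, a check of this kind at the lookup would turn the late SIGSEGV into an early, attributable failure, at the cost of an extra branch per packet.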

But then VPP received and handled a SIGSEGV signal. It completely broke the
stack trace in the core dump (for the corresponding worker):
#0  0x7f44fa0812c6 in __GI_epoll_pwait (epfd=8, events=0x7f44babe52d8, maxevents=, timeout=9, set=0x7f44fa5c66f8) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42
#1  0x00089f6fab2b in ?? ()
#2  0x7f44babe52d8 in ?? ()
#3  0x00090100 in ?? ()
#4  0x7f44fa5c66f8 in _vlib_init_function_init_linux_epoll_input_init () from /lib/x86_64-linux-gnu/libvlib.so.22.10.0
#5  0x in ?? ()

So, I can't analyze the core dump. Any ideas on how to catch this crash
correctly? Disable receiving SIGSEGV? Or is there a way to restore the
original stack trace of the worker?

For reference, here are the stack traces from syslog:
vnet[2856086]: received signal SIGSEGV, PC 0x7f44b76dbee3, faulting address
0xb0040114
vnet[2856086]: #0  0x7f44fa43885b 0x7f44fa43885b
(unix_signal_handler+379)
vnet[2856086]: #1  0x7f44fa34f3c0 0x7f44fa34f3c0 (__funlockfile)
vnet[2856086]: #2  0x7f44b76dbee3 0x7f44b76dbee3 (ip6_map_t+675)
vnet[2856086]: #3  0x7f44fa3c86fb vlib_worker_loop + 0x1b3b
vnet[2856086]: #4  0x7f44fa41aafa vlib_worker_thread_fn + 0xaa
vnet[2856086]: #5  0x7f44fa414e01 vlib_worker_thread_bootstrap_fn + 0x51
vnet[2856086]: #6  0x7f44fa343609 start_thread + 0xd9
vnet[2856086]: #7  0x7f44fa081163 clone + 0x43

vnet[944491]: received signal SIGSEGV, PC 0x7faf922ca6ae, faulting address
0x7fb3519530fc
vnet[944491]: #0  0x7faf9102785b 0x7faf9102785b
vnet[944491]: #1  0x7faf90f3e3c0 0x7faf90f3e3c0
vnet[944491]: #2  0x7faf922ca6ae ip4_rewrite_node_fn_skx + 0x149e
vnet[944491]: #3  0x7faf90fb76fb vlib_worker_loop + 0x1b3b
vnet[944491]: #4  0x7faf91009afa vlib_worker_thread_fn + 0xaa
vnet[944491]: #5  0x7faf91003e01 vlib_worker_thread_bootstrap_fn + 0x51
vnet[944491]: #6  0x7faf90f32609 start_thread + 0xd9
vnet[944491]: #7  0x7faf90c70163 clone + 0x43

Line information:
Line 135 of "/home/runner/work/vpp/vpp/src/vnet/config.h" starts at address
0x7f44b76dbee3 and ends at 0x7f44b76dbee7.

Line 135 of "/home/runner/work/vpp/vpp/src/vnet/config.h" starts at address
0x7f44fb6db6ae and ends at 0x7f44fb6db6b1.

-- 
Best regards
Stanislav Zaikin

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#22531): https://lists.fd.io/g/vpp-dev/message/22531
Mute This Topic: https://lists.fd.io/mt/96673497/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/leave/1480452/21656/631435203/xyzzy 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] sigsegv and its handler

2023-02-01 Thread Guangming
Hi Stanislav Zaikin,

  This may be a memory overwrite issue, especially of a vlib buffer. Because VPP
uses vector packet processing rather than scalar packet processing, you can see
unexpected nodes in the trace even if the stack is not corrupted.
  You can use a debug build with CLIB_DEBUG > 0 to locate the root cause.


zhangguangm...@baicells.com
 



Re: [vpp-dev] sigsegv and its handler

2023-02-01 Thread Dave Barach
This seems consistent with a SIGSEGV compounded by a worker-thread stack 
overflow situation. In hopes of obtaining a clean core file, you might want to 
modify the SIGSEGV handler to simply abort() instead of trying to write a 
post-mortem API dump, syslog’ing a backtrace, etc.

 

Best of luck with it.  

 




Re: [vpp-dev] VPP LCP: IS-IS does not work

2023-02-01 Thread Gennady Abramov
Hello Stanislav,

Here it is!

Thread 1 "vpp_main" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb)
(gdb) f 9
#9  0x7fffaf404b6a in lcp_router_link_add (rl=0x5a38e0, ctx=0x7fffbb9add18) 
at /home/abramov/vpp-p3-lcp/src/plugins/linux-cp/lcp_router.c:423
423   if (vnet_create_sub_interface (lip->lip_host_sw_if_index, 
vlan, 18,
(gdb) info locals
lip = 0x7fffbb9abd70
if_name = 0x0
sub_phy_sw_if_index = 3
sub_host_sw_if_index = 8
vlan = 1914
ns = 0x0
if_namev = 0xf90b93202e1d7baf 
lipi = 0
up = 0
vnm = 0x77f696e8 
(gdb) p lipi
$1 = 0
(gdb) p *lip
$2 = {lip_host_sw_if_index = 1745050366, lip_phy_sw_if_index = 32767, 
lip_host_name = 0x43 , 
lip_vif_index = 1,
lip_namespace = 0x50005 "", lip_host_type = (LCP_ITF_HOST_TUN | unknown: 
4294967294), lip_phy_adjs = {adj_index = {0, 0}}, lip_flags = (unknown: 0),
lip_rewrite_len = 0 '\000', lip_create_ts = 0}
(gdb)
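
The `p *lip` output above shows implausible field values (for example lip_host_sw_if_index = 1745050366), and that field is what frame 9 passes to vnet_create_sub_interface. Purely as an illustration of the kind of guard that would turn the assertion into an error return, here is a toy sketch in C. None of it is VPP code: the toy_* types, the fixed-size pool, and the return codes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POOL_SIZE 8

typedef struct
{
  uint32_t sup_sw_if_index; /* parent interface index */
} toy_sw_interface_t;

/* Toy pool: fixed storage plus a per-slot "in use" flag. */
typedef struct
{
  toy_sw_interface_t elts[TOY_POOL_SIZE];
  bool in_use[TOY_POOL_SIZE];
} toy_if_pool_t;

/* Checked lookup: NULL for an out-of-range or free index instead of
   asserting on it. */
static toy_sw_interface_t *
toy_get_sw_interface_or_null (toy_if_pool_t *p, uint32_t sw_if_index)
{
  assert (p != NULL);
  if (sw_if_index >= TOY_POOL_SIZE || !p->in_use[sw_if_index])
    return NULL;
  return &p->elts[sw_if_index];
}

/* Returns 0 on success, -1 if the parent index is invalid,
   -2 if the pool has no free slot. */
static int
toy_create_sub_interface (toy_if_pool_t *p, uint32_t parent_sw_if_index,
                          uint32_t *sub_sw_if_index)
{
  /* Validate the parent before touching it. */
  if (toy_get_sw_interface_or_null (p, parent_sw_if_index) == NULL)
    return -1;

  for (uint32_t i = 0; i < TOY_POOL_SIZE; i++)
    if (!p->in_use[i])
      {
        p->in_use[i] = true;
        p->elts[i].sup_sw_if_index = parent_sw_if_index;
        *sub_sw_if_index = i;
        return 0;
      }
  return -2;
}
```

A guard like this would only hide the symptom; the garbage in *lip would still need to be traced to its source.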




[vpp-dev] help with review

2023-02-01 Thread Pei, Yulong
Hi Benoit, please help review the patches below again; all your comments have
been addressed. Thanks a lot.

https://gerrit.fd.io/r/c/vpp/+/38008
https://gerrit.fd.io/r/c/vpp/+/38009

From: vpp-dev@lists.fd.io  On Behalf Of Pei, Yulong
Sent: Thursday, January 12, 2023 8:42 PM
To: vpp-dev ; Benoit Ganne (bganne) 
Subject: [vpp-dev] help review for patch about update af_xdp plugin to depend 
on libxdp

Hi Benoit and vpp-dev, could you help review the patch that updates the af_xdp
plugin to depend on libxdp? https://gerrit.fd.io/r/c/vpp/+/37869
