Re: [vpp-dev] Linux-cp Plugin Bird Routes Not Showing Up in VPP

2023-01-08 Thread Pim van Pelt via lists.fd.io
+Matthew Smith and +Jon Loeliger, can you let me know what you think?
Does Netgate value 'lcp default netns', or the ability to create LIPs in a
namespace other than 'dataplane'?
As I described, I think the ability to change these via the API, or to set
them in 'lcp create', yields erratic behavior.

groet,
Pim

On Thu, Dec 22, 2022 at 4:28 PM Pim van Pelt  wrote:

> Hoi,
>
>
> On Thu, Dec 22, 2022 at 4:08 PM Matthew Smith via lists.fd.io wrote:
>
>>
>> On Thu, Dec 22, 2022 at 7:09 AM Petr Boltík 
>> wrote:
>>
>>>
>>> - To make "plugin linux_nl_plugin.so" work, you need to run VPP
>>> inside the netns dataplane (the same one bird runs in). This can be done
>>> by editing the VPP systemd unit file (add something like
>>> "NetworkNamespacePath=/var/run/netns/dataplane") and ensuring that the
>>> netns "dataplane" is started first.
>>>
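>>> For illustration, a minimal systemd drop-in along those lines (this is a
>>> sketch; it assumes the packaged vpp.service unit and a hypothetical
>>> netns-dataplane.service that creates the namespace; adjust names to your
>>> setup):
>>>
>>>   # /etc/systemd/system/vpp.service.d/netns.conf
>>>   [Unit]
>>>   Requires=netns-dataplane.service
>>>   After=netns-dataplane.service
>>>
>>>   [Service]
>>>   NetworkNamespacePath=/var/run/netns/dataplane
>>>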
>>
>> I run VPP in the default netns and use FRR & iproute2 in the dataplane
>> netns, and it works fine. I test this regularly on AWS, Azure, KVM, and
>> bare metal. I don't set the netns with vppctl CLI commands, though; I set
>> it in startup.conf with 'linux-cp { default netns dataplane }'. I will
>> look into whether something is broken with the CLI command.
>>
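>> For reference, a minimal startup.conf fragment along those lines (a
>> sketch; plugin names as shipped by the default packaging):
>>
>>   plugins {
>>     plugin linux_cp_plugin.so { enable }
>>     plugin linux_nl_plugin.so { enable }
>>   }
>>   linux-cp {
>>     default netns dataplane
>>   }
>>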
> I run VPP in default netns and Bird2 & iproute2 in the dataplane netns. I
> set default netns dataplane in startup.conf also.
>
> I think OP has a different problem, because I do see their netlink
> messages arriving otherwise. One test you can do: in the network
> namespace, change a link attribute (like 'ip link set <iface> mtu 1500',
> or 'ip link set <iface> up' / 'down') and then see if 'vppctl show int'
> reflects that change. That would demonstrate that, end to end, netlink
> messages are arriving from the dataplane netns, through the kernel,
> through the linux_nl plugin and finally into the dataplane.
>
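> For example (a sketch, assuming a LIP named e0 in the dataplane netns,
> paired with VPP interface GigabitEthernet3/0/0):
>
>   ip netns exec dataplane ip link set e0 mtu 1500
>   ip netns exec dataplane ip link set e0 down
>   vppctl show interface GigabitEthernet3/0/0
>
> If the MTU and admin state shown by VPP follow the changes made in the
> namespace, netlink messages are making it all the way through.
>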
> One plausible explanation for the behavior is that linux_nl starts the
> netlink listener immediately, based on startup.conf, and does not change
> its mind when you specify 'lcp default netns' on the CLI. In other words,
> it will only ever listen to one namespace, namely the one it found when it
> started up. I think this is OP's issue, and if so, then 'lcp default
> netns' is broken in linux-cp: it can never work, and should actually be
> considered harmful, because only one netns will ever be listened to, so
> changing it mid-flight will give erratic results. The same is true for the
> 'netns' argument to 'lcp create' -- the only place where linux_nl will
> ever pick up netlink messages is the very first namespace it started in,
> as specified in startup.conf.
>
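> To make that concrete, a sequence like the following (hypothetical
> interface names, and command syntax from memory, so treat it as a sketch)
> is exactly the problematic case: the listener keeps running in the
> startup.conf namespace, so netlink messages for the LIP created in 'other'
> are never consumed:
>
>   vppctl lcp default netns other
>   vppctl lcp create GigabitEthernet3/0/0 host-if e0 netns other
>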
> As a point of comparison, lcpng starts its netlink listener *once the
> first LIP is created*, which is why that listener ends up either in the
> netns set in startup.conf, or in whatever it was changed to with 'lcp
> default netns' before the very first interface pair is created. 'lcp
> default netns' works there, but as with linux_nl, if any LIP is created
> in a netns other than the initial one holding the (one and only) netlink
> listener, it will give unexpected results.
>
> I think we should:
> - remove 'lcp default netns' from the CLI
> - remove the ability to change the netns via the API
> - force use of startup.conf alone to start the netlink listener in that,
> and only that, network namespace.
>
> Thoughts ?
>
> --
> Pim van Pelt 
> PBVP1-RIPE - http://www.ipng.nl/
>


-- 
Pim van Pelt 
PBVP1-RIPE - http://www.ipng.nl/




Re: [vpp-dev] Possible VPP deadlock

2023-01-08 Thread Dave Barach
It looks like the root cause is a corrupted heap. See also
mspace_free()->check_top_chunk()->do_check_top_chunk(). One of those
assertions is failing.

 

Once the heap is pickled, all bets are off in terms of getting a useful API 
trace. 

 

Since you have a couple of (possibly) useful post-mortem dumps, the first step 
would be to figure out why printing the api traces causes a NULL pointer 
dereference.

 

It should be comparatively simple to work out the NULL pointer dereference in 
the api trace printer. You can binary search for the offending message, walk 
through vl_msg_print_trace(...) and figure out what’s going on. It’s likely 
that the message in question simply doesn’t have a print function. See 
src/vlibmemory/vlib_api_cli.c:api_trace_command_fn() for details. In 
particular, “first” and “last” describe which messages should be printed (or 
replayed).
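
For example, from gdb on the crash (a sketch; since the faulting frame is a
call through a NULL function pointer, frame 1 is the trace printer):

  (gdb) frame 1
  (gdb) p msg
  (gdb) p /x *(unsigned short *) msg

The last print shows the API message id (in network byte order, so swap the
bytes). With that id you can check whether a print handler is actually
registered for that message type (the handler tables live in api_main_t;
exact field names vary between releases).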

 

HTH... Dave

 

From: vpp-dev@lists.fd.io  On Behalf Of Pim van Pelt via 
lists.fd.io
Sent: Sunday, January 8, 2023 1:24 PM
To: vpp-dev 
Subject: [vpp-dev] Possible VPP deadlock

 

[vpp-dev] Possible VPP deadlock

2023-01-08 Thread Pim van Pelt via lists.fd.io
Hoi,

I've had a few instances of a recent VPP hanging: the API and CLI go
unresponsive and forwarding stops (at least, I think so), but the worker
threads are still consuming CPU.
Attaching GDB, I see the main thread doing the following:

(gdb) bt

#0  0x7f5f6f8f271b in sched_yield () at
../sysdeps/unix/syscall-template.S:78

#1  0x7f5f6fb3df8b in spin_acquire_lock (sl=<optimized out>) at
/home/pim/src/vpp/src/vppinfra/dlmalloc.c:468

#2  mspace_malloc (msp=0x130048040, bytes=72) at
/home/pim/src/vpp/src/vppinfra/dlmalloc.c:4351

#3  0x7f5f6fb66f81 in mspace_memalign (msp=0x130048040,
alignment=<optimized out>, bytes=72) at
/home/pim/src/vpp/src/vppinfra/dlmalloc.c:4667

#4  clib_mem_heap_alloc_inline (heap=<optimized out>, size=72,
align=<optimized out>, os_out_of_memory_on_failure=1) at
/home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:608

#5  clib_mem_heap_alloc_aligned (heap=<optimized out>, size=72, align=8) at
/home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:664

#6  0x7f5f6fba5157 in _vec_alloc_internal (n_elts=64, attr=<optimized out>)
at /home/pim/src/vpp/src/vppinfra/vec.c:35

#7  0x7f5f6fb848c8 in _vec_resize (vp=<optimized out>, n_add=64,
hdr_sz=0, align=8, elt_sz=<optimized out>) at
/home/pim/src/vpp/src/vppinfra/vec.h:256

#8  serialize_vector_write (m=<optimized out>, s=0x7f5f0dbfebc0) at
/home/pim/src/vpp/src/vppinfra/serialize.c:908

#9  0x7f5f6fb843c1 in serialize_write_not_inline (m=0x7f5f0dbfeb60,
s=<optimized out>, n_bytes_to_write=4, flags=<optimized out>) at
/home/pim/src/vpp/src/vppinfra/serialize.c:734

#10 0x7f5f6fe5a053 in serialize_stream_read_write
(header=0x7f5f0dbfeb60, s=<optimized out>, n_bytes=4, flags=2) at
/home/pim/src/vpp/src/vppinfra/serialize.h:140

#11 serialize_get (m=0x7f5f0dbfeb60, n_bytes=4) at
/home/pim/src/vpp/src/vppinfra/serialize.h:180

#12 serialize_integer (m=0x7f5f0dbfeb60, x=<optimized out>, n_bytes=4) at
/home/pim/src/vpp/src/vppinfra/serialize.h:187

#13 vl_api_serialize_message_table (am=0x7f5f6fe66258,
vector=<optimized out>) at /home/pim/src/vpp/src/vlibapi/api_shared.c:210

#14 0x7f5f6fe5a715 in vl_msg_api_trace_save (am=0x130048040,
which=<optimized out>, fp=0x13f0690, is_json=27 '\033') at
/home/pim/src/vpp/src/vlibapi/api_shared.c:410

#15 0x7f5f6fe5c0ea in vl_msg_api_post_mortem_dump () at
/home/pim/src/vpp/src/vlibapi/api_shared.c:880

#16 0x004068c6 in os_panic () at
/home/pim/src/vpp/src/vpp/vnet/main.c:415

#17 0x7f5f6fb3feed in mspace_free (msp=0x130048040, mem=<optimized out>)
at /home/pim/src/vpp/src/vppinfra/dlmalloc.c:2954

#18 0x7f5f6fb6bf8c in clib_mem_heap_free (heap=0x0, p=<optimized out>)
at /home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:768

#19 clib_mem_free (p=<optimized out>) at
/home/pim/src/vpp/src/vppinfra/mem_dlmalloc.c:774

#20 0x7f5f2fa32b40 in ?? ()

#21 0x7f5f3302f848 in ?? ()

#22 0x0000000000000000 in ?? ()


When I kill VPP, sometimes an api_post_mortem is emitted (although most of
the time they are empty), but subsequently trying to dump it, makes VPP
crash -

-rw--- 1 ipng ipng 35437 Jan  8 19:08 api_post_mortem.76724

-rw--- 1 ipng ipng 35368 Jan  8 19:08 api_post_mortem.76842

-rw--- 1 ipng ipng 0 Jan  8 19:08 api_post_mortem.76978

-rw--- 1 ipng ipng 0 Jan  8 19:08 api_post_mortem.84008


#0  0x0000000000000000 in ?? ()

#1  0x77fada5f in vl_msg_print_trace (msg=0x7fff9db73bd8 "",
ctx=0x7fff53b62ca0) at /home/pim/src/vpp/src/vlibmemory/vlib_api_cli.c:693

#2  0x766a55bb in vl_msg_traverse_trace (tp=0x7fff9b4e7998,
fn=0x77fad790 <vl_msg_print_trace>, ctx=0x7fff53b62ca0)
at /home/pim/src/vpp/src/vlibapi/api_shared.c:321

#3  0x77fab854 in api_trace_command_fn (vm=0x7fff96000700,
input=0x7fff53b62f30, cmd=<optimized out>)
at /home/pim/src/vpp/src/vlibmemory/vlib_api_cli.c:727

#4  0x7647fdad in vlib_cli_dispatch_sub_commands (vm=0x7fff96000700,
cm=<optimized out>, input=0x7fff53b62f30,
parent_command_index=<optimized out>) at /home/pim/src/vpp/src/vlib/cli.c:650

#5  0x7647fb91 in vlib_cli_dispatch_sub_commands (vm=0x7fff96000700,
cm=<optimized out>, input=0x7fff53b62f30,
parent_command_index=<optimized out>) at /home/pim/src/vpp/src/vlib/cli.c:607

#6  0x7647f0cd in vlib_cli_input (vm=0x7fff96000700, input=0x7fff53b62f30,
function=<optimized out>, function_arg=<optimized out>)
at /home/pim/src/vpp/src/vlib/cli.c:753

#7  0x764fd5c7 in unix_cli_process_input (cm=<optimized out>,
cli_file_index=0) at /home/pim/src/vpp/src/vlib/unix/cli.c:2616

#8  unix_cli_process (vm=<optimized out>, rt=0x7fff9b69bdc0,
f=<optimized out>) at /home/pim/src/vpp/src/vlib/unix/cli.c:2745

#9  0x764a7837 in vlib_process_bootstrap (_a=<optimized out>) at
/home/pim/src/vpp/src/vlib/main.c:1221

#10 0x763f9d94 in clib_calljmp () at
/home/pim/src/vpp/src/vppinfra/longjmp.S:123

#11 0x7fff94700b00 in ?? ()

#12 0x7649f3d0 in vlib_process_startup (vm=0x7fff96000700,
p=0x7fff9b69bdc0, f=0x0) at /home/pim/src/vpp/src/vlib/main.c:1246

#13 dispatch_process (vm=0x7fff96000700, p=0x7fff9b69bdc0, f=0x0,
last_time_stamp=<optimized out>) at /home/pim/src/vpp/src/vlib/main.c:1302

#14 0x0000000000000000 in ?? ()

Has anybody else seen API calls seemingly hang the VPP instance? Is there an
alternative way to pry loose the information in the api_post_mortem.* files?
Or any other clues on where to narrow down the issue?
It's a rare