Re: [vpp-dev] VPP hanging and running out of memory due to infinite loop related to nat44-hairpinning

2020-12-04 Thread Elias Rudberg
Hi Klement,

> Would you mind pushing it to gerrit?

Here: https://gerrit.fd.io/r/c/vpp/+/30284

>  It would be super cool if the change also contained a test case ;-)

Coolness is always my goal. Have a look, see if the patch qualifies.
:-)

/ Elias





Re: [vpp-dev]: Worker Thread Deadlock Detected from vl_api_clnt_node

2020-12-04 Thread Rajith PR via lists.fd.io
Thanks Dave. Yes, I do see some changes made in the vlibmemory infra in the latest
master. Earlier I had a hard time backporting some fixes
for NTP and fragmentation. I will plan to rebase to a stable version.

On Thu, 3 Dec 2020 at 6:38 PM,  wrote:

> Looks like a corrupt binary API segment heap to me. Signal 7 in
> mspace_malloc(...) is the root cause. The thread hangs due to recursion on
> the mspace lock trying to print / syslog from the signal handler.
>
>
>
> It is abnormal to allocate memory in vl_msg_api_alloc[_as_if_client] in
> the first place.
>
>
>
> As has been communicated multiple times, 19.08 is no longer supported.
>
>
>
> HTH... Dave
>
>
>
> #13 0x7ffa9f72adf5 in unix_signal_handler (signum=7,
> si=0x7ffa6f6e50f0, uc=0x7ffa6f6e4fc0)
> at /development/libvpp/src/vlib/unix/main.c:127
> #14 <signal handler called>
> #15 0x7ffa9f417c03 in mspace_malloc (msp=0x130046010, bytes=77) at
> /development/libvpp/src/vppinfra/dlmalloc.c:4437
> #16 0x7ffa9f416f6f in mspace_get_aligned (msp=0x130046010,
> n_user_data_bytes=77, align=1, align_offset=0)
> at /development/libvpp/src/vppinfra/dlmalloc.c:4186
> #17 0x7ffaa0c7d04f in clib_mem_alloc_aligned_at_offset (size=73,
> align=1, align_offset=0, os_out_of_memory_on_failure=1)
> at /development/libvpp/src/vppinfra/mem.h:139
> #18 0x7ffaa0c7d0a2 in clib_mem_alloc (size=73) at
> /development/libvpp/src/vppinfra/mem.h:155
> #19 0x7ffaa0c7da0a in vl_msg_api_alloc_internal (nbytes=73, pool=0,
> may_return_null=0)
> at /development/libvpp/src/vlibmemory/memory_shared.c:177
> #20 0x7ffaa0c7db6f in vl_msg_api_alloc_as_if_client (nbytes=57) at
> /development/libvpp/src/vlibmemory/memory_shared.c:236
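
A minimal standalone sketch of the failure mode Dave describes above, assuming I read
the frames correctly (this is not VPP code; heap_lock, locked_alloc and crash_handler
are names invented for the illustration). A signal handler tries to take a
non-recursive allocator lock that the interrupted thread already holds, so the second
lock attempt can never succeed and the program intentionally hangs:

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the dlmalloc mspace lock. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for an allocator that serializes on heap_lock. */
static void *locked_alloc (size_t n)
{
  pthread_mutex_lock (&heap_lock);   /* re-entry from the handler blocks here forever */
  void *p = malloc (n);
  pthread_mutex_unlock (&heap_lock);
  return p;
}

/* "Printing / syslogging" from a signal handler typically allocates again. */
static void crash_handler (int sig)
{
  fprintf (stderr, "caught signal %d\n", sig);
  locked_alloc (64);                 /* deadlock: heap_lock is already held by this thread */
}

int main (void)
{
  signal (SIGBUS, crash_handler);    /* signal 7 on Linux/x86_64 is SIGBUS */

  pthread_mutex_lock (&heap_lock);   /* we are "inside" the allocator ... */
  raise (SIGBUS);                    /* ... when the fault arrives; the process hangs here */
  pthread_mutex_unlock (&heap_lock); /* never reached */
  return 0;
}
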
>
>
>
> From: vpp-dev@lists.fd.io On Behalf Of Rajith PR via lists.fd.io
> Sent: Thursday, December 3, 2020 5:55 AM
> To: vpp-dev
> Subject: [vpp-dev]: Worker Thread Deadlock Detected from vl_api_clnt_node
>
>
>
> Hi All,
>
>
>
> We have hit a VPP worker thread deadlock issue. From the call stacks,
> it looks like the main thread is waiting for the workers to come back to their
> main loop (i.e., it has taken the barrier lock), and one of the two workers is
> spinning on a lock while trying to make an RPC to the main thread.
>
> I believe this lock is held by the main thread.
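
A generic two-thread sketch of the circular wait described above (plain C with
pthreads, not VPP code; rpc_lock and worker_checked_in are invented names): the main
thread holds a lock and waits for the worker to check in, while the worker blocks on
that same lock before it can check in, so the program below intentionally hangs:

#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Stand-in for the lock the worker needs before it can post its RPC. */
static pthread_mutex_t rpc_lock = PTHREAD_MUTEX_INITIALIZER;
/* Stand-in for the "worker has reached the barrier" count. */
static atomic_int worker_checked_in;

static void *worker_fn (void *arg)
{
  (void) arg;
  pthread_mutex_lock (&rpc_lock);       /* worker waits for the lock ...       */
  atomic_store (&worker_checked_in, 1); /* ... so it never reaches the barrier */
  pthread_mutex_unlock (&rpc_lock);
  return NULL;
}

int main (void)
{
  pthread_t worker;

  pthread_mutex_lock (&rpc_lock);       /* main thread holds the lock ...      */
  pthread_create (&worker, NULL, worker_fn, NULL);

  while (!atomic_load (&worker_checked_in)) /* ... while waiting for the worker     */
    sched_yield ();                         /* circular wait: neither side proceeds */

  pthread_mutex_unlock (&rpc_lock);     /* never reached */
  pthread_join (worker, NULL);
  return 0;
}
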
>
>
>
> We are using the *19.08 version* and the complete backtrace is pasted below.
> Also, can someone explain the purpose of *vl_api_clnt_node*?
>
> /* *INDENT-OFF* */
> VLIB_REGISTER_NODE (vl_api_clnt_node) =
> {
>   .function = vl_api_clnt_process,
>   .type = VLIB_NODE_TYPE_PROCESS,
>   .name = "api-rx-from-ring",
>   .state = VLIB_NODE_STATE_DISABLED,
> };
> /* *INDENT-ON* */
>
> *Complete Backtrace:*
>
> Thread 3 (Thread 0x7ffa511c9700 (LWP 448)):
> #0  0x7ffa9f6bc276 in vlib_worker_thread_barrier_check () at 
> /development/libvpp/src/vlib/threads.h:430
> #1  0x7ffa9f6c3f19 in vlib_main_or_worker_loop (vm=0x7ffa8797adc0, 
> is_main=0) at /development/libvpp/src/vlib/main.c:1757
> #2  0x7ffa9f6c4fbd in vlib_worker_loop (vm=0x7ffa8797adc0) at 
> /development/libvpp/src/vlib/main.c:1988
> #3  0x7ffa9f703ff1 in vlib_worker_thread_fn (arg=0x7ffa6ccc8640) at 
> /development/libvpp/src/vlib/threads.c:1803
> #4  0x7ffa9f383560 in clib_calljmp () from 
> /usr/local/lib/libvppinfra.so.1.0.1
> #5  0x7ffa511c8ec0 in ?? ()
> #6  0x7ffa9f6fe588 in vlib_worker_thread_bootstrap_fn 
> (arg=0x7ffa6ccc8640) at /development/libvpp/src/vlib/threads.c:573
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>
> Thread 2 (Thread 0x7ffa519ca700 (LWP 447)):
> #0  0x7ffaaae87ef7 in sched_yield () at 
> ../sysdeps/unix/syscall-template.S:78
> #1  0x7ffa9f40fb49 in spin_acquire_lock (sl=0x130046384) at 
> /development/libvpp/src/vppinfra/d

[vpp-dev] Packet corruption over memif ( with libmemif )

2020-12-04 Thread Satya Murthy
Hi,

We are facing a strange issue in packet transfer over a memif channel.

We have a memif connection set up with VPP as the master and the App as the client.

When the App sends 64 messages continuously to VPP, the response from VPP is
getting corrupted. We are coloring the message bytes on both sides, and we can
see that some stomping of memory is happening at the lower layers.

One main point here: each packet transferred over memif is 64K in size. We have
set the memif interface MTU to 65535 to accommodate this.

When we reduce the message size, the crash does NOT happen within 64 messages.
So, basically, we suspect that some ring memory is getting congested and that
this is what causes the issue.

FYI, the following are the memif connection details:

interface memif0/0
remote-name "App"
remote-interface "app-conn1"
socket-id 0 id 0 mode ip
flags admin-up connected
listener-fd 46 conn-fd 45
num-s2m-rings 1 num-m2s-rings 1 buffer-size 0 num-regions 2
region 0 size 524544 fd 47
region 1 size 268435456 fd 48
master-to-slave ring 0:
region 0 offset 262272 ring-size 16384 int-fd 50
head 16486 tail 102 flags 0x interrupts 100
slave-to-master ring 0:
region 0 offset 0 ring-size 16384 int-fd 49
head 204 tail 204 flags 0x0001 interrupts 0
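
A quick arithmetic check on the counters above, assuming head and tail are
free-running producer/consumer counters (that is how we read memif's ring
bookkeeping, please double-check):

master-to-slave ring: head - tail = 16486 - 102 = 16384 = ring-size (every slot outstanding)
slave-to-master ring: head - tail = 204 - 204 = 0 (ring idle)

If that reading is right, the master-to-slave ring is completely full at the time
of this capture, which is consistent with our suspicion that the ring is getting
congested.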

Any hints on what could be happening here and how to debug this problem?
We would really appreciate any help, as we have been stuck on this problem for
the last two weeks and have not been able to work out what the issue is.

--
Thanks & Regards,
Murthy
