Re: [vpp-dev] Facing issue in inter bridge routing .

2022-08-11 Thread rajith pr
Hi Pragya Nand,

The reachability from 1.1.1.1 to 2.2.2.3 on the host is failing because the ARP
request 1.1.1.1 -> 2.2.2.3 gets dropped in VPP. Normally both the target and
source IPs of an ARP request are expected to be in the same subnet as the Rx
interface. You can try enabling proxy ARP on the Rx interface for the
destination IP 2.2.2.3 and check whether that helps.
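
For reference, the proxy ARP configuration has two parts: an address range in
the FIB table and a per-interface enable. A rough sketch is below (the exact
syntax varies by release, e.g. older builds use "set ip arp proxy <lo-addr> -
<hi-addr>", so please check the CLI help on your build; loop0 is the BVI shown
as rx: in your trace, and table 4 is the table-id from your "show interface
address" output):

set arp proxy table-id 4 start 2.2.2.3 end 2.2.2.3
set interface proxy-arp loop0 enable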

Thanks,
Rajith

On Thu, Aug 11, 2022 at 9:53 AM Pragya Nand Bhagat <
pragya.nand.bhaga...@gmail.com> wrote:

> Hi All,
>
> Inside VPP I have two bridge domains, and each of them has a loopback
> interface as its BVI.
> Each bridge domain also has a host-interface.
>
> Following is the set of commands used.
>
> Linux CLIs to configure veth:
>
> ip link add ve_A type veth peer name ve_B
> ip link add ve_C type veth peer name ve_D
> ifconfig ve_A 1.1.1.1/24
> ifconfig ve_C 2.2.2.2/24
>
> CLIs on VPP
>
> create host-interface name ve_B
> create host-interface name ve_D
> set interface state host-ve_B up
> set interface state host-ve_D up
>
> create loopback interface
> create loopback interface
> set interface ip address loop0 1.1.1.2/24
> set interface ip address loop1 2.2.2.3/24
> set interface state loop0 up
> set interface state loop1 up
>
> create bridge-domain 100
> create bridge-domain 200
>
> set interface l2 bridge host-ve_B 100
> set interface l2 bridge host-ve_D 200
> set interface l2 bridge loop0 100 bvi
> set interface l2 bridge loop1 200 bvi
>
> [image: image.png]
>
>
> ping -I ve_A 1.1.1.2 from the host is working.
> ping -I ve_C 2.2.2.3 from the host is working.
>
> but
>
>
> *ping -I ve_A 2.2.2.3 fails.*
> vpp# show mode
> l3 local0
> l2 bridge host-ve_B bd_id 100 shg 0
> l2 bridge host-ve_D bd_id 200 shg 0
> l2 bridge loop0 bd_id 100 bvi shg 0
> l2 bridge loop1 bd_id 200 bvi shg 0
>
> vpp# show interface address
> host-ve_B (up):
>   L2 bridge bd-id 100 idx 1 shg 0
> host-ve_D (up):
>   L2 bridge bd-id 200 idx 2 shg 0
> local0 (dn):
> loop0 (up):
>   L2 bridge bd-id 100 idx 1 shg 0 bvi
>   L3 1.1.1.2/24 ip4 table-id 4 fib-idx 1
> loop1 (up):
>   L2 bridge bd-id 200 idx 2 shg 0 bvi
>   L3 2.2.2.3/24 ip4 table-id 4 fib-idx 1
> And following is the output of show errors:
>
> vpp# show error
>    Count                Node                         Reason                          Severity
>        3           arp-reply        RX interface is unnumbered to different subnet    error
>        3     af-packet-input        timed out block                                    error
>        3     af-packet-input        total received block                               error
>        3            l2-learn        L2 learn packets                                   error
>        3            l2-input        L2 input packets                                   error
>        3            l2-flood        L2 flood packets                                   error
>
> Following is the packet trace.
>
> vpp# show trace
> --- Start of thread 0 vpp_main ---
> Packet 1
>
> 17:47:50:154354: af-packet-input
>   af_packet: hw_if_index 1 rx-queue 0 next-index 4
> block 132:
>   address 0x7fb423c4 version 2 seq_num 133 pkt_num 0
> tpacket3_hdr:
>   status 0x2001 len 42 snaplen 42 mac 92 net 106
>   sec 0x62f4832a nsec 0x31ace5eb vlan 0 vlan_tpid 0
> vnet-hdr:
>   flags 0x00 gso_type 0x00 hdr_len 0
>   gso_size 0 csum_start 0 csum_offset 0
> 17:47:50:154368: ethernet-input
>   ARP: 8a:f5:4b:3c:07:19 -> ff:ff:ff:ff:ff:ff
> 17:47:50:154376: l2-input
>   l2-input: sw_if_index 1 dst ff:ff:ff:ff:ff:ff src 8a:f5:4b:3c:07:19
> [l2-learn l2-flood ]
> 17:47:50:154386: l2-learn
>   l2-learn: sw_if_index 1 dst ff:ff:ff:ff:ff:ff src 8a:f5:4b:3c:07:19
> bd_index 1
> 17:47:50:154390: l2-flood
>   l2-flood: sw_if_index 1 dst ff:ff:ff:ff:ff:ff src 8a:f5:4b:3c:07:19
> bd_index 1
> 17:47:50:154395: arp-input
>   request, type ethernet/IP4, address size 6/4
>   8a:f5:4b:3c:07:19/1.1.1.1 -> 00:00:00:00:00:00/2.2.2.3
> 17:47:50:154397: arp-reply
>   request, type ethernet/IP4, address size 6/4
>   8a:f5:4b:3c:07:19/1.1.1.1 -> 00:00:00:00:00:00/2.2.2.3
> 17:47:50:154409: error-drop
>   rx:loop0
> 17:47:50:154411: drop
>   arp-reply: RX interface is unnumbered to different subnet
>
> Packet 2
>
> 17:47:51:174342: af-packet-input
>   af_packet: hw_if_index 1 rx-queue 0 next-index 4
> block 133:
>   address 0x7fb423c5 version 2 seq_num 134 pkt_num 0
> tpacket3_hdr:
>   status 0x2001 len 42 snaplen 42 mac 92 net 106
>   sec 0x62f4832b nsec 0x32721b32 vlan 0 vlan_tpid 0
> vnet-hdr:
>   flags 0x00 gso_type 0x00 hdr_len 0
>   gso_size 0 csum_start 0 csum_offset 0
> 17:47:51:174362: ethernet-input
>   ARP: 8a:f5:4b:3c:07:19 -> ff:ff:ff:ff:ff:ff
> 17:47:51:174370: l2-input
>   l2-input: sw_if_index 1 dst ff:ff:ff:ff:ff:ff src 8a:f5:4b:3c:07:19
> [l2-learn l2-flood ]
> 17:47:51:174374: l2-learn
>   l2-learn: sw_if_index 1 dst ff:ff:ff:ff:ff:ff src 8a:f5:4b:3c:07:19
> bd_index 1
> 17:47:51:174377: l2-flood
>   l2-flood: sw_if_index 1 dst ff:ff:ff:ff:ff:ff src 

Re: [vpp-dev] vpp hangs with bfd configuration along with mpls (inner and outer ctxt)

2022-03-18 Thread Rajith PR via lists.fd.io
Hi Sastry,

For a VPNv4 session, a labelled IPvx route (for ingress) and an MPLS route (for
egress) need to be set up. Can you check whether they are getting
programmed correctly into VPP? The labelled route seems to be OK from the
output you have pasted.

Thanks,
Rajith

On Fri, Mar 18, 2022 at 10:58 AM Sastry Sista 
wrote:

> Hi Rajith,
> Thank you for the clues, but I am not sure how a route
> can resolve this issue.
>
> I do not have any data traffic when we see this issue. Just a BGP session is
> up with vpnv4/vpnv6 and BFD configured.
>
> Due to BFD up/down events, adj_bfd_notify is called, which goes through the
> VRF table, i.e. the inner context, i.e. the FTN table. It always keeps going
> through the outer and inner loops and never comes out.
>
> Could you please explain what an MPLS PHP route is? Is it for the FTN or the
> MPLS table?
>
> With regards
> Sastry
> 
>
>




Re: [vpp-dev] vpp hangs with bfd configuration along with mpls (inner and outer ctxt)

2022-03-17 Thread Rajith PR via lists.fd.io
Hi Sastry,

In our case we resolved the loop issue with BFD by installing the MPLS PHP
route with EOS; earlier we had installed the MPLS PHP route without EOS.
Our case was MPLS PHP with IPv4 forwarding. However, from the config you
shared, your case seems to be MPLS ingress?
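
For reference, the difference is just the eos/non-eos flag on the label entry.
A rough CLI sketch (label 100 and the nexthop are placeholders, and the exact
option names should be checked against the "mpls local-label" help on your
release):

mpls local-label add eos 100 via 10.0.0.2 GigabitEthernet0/0/0
mpls local-label add non-eos 100 via 10.0.0.2 GigabitEthernet0/0/0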


Thanks,
Rajith

On Fri, 18 Mar 2022 at 7:15 AM, Sastry Sista  wrote:

> Hi VPP Experts,
>   Neale tried to guide us on these MPLS features,
> but I am not sure if he is around or maybe busy. We are stuck on MPLS
> along with BFD protection.
>
> The looping is very consistent, due to which all the watchdog timers are
> expiring, and we have no clue whether this is fixed in releases later than
> 21.06.
>
> Any help on this is really appreciated. We have reached the last stage of
> feature completion using MPLS, but BFD is holding us back very badly.
>
> With Regards
> Sastry
> 
>
>




[vpp-dev]: Question on VPP's hash table usage

2022-03-01 Thread Rajith PR via lists.fd.io
Hi All,

We are observing a random crash in code that we have added in VPP. The
stack trace indicates an invalid memory access in _*hash_get*(). From the
hash table code we see that the hash table can automatically grow and shrink
based on its utilization.
So the question is whether we need to take the worker barrier lock before
calling *hash_set_mem*(). We have two workers in our product. The hash table
is created with size 0, a 32-byte key and a 64-bit value:
hash_create_mem(0, 32, sizeof(uword));

VPP Version:* 21.10*
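
For context, the update pattern we are considering is below. This is only a
minimal sketch, assuming the writer is the main thread and the readers are the
workers; my_table, key and value are placeholder names rather than our real
code:

#include <vlib/vlib.h>
#include <vppinfra/hash.h>

static uword *my_table;  /* created with hash_create_mem (0, 32, sizeof (uword)) */

static void
my_table_update (vlib_main_t * vm, void * key, uword value)
{
  /* Park the workers so no reader can observe the table while
   * hash_set_mem() possibly triggers an auto-resize. */
  vlib_worker_thread_barrier_sync (vm);
  hash_set_mem (my_table, key, value);
  vlib_worker_thread_barrier_release (vm);
}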

Thread 2 (Thread 0x7f2db2c2e700 (LWP 395)):
#0  0x7f2dffd64492 in __GI___waitpid (pid=21256,
stat_loc=stat_loc@entry=0x7f2db342f918, options=options@entry=0)
at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x7f2dffccf177 in do_system (line=line@entry=0x7f2db342faa0
"/usr/local/bin/rtb-dump-core.sh '212' 'fibd'")
at ../sysdeps/posix/system.c:149
#2  0x7f2dffccf55a in __libc_system
(line=line@entry=0x7f2db342faa0 "/usr/local/bin/rtb-dump-core.sh '212'
'fibd'")
at ../sysdeps/posix/system.c:185
#3  0x7f2e0029b05d in bd_signal_handler_cb (signo=11) at
/development/rtbrick-infrastructure/code/bd/src/bdinfra/bd.c:753
#4  0x7f2df10edb67 in rtb_bd_signal_handler (signo=11) at
/development/libvpp/src/vlib/unix/main.c:101
#5  0x7f2df10edf59 in unix_signal_handler (signum=11,
si=0x7f2db342ff30, uc=0x7f2db342fe00)
at /development/libvpp/src/vlib/unix/main.c:202
#6  
#7  __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:1528
#8  0x7f2df08c06ce in mem_key_equal (h=0x7f2dcaf30bd0,
key1=21474836480, key2=139834257770336)
at /development/libvpp/src/vppinfra/hash.c:940
#9  0x7f2df08c8da0 in key_equal1 (h=0x7f2dcaf30bd0,
key1=21474836480, key2=139834257770336, e=0)
at /development/libvpp/src/vppinfra/hash.c:375
#10 0x7f2df08c809a in key_equal (h=0x7f2dcaf30bd0,
key1=21474836480, key2=139834257770336) at
/development/libvpp/src/vppinfra/hash.c:389
#11 0x7f2df08c8714 in get_indirect (v=0x7f2dcaf30c18,
pi=0x7f2dcaf30fb8, key=139834257770336)
at /development/libvpp/src/vppinfra/hash.c:407
#12 0x7f2df08bebf5 in lookup (v=0x7f2dcaf30c18,
key=139834257770336, op=GET, new_value=0x0, old_value=0x0)
at /development/libvpp/src/vppinfra/hash.c:598
#13 0x7f2df08be991 in _hash_get (v=0x7f2dcaf30c18,
key=139834257770336) at /development/libvpp/src/vppinfra/hash.c:641


Thanks,
Rajith




[vpp-dev]: Integrating VPP NAT for MPLS VPN

2022-02-20 Thread Rajith PR via lists.fd.io
Hi All,

We are exploring VPP's *NAT plugin* for a PE router in an MPLS VPN
deployment. A reference diagram is given below.

[image: NAT-PE.png]

Private IP addresses are assigned to the hosts by the PE routers (NAT-PE and
PE-2). All the hosts in a VPN (Shop or Bank) are assigned unique IP
addresses by the local PE router. The routes are distributed across the
edge routers through a routing protocol.
Thus both local routing and remote routing are enabled using private IP
addresses. Local routing uses L2 rewrites, and remote routing uses MPLS + L2
rewrites (on the ingress PE router) plus MPLS termination and an L3 lookup in
the right VRF (on the egress PE router).

NAT comes into the picture when hosts want to access an internet gateway that
is not part of the VPN. In this case, if the packet hits a default
route (internet route), NAT needs to translate the private IP to a public IP.
Other packets need to bypass NAT.
It is also possible for the hosts to access a shared server (192.1.1.4) that
is not part of the VPN. In that case NAT needs to happen only if the packet
is destined to the shared server, and be bypassed otherwise.

From our study of *nat44ed* it looks like there is no way to apply a policy
to bypass/permit NAT based on the destination. So if NAT is applied on an
inside interface, all traffic gets NATed. Please let me know if this
understanding is correct.
Is there any way to solve this currently in VPP?

Thanks,
Rajith




[vpp-dev]: Patches for Code Review and Merge

2022-02-12 Thread Rajith PR via lists.fd.io
Hi VPP Reviewers,

We have been rebasing our downstream VPP version onto the upstream version
since 2019 quite successfully.
But, for various reasons, we have not been able to upstream the few fixes
that we have made in our downstream version.

We are submitting the following patches for review. Do let us know if any
additional information is needed.
Next week a few more patches are planned for submission.

https://gerrit.fd.io/r/c/vpp/+/35289
https://gerrit.fd.io/r/c/vpp/+/35290
https://gerrit.fd.io/r/c/vpp/+/35291
https://gerrit.fd.io/r/c/vpp/+/35227

Thanks,
Rajith




Re: [vpp-dev]: Unable to run make test

2022-02-04 Thread Rajith PR via lists.fd.io
Hi Daw/Ole/Klement,

We have root-caused the issue. The downstream VPP code we use is slightly
modified to integrate with our stack.
Basically, we had commented out building the vpp executable from the CMake
file in order to provide our custom main.c.
After reverting the CMake file change, the vpp executable is getting built.

Thanks,
Rajith


On Fri, Feb 4, 2022 at 10:10 PM Dave Wallace  wrote:

> Rajith,
>
> What OS are you building on and what VPP branch are you trying to build?
>
> Ubuntu-20.04/master:HEAD works for me.
>
> Thanks,
> -daw-
>
> On 2/4/22 10:29 AM, Rajith PR via lists.fd.io wrote:
>
> Hi Ole/Klement,
>
> I changed the command to make test-debug TEST=ip4. But still the same
> issue. Under install-vpp_debug-native/vpp/bin *vpp* is not getting built
> for some reason. I can see all the shared libs getting built though.
>
> supervisor@rajith_upstream>srv3.nbg1.rtbrick.net:~/libvpp $ make
> test-debug TEST=ip4 V=1
> make -C /home/supervisor/libvpp/build-root PLATFORM=vpp TAG=vpp_debug
> vpp-install
> make[1]: Entering directory '/home/supervisor/libvpp/build-root'
>  Arch for platform 'vpp' is native 
>  Finding source for external 
>  Makefile fragment found in
> /home/supervisor/libvpp/build-data/packages/external.mk 
>  Source found in /home/supervisor/libvpp/build 
>  Arch for platform 'vpp' is native 
>  Finding source for vpp 
>  Makefile fragment found in
> /home/supervisor/libvpp/build-data/packages/vpp.mk 
>  Source found in /home/supervisor/libvpp/src 
> find: ‘/home/supervisor/libvpp/build-root/config.site’: No such file or
> directory
>  Configuring external: nothing to do 
>  Building external: nothing to do 
>  Installing external: nothing to do 
> find: ‘/home/supervisor/libvpp/build-root/config.site’: No such file or
> directory
>  Configuring vpp: nothing to do 
>  Building vpp in
> /home/supervisor/libvpp/build-root/build-vpp_debug-native/vpp 
> ninja: no work to do.
>  Installing vpp: nothing to do 
> make[1]: Leaving directory '/home/supervisor/libvpp/build-root'
> make -C test
> VPP_BUILD_DIR=/home/supervisor/libvpp/build-root/build-vpp_debug-native
> VPP_BIN=/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/bin/vpp
> VPP_PLUGIN_PATH=/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/lib/vpp_plugins:/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/lib64/vpp_plugins
> VPP_TEST_PLUGIN_PATH=/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/lib/vpp_api_test_plugins:/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/lib64/vpp_api_test_plugins
> VPP_INSTALL_PATH=/home/supervisor/libvpp/build-root/install-vpp_debug-native/
> LD_LIBRARY_PATH=/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/lib/:/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/lib64/
> EXTENDED_TESTS= PYTHON= OS_ID=ubuntu RND_SEED=1643988103.7971287
> CACHE_OUTPUT= test
> make[1]: Entering directory '/home/supervisor/libvpp/test'
> 15:21:46,418 Temporary dir is /tmp/vpp-unittest-SanityTestCase-x4melzcc,
> api socket is /tmp/vpp-unittest-SanityTestCase-x4melzcc/api.sock
> 15:21:46,418 vpp_cmdline args:
> ['/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/bin/vpp',
> 'unix', '{', 'nodaemon', '', 'full-coredump', 'coredump-size unlimited',
> 'runtime-dir', '/tmp/vpp-unittest-SanityTestCase-x4melzcc', '}',
> 'api-trace', '{', 'on', '}', 'api-segment', '{', 'prefix',
> 'vpp-unittest-SanityTestCase-x4melzcc', '}', 'cpu', '{', 'main-core', '0',
> '}', 'physmem', '{', 'max-size', '32m', '}', 'statseg', '{', 'socket-name',
> '/tmp/vpp-unittest-SanityTestCase-x4melzcc/stats.sock', '', '}', 'socksvr',
> '{', 'socket-name', '/tmp/vpp-unittest-SanityTestCase-x4melzcc/api.sock',
> '}', 'node { ', '', '}', 'api-fuzz {', 'off', '}', 'plugins', '{',
> 'plugin', 'dpdk_plugin.so', '{', 'disable', '}', 'plugin',
> 'rdma_plugin.so', '{', 'disable', '}', 'plugin', 'lisp_unittest_plugin.so',
> '{', 'enable', '}', 'plugin', 'unittest_plugin.so', '{', 'enable', '}',
> '}', 'plugin_path',
> '/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/lib/vpp_plugins:/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/lib64/vpp_plugins',
> 'test_plugin_path',
> '/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/lib/vpp_api_test_plugins:/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/lib64/vpp_api_test_plugins']
> 15:21:46,418 vpp_cmdline:
> /home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/bin/vpp
> unix { nodaemon  full-coredump coredump-size unlimited runtime-dir
> /tmp/vpp-unittest-SanityTestCase-x4melzcc } api-trace { on } api-segm

Re: [vpp-dev]: Unable to run make test

2022-02-04 Thread Rajith PR via lists.fd.io
vpp'
Traceback (most recent call last):
  File "sanity_run_vpp.py", line 34, in 
tc.setUpClass()
  File "/home/supervisor/libvpp/test/framework.py", line 688, in setUpClass
raise e
  File "/home/supervisor/libvpp/test/framework.py", line 636, in setUpClass
cls.run_vpp()
  File "/home/supervisor/libvpp/test/framework.py", line 536, in run_vpp
stderr=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory:
'/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/bin/vpp':
'/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/bin/vpp'
***
* Sanity check failed, cannot run vpp
***
Makefile:178: recipe for target 'sanity' failed
make[1]: *** [sanity] Error 1
make[1]: Leaving directory '/home/supervisor/libvpp/test'
Makefile:413: recipe for target 'test-debug' failed
make: *** [test-debug] Error 2

Thanks,
Rajith

On Fri, Feb 4, 2022 at 6:14 PM Klement Sekera  wrote:

> Also, running under root is a bad idea and not supported.
>
> Cheers,
> Klement
>
> On 4 Feb 2022, at 13:27, Ole Troan via lists.fd.io <
> otroan=employees@lists.fd.io> wrote:
>
> Rajith,
>
> Have you tried:
> make test-debug TEST=ip4
>
> Cheers,
> Ole
>
>
>
> > On 4 Feb 2022, at 13:18, Rajith PR via lists.fd.io <
> rajith=rtbrick@lists.fd.io> wrote:
> >
> > Hi All,
> >
> > We are trying to understand the VPP test framework. To get started we
> ran an example suite (ip4 test) but it seems that the dependent
> executable(vpp) is missing.
> > Please find the logs below.
> >
> > sudo make test TEST=test_ip4 vpp-install
> > make -C /home/supervisor/libvpp/build-root PLATFORM=vpp TAG=vpp
> vpp-install
> > make[1]: Entering directory '/home/supervisor/libvpp/build-root'
> >  Arch for platform 'vpp' is native 
> >  Finding source for external 
> >  Makefile fragment found in
> /home/supervisor/libvpp/build-data/packages/external.mk 
> >  Source found in /home/supervisor/libvpp/build 
> >  Arch for platform 'vpp' is native 
> >  Finding source for vpp 
> >  Makefile fragment found in
> /home/supervisor/libvpp/build-data/packages/vpp.mk 
> >  Source found in /home/supervisor/libvpp/src 
> > find: ‘/home/supervisor/libvpp/build-root/config.site’: No such file or
> directory
> >  Configuring external: nothing to do 
> >  Building external: nothing to do 
> >  Installing external: nothing to do 
> > find: ‘/home/supervisor/libvpp/build-root/config.site’: No such file or
> directory
> >  Configuring vpp: nothing to do 
> >  Building vpp: nothing to do 
> >  Installing vpp: nothing to do 
> > make[1]: Leaving directory '/home/supervisor/libvpp/build-root'
> > make -C test
> VPP_BUILD_DIR=/home/supervisor/libvpp/build-root/build-vpp-native
> VPP_BIN=/home/supervisor/libvpp/build-root/install-vpp-native/vpp/bin/vpp
> VPP_PLUGIN_PATH=/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib/vpp_plugins:/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib64/vpp_plugins
> VPP_TEST_PLUGIN_PATH=/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib/vpp_api_test_plugins:/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib64/vpp_api_test_plugins
> VPP_INSTALL_PATH=/home/supervisor/libvpp/build-root/install-vpp-native/
> LD_LIBRARY_PATH=/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib/:/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib64/
> EXTENDED_TESTS= PYTHON= OS_ID=ubuntu RND_SEED=1643973244.1148973
> CACHE_OUTPUT= test
> > make[1]: Entering directory '/home/supervisor/libvpp/test'
> > 11:14:05,997 Subprocess returned with OS error: (2) No such file or
> directory:
> '/home/supervisor/libvpp/build-root/install-vpp-native/vpp/bin/vpp'
> > Traceback (most recent call last):
> >   File "sanity_run_vpp.py", line 34, in 
> > tc.setUpClass()
> >   File "/home/supervisor/libvpp/test/framework.py", line 688, in
> setUpClass
> > raise e
> >   File "/home/supervisor/libvpp/test/framework.py", line 636, in
> setUpClass
> > cls.run_vpp()
> >   File "/home/supervisor/libvpp/test/framework.py", line 536, in run_vpp
> > stderr=subprocess.

[vpp-dev]: Unable to run make test

2022-02-04 Thread Rajith PR via lists.fd.io
Hi All,

We are trying to understand the VPP test framework. To get started we ran
an example suite (the ip4 test), but it seems that the required executable
(vpp) is missing.
Please find the logs below.

*sudo make test TEST=test_ip4 vpp-install*
make -C /home/supervisor/libvpp/build-root PLATFORM=vpp TAG=vpp vpp-install
make[1]: Entering directory '/home/supervisor/libvpp/build-root'
 Arch for platform 'vpp' is native 
 Finding source for external 
 Makefile fragment found in /home/supervisor/libvpp/build-data/packages/
external.mk 
 Source found in /home/supervisor/libvpp/build 
 Arch for platform 'vpp' is native 
 Finding source for vpp 
 Makefile fragment found in /home/supervisor/libvpp/build-data/packages/
vpp.mk 
 Source found in /home/supervisor/libvpp/src 
find: ‘/home/supervisor/libvpp/build-root/config.site’: No such file or
directory
 Configuring external: nothing to do 
 Building external: nothing to do 
 Installing external: nothing to do 
find: ‘/home/supervisor/libvpp/build-root/config.site’: No such file or
directory
 Configuring vpp: nothing to do 
 Building vpp: nothing to do 
 Installing vpp: nothing to do 
make[1]: Leaving directory '/home/supervisor/libvpp/build-root'
make -C test
VPP_BUILD_DIR=/home/supervisor/libvpp/build-root/build-vpp-native
VPP_BIN=/home/supervisor/libvpp/build-root/install-vpp-native/vpp/bin/vpp
VPP_PLUGIN_PATH=/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib/vpp_plugins:/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib64/vpp_plugins
VPP_TEST_PLUGIN_PATH=/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib/vpp_api_test_plugins:/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib64/vpp_api_test_plugins
VPP_INSTALL_PATH=/home/supervisor/libvpp/build-root/install-vpp-native/
LD_LIBRARY_PATH=/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib/:/home/supervisor/libvpp/build-root/install-vpp-native/vpp/lib64/
EXTENDED_TESTS= PYTHON= OS_ID=ubuntu RND_SEED=1643973244.1148973
CACHE_OUTPUT= test
make[1]: Entering directory '/home/supervisor/libvpp/test'

*11:14:05,997 Subprocess returned with OS error: (2) No such file or
directory:
'/home/supervisor/libvpp/build-root/install-vpp-native/vpp/bin/vpp'*Traceback
(most recent call last):
  File "sanity_run_vpp.py", line 34, in 
tc.setUpClass()
  File "/home/supervisor/libvpp/test/framework.py", line 688, in setUpClass
raise e
  File "/home/supervisor/libvpp/test/framework.py", line 636, in setUpClass
cls.run_vpp()
  File "/home/supervisor/libvpp/test/framework.py", line 536, in run_vpp
stderr=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory:
'/home/supervisor/libvpp/build-root/install-vpp-native/vpp/bin/vpp':
'/home/supervisor/libvpp/build-root/install-vpp-native/vpp/bin/vpp'
***
* Sanity check failed, cannot run vpp
***
Makefile:178: recipe for target 'sanity' failed
make[1]: *** [sanity] Error 1
make[1]: Leaving directory '/home/supervisor/libvpp/test'
Makefile:409: recipe for target 'test' failed
make: *** [test] Error 2


*supervisor@rajith_fs6104>srv3.nbg1.rtbrick.net:~/libvpp/build-root/install-vpp-native/vpp/bin
$ ll*
total 5784
drwxr-xr-x 2 root root4096 Feb  4 11:28 ./
drwxr-xr-x 7 root root4096 Feb  4 11:28 ../
-rwxr-xr-x 1 root root   66992 Feb  4 10:48 svmdbtool*
-rwxr-xr-x 1 root root  108224 Feb  4 10:48 svmtool*
-rwxr-xr-x 1 root root   29688 Feb  4 10:41 vapi_c_gen.py*
-rwxr-xr-x 1 root root8682 Feb  4 10:41 vapi_cpp_gen.py*
-rwxr-xr-x 1 root root 3145776 Feb  4 10:50 vapi_cpp_test*
-rwxr-xr-x 1 root root  539176 Feb  4 10:50 vapi_c_test*
-rwxr-xr-x 1 root root   19351 Feb  4 10:41 vapi_json_parser.py*
-rwxr-xr-x 1 root root  136400 Feb  4 10:50 vat2*
-rwxr-xr-x 1 root root   38530 Feb  4 10:41 vppapigen*
-rwxr-xr-x 1 root root 1174696 Feb  4 10:50 vpp_api_test*
-rwxr-xr-x 1 root root   39680 Feb  4 10:49 vppctl*
-rwxr-xr-x 1 root root  458096 Feb  4 10:50 vpp_echo*
-rwxr-xr-x 1 root root   26024 Feb  4 10:49 vpp_get_metrics*
-rwxr-xr-x 1 root root   28968 Feb  4 10:50 vpp_get_stats*
-rwxr-xr-x 1 root root   32912 Feb  4 10:50 vpp_prometheus_export*
-rwxr-xr-x 1 root root   23256 Feb  4 10:49 vpp_restart*

We use the *sudo make build-release* command for building. Can you let us
know what to check for the missing vpp executable?

Thanks,
Rajith


[vpp-dev]: Segmentation fault in mspace_is_heap_object

2022-01-17 Thread Rajith PR via lists.fd.io
Hi All,

We are facing a random crash while scaling MPLS tunnels (8000 MPLS
tunnels). The crash has been observed multiple times and the call stack is
the same.
At the time of the worker thread crash, the main thread had executed the
following lines of code (between the barrier sync and release). Please give
us some inputs to debug this issue.


  vlib_worker_thread_barrier_sync (vm);
  ret_val = rtb_vpp_route_rpath_create(route_mapping, , RTB_VPP_ADD);
  if (RTB_VPP_SUCCESS != ret_val) {
  goto Exit;
  }
  vec_add1(rpaths, rpath);
  tunnel_sw_if_index = vnet_mpls_tunnel_create(1, 0, NULL);
  vnet_mpls_tunnel_path_add(tunnel_sw_if_index, rpaths);
  vlib_worker_thread_barrier_release (vm);

VPP Version : 20.09

BT:

Thread 3 (Thread 0x7fcf87fff700 (LWP 22778)):
#0  0x7fd04613a492 in __GI___waitpid (pid=25786,
stat_loc=stat_loc@entry=0x7fd02514b9a8,
options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x7fd0460a5177 in do_system (line=line@entry=0x7fd02514bb30
"/usr/local/bin/rtb-dump-core.sh '22729' 'fibd'") at
../sysdeps/posix/system.c:149
#2  0x7fd0460a555a in __libc_system (line=line@entry=0x7fd02514bb30
"/usr/local/bin/rtb-dump-core.sh '22729' 'fibd'") at
../sysdeps/posix/system.c:185
#3  0x7fd04667105d in bd_signal_handler_cb (signo=11) at
/development/rtbrick-infrastructure/code/bd/src/bdinfra/bd.c:753
#4  0x7fd037b4b73f in rtb_bd_signal_handler (signo=11) at
/development/libvpp/src/vlib/unix/main.c:80
#5  0x7fd037b4bba2 in unix_signal_handler (signum=11, si=0x7fd02514bfb0,
uc=0x7fd02514be80) at /development/libvpp/src/vlib/unix/main.c:180
#6  
#7  0x7fd0372f1389 in mspace_is_heap_object (msp=0x7fd008ec3010,
p=0x7fd0264c66c8) at /development/libvpp/src/vppinfra/dlmalloc.c:4134
#8  0x7fd037b1b482 in clib_mem_is_heap_object (p=0x7fd0264c66c8) at
/development/libvpp/src/vppinfra/mem.h:211
#9  0x7fd037b0aaa6 in _vec_resize_inline (v=0x7fd0264c66d0,
length_increment=1, data_bytes=59368, header_bytes=0, data_align=8,
numa_id=255) at /development/libvpp/src/vppinfra/vec.h:154
#10 0x7fd037b15fbe in vlib_worker_thread_node_refork () at
/development/libvpp/src/vlib/threads.c:1133
#11 0x7fd037ac8196 in vlib_worker_thread_barrier_check () at
/development/libvpp/src/vlib/threads.h:482
#12 0x7fd037ac259e in vlib_main_or_worker_loop (vm=0x7fd00c58ac40,
is_main=0) at /development/libvpp/src/vlib/main.c:1788
#13 0x7fd037ac1db7 in vlib_worker_loop (vm=0x7fd00c58ac40) at
/development/libvpp/src/vlib/main.c:2008
#14 0x7fd037b19b1a in vlib_worker_thread_fn (arg=0x7fd00925e700) at
/development/libvpp/src/vlib/threads.c:1862
#15 0x7fd037340c34 in clib_calljmp () at
/development/libvpp/src/vppinfra/longjmp.S:123
#16 0x7fcf87ffed00 in ?? ()
#17 0x7fd037b11cc3 in vlib_worker_thread_bootstrap_fn
(arg=0x7fd00925e700) at /development/libvpp/src/vlib/threads.c:585
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 2 (Thread 0x7fcf8cbff700 (LWP 22777)):
#0  0x7fd037ac81c6 in vlib_worker_thread_barrier_check () at
/development/libvpp/src/vlib/threads.h:485
#1  0x7fd037ac259e in vlib_main_or_worker_loop (vm=0x7fd024530e40,
is_main=0) at /development/libvpp/src/vlib/main.c:1788
#2  0x7fd037ac1db7 in vlib_worker_loop (vm=0x7fd024530e40) at
/development/libvpp/src/vlib/main.c:2008
#3  0x7fd037b19b1a in vlib_worker_thread_fn (arg=0x7fd00925e600) at
/development/libvpp/src/vlib/threads.c:1862
#4  0x7fd037340c34 in clib_calljmp () at
/development/libvpp/src/vppinfra/longjmp.S:123
#5  0x7fcf8cbfed00 in ?? ()
#6  0x7fd037b11cc3 in vlib_worker_thread_bootstrap_fn
(arg=0x7fd00925e600) at /development/libvpp/src/vlib/threads.c:585
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 1 (Thread 0x7fd046adb400 (LWP 22729)):
#0  vlib_time_now (vm=0x7fd037d81c40 ) at
/development/libvpp/src/vlib/main.h:345
#1  0x7fd037b1863c in vlib_worker_thread_barrier_release
(vm=0x7fd037d81c40 ) at
/development/libvpp/src/vlib/threads.c:1631
#2  0x7fd039719985 in rtb_vpp_l2_xconnect_route_add_handle
(route_mapping=0x7fcff329d220) at
/development/libvpp/src/vpp/rtbrick/rtb_vpp_l2_xconnect.c:97
#3  0x7fd03971a229 in rtb_vpp_l2_xconnect_route_handle
(route_mapping=0x7fcff329d220, action=0 '\000') at
/development/libvpp/src/vpp/rtbrick/rtb_vpp_l2_xconnect.c:179
#4  0x7fd0396692a7 in rtb_vpp_route_mapping_process
(route_mapping=0x7fcff329d220, action=0 '\000') at
/development/libvpp/src/vpp/rtbrick/rtb_vpp_route.c:258
#5  0x7fd03966cbd4 in rtb_vpp_adj_adjacency_route_handle
(adj_api_out=0x7fcff329d410, action=0 '\000') at
/development/libvpp/src/vpp/rtbrick/rtb_vpp_adj.c:247
#6  0x7fd03966cd30 in rtb_vpp_adj_api_out_process
(table=0x56425216b120, object=0x56425cfa6b30, action=0 '\000') at
/development/libvpp/src/vpp/rtbrick/rtb_vpp_adj.c:283
#7  0x7fd03966cdaf in rtb_vpp_adj_api_out_add_cb 

Re: [vpp-dev] Unable to configure mixed NAT and non-NAT traffic

2022-01-14 Thread Rajith PR via lists.fd.io
Hi all,

Just to add to the query, I have observed that the 'in' interface
configuration is optional for NAT to work. All traffic gets NATed if the
'out' interface is set with output-feature.

Thanks,
Rajith

On Thu, 13 Jan 2022 at 7:06 AM, alekcejk via lists.fd.io  wrote:

> Hi all,
>
> I am trying to get a setup for mixed NAT and non-NAT traffic working.
>
> In GNS3 I created VPP VM with three interfaces (1 external, 2 internal).
>
> External interface GigabitEthernet0/5/0 with public IP address
> 203.0.113.1/30 connected to host with IP 203.0.113.2/30 and route to
> 198.51.100.0/24 via 203.0.113.1
> Internal interface GigabitEthernet0/6/0 with private IP address
> 172.16.0.1/24 connected to host with IP 172.16.0.2/24
> Internal interface GigabitEthernet0/7/0 with public IP address
> 198.51.100.1/25 connected to host with IP 198.51.100.2/25
>
> Internal traffic from/to 198.51.100.0/25 should be forwarded without NAT.
> NAT address 198.51.100.128 should be applied on external interface
> only for internal traffic from 172.16.0.0/24.
>
> Here my setup for VPP 21.01.1 (running on CentOS 8)
>
> /etc/vpp/startup.conf:
> unix {
>   nodaemon
>   startup-config /etc/vpp/startup-config
>   log /var/log/vpp/vpp.log
>   full-coredump
>   cli-listen /run/vpp/cli.sock
>   cli-history-limit 100
>   cli-no-banner
>   poll-sleep-usec 10
>   gid vpp
> }
>
> api-trace {
>   on
> }
>
> api-segment {
>   gid vpp
> }
>
> dpdk {
>   dev :00:05.0
>   dev :00:06.0
>   dev :00:07.0
> }
>
> plugins {
>   plugin default { disable }
>   plugin dpdk_plugin.so { enable }
>   plugin nat_plugin.so { enable }
>   plugin arping_plugin.so { enable }
>   plugin ping_plugin.so { enable }
> }
>
> logging {
>default-log-level debug
>default-syslog-log-level debug
> }
>
> ethernet {
>   default-mtu 1500
> }
>
> /etc/vpp/startup-config:
> set interface state GigabitEthernet0/5/0 up
> set interface state GigabitEthernet0/6/0 up
> set interface state GigabitEthernet0/7/0 up
> set interface ip address GigabitEthernet0/5/0 203.0.113.1/30
> set interface ip address GigabitEthernet0/6/0 172.16.0.1/24
> set interface ip address GigabitEthernet0/7/0 198.51.100.1/25
> nat44 enable sessions 5 endpoint-dependent
> nat44 forwarding enable
> nat44 add address 198.51.100.128
> set interface nat44 in GigabitEthernet0/6/0 output-feature
> set interface nat44 out GigabitEthernet0/5/0 output-feature
>
> If I run ping from internal host 172.16.0.2 to external host
> 203.0.113.2 then translation works correctly
> 02:44:23.420497 IP 198.51.100.128 > 203.0.113.2: ICMP echo request, id
> 64233, seq 4, length 64
> 02:44:23.420516 IP 203.0.113.2 > 198.51.100.128: ICMP echo reply, id
> 64233, seq 4, length 64
>
> But if I run ping 203.0.113.2 from internal host 198.51.100.2 then NAT is
> also applied, even though I didn't set nat44 in on
> GigabitEthernet0/7/0:
> 02:47:15.242598 IP 198.51.100.128 > 203.0.113.2: ICMP echo request, id
> 22324, seq 127, length 64
> 02:47:15.242620 IP 203.0.113.2 > 198.51.100.128: ICMP echo reply, id
> 22324, seq 127, length 64
>
> vpp# show nat44 interfaces
> NAT44 interfaces:
>  GigabitEthernet0/6/0 output-feature in
>  GigabitEthernet0/5/0 output-feature out
>
> If I remove "output-feature" then the translation is not applied at all,
> even with "nat44 forwarding enable".
>
>
>
> In the setup for VPP 21.10 I removed "endpoint-dependent", but if
> "output-feature" stays on the internal interface GigabitEthernet0/6/0
> I see a new problem.
>
> Only one correct response is received on internal host 172.16.0.2 when
> running ping 203.0.113.2; the second response comes with source IP
> 198.51.100.128 instead of 203.0.113.2.
> 03:06:18.420787 IP 172.16.0.2 > 203.0.113.2: ICMP echo request, id
> 405, seq 1, length 64
> 03:06:18.427246 IP 203.0.113.2 > 172.16.0.2: ICMP echo reply, id 405,
> seq 1, length 64
> 03:06:19.424157 IP 172.16.0.2 > 203.0.113.2: ICMP echo request, id
> 405, seq 2, length 64
> 03:06:19.424441 IP 198.51.100.128 > 172.16.0.2: ICMP echo reply, id
> 59651, seq 2, length 64
>
> So I removed "output-feature" from internal interface GigabitEthernet0/6/0
>
> /etc/vpp/startup-config:
> set interface state GigabitEthernet0/5/0 up
> set interface state GigabitEthernet0/6/0 up
> set interface state GigabitEthernet0/7/0 up
> set interface ip address GigabitEthernet0/5/0 203.0.113.1/30
> set interface ip address GigabitEthernet0/6/0 172.16.0.1/24
> set interface ip address GigabitEthernet0/7/0 198.51.100.1/25
> nat44 enable sessions 5
> nat44 forwarding enable
> nat44 add address 198.51.100.128
> set interface nat44 in GigabitEthernet0/6/0
> set interface nat44 out GigabitEthernet0/5/0 output-feature
>
> vpp# show nat44 interfaces
> NAT44 interfaces:
>  GigabitEthernet0/6/0 in
>  GigabitEthernet0/5/0 output-feature in out
>
> With this setup NAT is also applied to both 172.16.0.0/24 and
> 198.51.100.0/25.
>
> Can someone point me to what is wrong with my settings and what needs
> to be changed in order for the NAT to work as required in my case?
>
> 

Re: [vpp-dev]: SIGSEV in mpls_tunnel_collect_forwarding

2021-10-21 Thread Rajith PR via lists.fd.io
Thanks Neale and Stanislav.

I verified the patch provided, there is no crash now.

On Thu, Oct 21, 2021 at 10:08 PM Neale Ranns  wrote:

> Hi Rajith,
>
>
>
> Can you try this:
>
>   https://gerrit.fd.io/r/c/vpp/+/34200
>
>
>
> I think Stanislav’s solution would also work, but it would mean a no-op
> restack for the tunnel. Not walking the new child is more efficient.
>
>
>
> /neale
>
>
>
> *From: *vpp-dev@lists.fd.io  on behalf of Stanislav
> Zaikin via lists.fd.io 
> *Date: *Thursday, 21 October 2021 at 17:58
> *To: *Rajith PR 
> *Cc: *vpp-dev 
> *Subject: *Re: [vpp-dev]: SIGSEV in mpls_tunnel_collect_forwarding
>
> Hi Rajith,
>
>
>
> Looks like each tunnel uses the same path-list (with different path
> extensions), and when the path-list has more than FIB_PATH_LIST_POPULAR (64)
> children it updates all the children with the "popular" flag. At that
> point there are no path extensions yet on the last child. So I suppose
> it should be okay to add a check, something like:
>
> if (NULL == path_ext)
>   {
>     return (FIB_PATH_LIST_WALK_CONTINUE);
>   }
>
>
>
> On Thu, 21 Oct 2021 at 08:29, Rajith PR via lists.fd.io  rtbrick@lists.fd.io> wrote:
>
> Hi All,
>
>
>
> We are seeing the below crash when creating MPLS tunnels. The issue is
> easily reproducible; we just have to create around 100 MPLS tunnels. It
> seems path_ext is coming back as NULL.
>
>
>
> Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
> 0x7fe017bba268 in mpls_tunnel_collect_forwarding (pl_index=90,
> path_index=144, arg=0x7fdfa84acdd8) at
> /home/supervisor/libvpp/src/vnet/mpls/mpls_tunnel.c:125
> 125    path_ext->fpe_mpls_flags |= FIB_PATH_EXT_MPLS_FLAG_NO_IP_TTL_DECR;
>
>
>
> VPP Version : *20.09*
>
> Call Stack:
>
>
>
> Thread 1 (Thread 0x7fe02606d400 (LWP 436)):
> #0  0x7fe017bba268 in *mpls_tunnel_collect_forwarding* (pl_index=90,
> path_index=144, arg=0x7fdfa84acdd8) at
> /home/supervisor/libvpp/src/vnet/mpls/mpls_tunnel.c:125
> #1  0x7fe017f85d6d in fib_path_list_walk (path_list_index=90,
> func=0x7fe017bba210 , ctx=0x7fdfa84acdd8)
> at /home/supervisor/libvpp/src/vnet/fib/fib_path_list.c:1408
> #2  0x7fe017bba079 in mpls_tunnel_mk_lb (mt=0x7fdff6701cd8,
> linkt=VNET_LINK_MPLS, fct=FIB_FORW_CHAIN_TYPE_ETHERNET,
> dpo_lb=0x7fdfa84ace20) at
> /home/supervisor/libvpp/src/vnet/mpls/mpls_tunnel.c:170
> #3  0x7fe017bb8b12 in mpls_tunnel_restack (mt=0x7fdff6701cd8) at
> /home/supervisor/libvpp/src/vnet/mpls/mpls_tunnel.c:324
> #4  0x7fe017bbb6c1 in mpls_tunnel_back_walk (node=0x7fdff6701cd8,
> ctx=0x7fdfa84acea8) at
> /home/supervisor/libvpp/src/vnet/mpls/mpls_tunnel.c:1005
>
> #5  0x7fe017f613c2 in fib_node_back_walk_one (ptr=0x7fdfa84acec8,
> ctx=0x7fdfa84acea8) at /home/supervisor/libvpp/src/vnet/fib/fib_node.c:161
> #6  0x7fe017f4be6a in fib_walk_advance (fwi=0) at
> /home/supervisor/libvpp/src/vnet/fib/fib_walk.c:368
> #7  0x7fe017f4ca00 in fib_walk_sync
> (parent_type=FIB_NODE_TYPE_PATH_LIST, parent_index=90, ctx=0x7fdfa84acf70)
> at /home/supervisor/libvpp/src/vnet/fib/fib_walk.c:792
> #8  0x7fe017f85ae3 in fib_path_list_child_add (path_list_index=90,
> child_type=FIB_NODE_TYPE_MPLS_TUNNEL, child_index=46) at
> /home/supervisor/libvpp/src/vnet/fib/fib_path_list.c:1343
> #9  0x7fe017bb89b4 in vnet_mpls_tunnel_path_add (sw_if_index=97,
> rpaths=0x7fdff66448d0) at
> /home/supervisor/libvpp/src/vnet/mpls/mpls_tunnel.c:679
> #10 0x7fe017bbaa30 in vnet_create_mpls_tunnel_command_fn
> (vm=0x7fe017088c40 , input=0x7fdfa84ad820,
> cmd=0x7fdfe235ed98) at
> /home/supervisor/libvpp/src/vnet/mpls/mpls_tunnel.c:868
> #11 0x7fe016d84439 in vlib_cli_dispatch_sub_commands
> (vm=0x7fe017088c40 , cm=0x7fe017088e90
> , input=0x7fdfa84ad820, parent_command_index=818)
> at /home/supervisor/libvpp/src/vlib/cli.c:572
> #12 0x7fe016d841ad in vlib_cli_dispatch_sub_commands
> (vm=0x7fe017088c40 , cm=0x7fe017088e90
> , input=0x7fdfa84ad820, parent_command_index=0)
> at /home/supervisor/libvpp/src/vlib/cli.c:529
> #13 0x7fe016d8335f in vlib_cli_input (vm=0x7fe017088c40
> , input=0x7fdfa84ad820, function=0x0, function_arg=0) at
> /home/supervisor/libvpp/src/vlib/cli.c:674
> #14 0x7fe016e4d879 in unix_cli_exec (vm=0x7fe017088c40
> , input=0x7fdfa84ade40, cmd=0x7fdfe2367888) at
> /home/supervisor/libvpp/src/vlib/unix/cli.c:3399
> #15 0x7fe016d84439 in vlib_cli_dispatch_sub_commands
> (vm=0x7fe017088c40 , cm=0x7fe017088e90
> , input=0x7fdfa84ade40, parent_command_index=0)
> at /home/supervisor/libvpp/src/vlib/cli.c:572
> #16 0x7fe016d8335f in vlib_cli_input (vm=0x7fe017088c40

Re: [vpp-dev] assert in pool_elt_at_index

2021-10-13 Thread Rajith PR via lists.fd.io
Hi Stanislav,

My guess is you don't have the commit below.

commit 8341f76fd1cd4351961cd8161cfed2814fc55103
Author: Dave Barach 
Date:   Wed Jun 3 08:05:15 2020 -0400

fib: add barrier sync, pool/vector expand cases

load_balance_alloc_i(...) is not thread safe when the
load_balance_pool or combined counter vectors expand.

Type: fix

Signed-off-by: Dave Barach 
Change-Id: I7f295ed77350d1df0434d5ff461eedafe79131de

Thanks,
Rajith

On Thu, Oct 14, 2021 at 3:57 AM Florin Coras  wrote:

> Hi Stanislav,
>
> The only thing I can think of is that main thread grows the pool, or the
> pool’s bitmap, without a worker barrier while the worker that asserts is
> trying to access it. Is main thread busy doing something (e.g., adding
> routes/interfaces) when the assert happens?
>
> Regards,
> Florin
>
> On Oct 13, 2021, at 2:52 PM, Stanislav Zaikin  wrote:
>
> Hi Florin,
>
> I wasn't aware of those helper functions, thanks! But yeah, it also
> returns 0 (sorry, but there's the trace of another crash)
>
> Thread 3 "vpp_wk_0" received signal SIGABRT, Aborted.
> [Switching to Thread 0x7f9cc0f6a700 (LWP 3546)]
> __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x7f9d61542921 in __GI_abort () at abort.c:79
> #2  0x7f9d624da799 in os_panic () at
> /home/vpp/vpp/src/vppinfra/unix-misc.c:177
> #3  0x7f9d62420f49 in debugger () at
> /home/vpp/vpp/src/vppinfra/error.c:84
> #4  0x7f9d62420cc7 in _clib_error (how_to_die=2, function_name=0x0,
> line_number=0, fmt=0x7f9d644348d0 "%s:%d (%s) assertion `%s' fails") at
> /home/vpp/vpp/src/vppinfra/error.c:143
> #5  0x7f9d636695b4 in load_balance_get (lbi=4569) at
> /home/vpp/vpp/src/vnet/dpo/load_balance.h:222
> #6  0x7f9d63668247 in mpls_lookup_node_fn_hsw (vm=0x7f9ceb0138c0,
> node=0x7f9ceee6f700, from_frame=0x7f9cef9c9240) at
> /home/vpp/vpp/src/vnet/mpls/mpls_lookup.c:229
> #7  0x7f9d63008076 in dispatch_node (vm=0x7f9ceb0138c0,
> node=0x7f9ceee6f700, type=VLIB_NODE_TYPE_INTERNAL,
> dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7f9cef9c9240,
> last_time_stamp=1837178878370487) at /home/vpp/vpp/src/vlib/main.c:1217
> #8  0x7f9d630089e7 in dispatch_pending_node (vm=0x7f9ceb0138c0,
> pending_frame_index=2, last_time_stamp=1837178878370487) at
> /home/vpp/vpp/src/vlib/main.c:1376
> #9  0x7f9d63002441 in vlib_main_or_worker_loop (vm=0x7f9ceb0138c0,
> is_main=0) at /home/vpp/vpp/src/vlib/main.c:1904
> #10 0x7f9d630012e7 in vlib_worker_loop (vm=0x7f9ceb0138c0) at
> /home/vpp/vpp/src/vlib/main.c:2038
> #11 0x7f9d6305995d in vlib_worker_thread_fn (arg=0x7f9ce1b88540) at
> /home/vpp/vpp/src/vlib/threads.c:1868
> #12 0x7f9d62445214 in clib_calljmp () at
> /home/vpp/vpp/src/vppinfra/longjmp.S:123
> #13 0x7f9cc0f69c90 in ?? ()
> #14 0x7f9d63051b83 in vlib_worker_thread_bootstrap_fn
> (arg=0x7f9ce1b88540) at /home/vpp/vpp/src/vlib/threads.c:585
> #15 0x7f9cda360355 in eal_thread_loop (arg=0x0) at
> ../src-dpdk/lib/librte_eal/linux/eal_thread.c:127
> #16 0x7f9d629246db in start_thread (arg=0x7f9cc0f6a700) at
> pthread_create.c:463
> #17 0x7f9d6162371f in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> (gdb) select 5
> (gdb)
> *print pifi( load_balance_pool, 4569 )*
> $1 = 0
> (gdb) source ~/vpp/extras/gdb/gdbinit
> Loading vpp functions...
> Load vlLoad pe
> Load pifi
> Load node_name_from_index
> Load vnet_buffer_opaque
> Load vnet_buffer_opaque2
> Load bitmap_get
> Done loading vpp functions...
> (gdb) pifi load_balance_pool 4569
> pool_is_free_index (load_balance_pool, 4569)
> $2 = 0
>
> On Wed, 13 Oct 2021 at 21:55, Florin Coras  wrote:
>
>> Hi Stanislav,
>>
>> Just to make sure the gdb macro is okay, could you run from gdb:
>> pifi(pool, index)? The function is defined in gdb_funcs.c.
>>
>> Regards,
>> Florin
>>
>> On Oct 13, 2021, at 11:30 AM, Stanislav Zaikin  wrote:
>>
>> Hello folks,
>>
>> I'm facing a strange issue with 2 worker threads. Sometimes I get a crash
>> either in "ip6-lookup" or "mpls-lookup" nodes. They happen with assert in
>> the *pool_elt_at_index* macro and always inside the "*load_balance_get*"
>> function. But the load_balance dpo looks perfectly good, I mean it still
>> has a lock and on regular deletion (in the case when the load_balance dpo
>> is deleted) it should be erased properly (with dpo_reset). It happens
>> usually when the main core is executing
>> vlib_worker_thread_barrier_sync_int(), and the other worker is executing
>> vlib_worker_thread_barrier_check().
>> And the strangest thing is, when I run the vpp's gdb helper for checking
>> "pool_index_is_free" or pifi, it shows me that the index isn't free (and
>> the macro in that case shouldn't fire).
>>
>> Any thoughts and inputs are appreciated.
>>
>> Thread 3 "vpp_wk_0" received signal SIGABRT, Aborted.
>> [Switching to Thread 

Re: [vpp-dev]: ASSERT in load_balance_get()

2021-10-13 Thread Rajith PR via lists.fd.io
Yes, MPLS routes and IP routes that push a label are not MP-safe. I fixed
this in our node that programs these routes. The fix is to take the barrier
lock before programming such routes and release it once done, as sketched
below.
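
A minimal sketch of that fix, executed from the main thread (the
route-programming call itself is elided here; it is whatever FIB/MPLS API your
application node calls):

  vlib_worker_thread_barrier_sync (vm);
  /* program the label-pushing IP route or MPLS route here */
  vlib_worker_thread_barrier_release (vm);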

-Rajith

On Wed, 13 Oct 2021 at 1:12 PM, Stanislav Zaikin  wrote:

> Hi Rajith,
>
> Did you find the root cause? I'm facing the same problem, the load_balance
> element in the pool seems to be good. And event a helper from gdbinit says
> that the element isn't free:
>
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x7f7848cce921 in __GI_abort () at abort.c:79
> #2  0x004079a5 in os_exit (code=1) at
> /home/vpp/src/vpp/vnet/main.c:433
> #3  0x7f784a81b6a8 in unix_signal_handler (signum=6,
> si=0x7f77a86b5cb0, uc=0x7f77a86b5b80) at /home/vpp/src/vlib/unix/main.c:187
> #4  ;
> #5  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #6  0x7f7848cce921 in __GI_abort () at abort.c:79
> #7  0x7f7849c66799 in os_panic () at
> /home/vpp/src/vppinfra/unix-misc.c:177
> #8  0x7f7849bacf49 in debugger () at /home/vpp/src/vppinfra/error.c:84
> #9  0x7f7849baccc7 in _clib_error (how_to_die=2, function_name=0x0,
> line_number=0, fmt=0x7f784bbc0110 "%s:%d (%s) assertion `%s' fails") at
> /home/vpp/src/vppinfra/error.c:143
> #10 0x7f784adf55b4 in load_balance_get (lbi=16) at
> /home/vpp/src/vnet/dpo/load_balance.h:222
> #11 0x7f784adf4031 in mpls_lookup_node_fn_hsw (vm=0x7f77d279af40,
> node=0x7f77d615bc80, from_frame=0x7f77d4d62cc0) at
> /home/vpp/src/vnet/mpls/mpls_lookup.c:197
> #12 0x7f784a794076 in dispatch_node (vm=0x7f77d279af40,
> node=0x7f77d615bc80, type=VLIB_NODE_TYPE_INTERNAL,
> dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7f77d4d62cc0,
> last_time_stamp=1518151280278706) at /home/vpp/src/vlib/main.c:1217
> #13 0x7f784a7949e7 in dispatch_pending_node (vm=0x7f77d279af40,
> pending_frame_index=2, last_time_stamp=1518151280278706) at
> /home/vpp/src/vlib/main.c:1376
> #14 0x7f784a78e441 in vlib_main_or_worker_loop (vm=0x7f77d279af40,
> is_main=0) at /home/vpp/src/vlib/main.c:1904
> #15 0x7f784a78d2e7 in vlib_worker_loop (vm=0x7f77d279af40) at
> /home/vpp/src/vlib/main.c:2038
> #16 0x7f784a7e595d in vlib_worker_thread_fn (arg=0x7f77c931a0c0) at
> /home/vpp/src/vlib/threads.c:1868
> #17 0x7f7849bd1214 in clib_calljmp () at
> /home/vpp/src/vppinfra/longjmp.S:123
> #18 0x7f67a37fec90 in ?? ()
> #19 0x7f784a7ddb83 in vlib_worker_thread_bootstrap_fn
> (arg=0x7f77c931a0c0) at /home/vpp/src/vlib/threads.c:585
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> (gdb) select 10
> (gdb) source ~/vpp/extras/gdb/gdbinit
> Loading vpp functions...
> Load vlLoad pe
> Load pifi
> Load node_name_from_index
> Load vnet_buffer_opaque
> Load vnet_buffer_opaque2
> Load bitmap_get
> Done loading vpp functions...
> (gdb) pifi load_balance_pool 16
> pool_is_free_index (load_balance_pool, 16)
> $12 = 0
>
> On Fri, 17 Jul 2020 at 14:54, Rajith PR via lists.fd.io  rtbrick@lists.fd.io> wrote:
>
>> Hi All,
>>
>> The crash has occurred again in load_balance_get(). This
>> time lbi=320017171, maybe an invalid value? Also, this time the flow is
>> from mpls_lookup_node_fn_avx2().
>> Any pointers to fix the issue would be very helpful.
>>
>> Thread 10 (Thread 0x7fc1c089b700 (LWP 438)):
>> #0  0x7fc27b1a8722 in __GI___waitpid (pid=5267,
>> stat_loc=stat_loc@entry=0x7fc1fefc2a18, options=options@entry=0)
>> at ../sysdeps/unix/sysv/linux/waitpid.c:30
>> #1  0x7fc27b113107 in do_system (line=) at
>> ../sysdeps/posix/system.c:149
>> #2  0x7fc27bc2c76b in bd_signal_handler_cb (signo=6) at
>> /development/librtbrickinfra/bd/src/bd.c:770
>> #3  0x7fc26f5670ac in rtb_bd_signal_handler (signo=6) at
>> /development/libvpp/src/vlib/unix/main.c:80
>> #4  0x7fc26f567447 in unix_signal_handler (signum=6,
>> si=0x7fc1fefc31f0, uc=0x7fc1fefc30c0)
>> at /development/libvpp/src/vlib/unix/main.c:180
>> #5  
>> #6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
>> #7  0x7fc27b1048b1 in __GI_abort () at abort.c:79
>> #8  0x7fc270d09f86 in os_panic () at
>> /development/libvpp/src/vpp/vnet/main.c:559
>> #9  0x7fc26f1a0825 in debugger () at
>> /development/libvpp/src/vppinfra/error.c:84
>> #10 0x7fc26f1a0bf4 in _clib_error (how_to_die=2, function_name=0x0,
>> line_number=0, fmt=0x7fc27055f7a8 "%s:%d (%s) assertion `%s' fails

Re: [vpp-dev]: Assert in vnet_mpls_tunnel_del()

2021-09-14 Thread Rajith PR via lists.fd.io
Hi Neale,

You are right, I think there is a possibility of a route getting resolved
through the tunnel when the tunnel delete was called.
Though what is being resolved is not the MPLS tunnel itself, nor the
rpath with the extension, but the plain nexthop.

Thanks a lot for the pointer. But how can this be fixed in our application?
I can check the return value of *vnet_mpls_tunnel_path_remove*() before
calling *vnet_mpls_tunnel_del*(), along the lines of the sketch below.
If the paths are not all removed, bail out and do the vnet_mpls_tunnel_del()
at a later point in time. Is there a better way to do it? Can you please
suggest one.
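
The sketch (it assumes vnet_mpls_tunnel_path_remove() returns the number of
paths still attached to the tunnel, which should be verified against the 20.09
source):

  if (0 == vnet_mpls_tunnel_path_remove (tunnel_sw_if_index, rpaths))
    {
      /* no paths left, safe to remove the tunnel interface */
      vnet_mpls_tunnel_del (tunnel_sw_if_index);
    }
  else
    {
      /* paths still present, retry the tunnel delete later */
    }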

Thanks,
Rajith




On Tue, Sep 14, 2021 at 5:27 PM Neale Ranns  wrote:

>
>
> Hi Rajith,
>
>
>
> Maybe there’s something that still resolves through the tunnel when it’s
> deleted?
>
>
>
> /neale
>
>
>
> *From: *vpp-dev@lists.fd.io  on behalf of Rajith PR
> via lists.fd.io 
> *Date: *Tuesday, 14 September 2021 at 13:17
> *To: *vpp-dev 
> *Subject: *[vpp-dev]: Assert in vnet_mpls_tunnel_del()
>
> Hi All,
>
>
>
> We recently started using the VPP's mpls tunnel constructs for our L2
> cross connect application. In certain test scenarios we are seeing a crash
> in the delete path of the mpls tunnel.
>
> Any pointers to fix the issue would be really helpful.
>
>
>
> Version: *20.09*
>
> Call Stack:
>
>
>
> Thread 1 (Thread 0x7f854cdd3400 (LWP 14261)):
>
> #0  0x7f854c41b492 in __GI___waitpid (pid=21116, 
> stat_loc=stat_loc@entry=0x7f84f79abc28, options=options@entry=0) at 
> ../sysdeps/unix/sysv/linux/waitpid.c:30
>
> #1  0x7f854c386177 in do_system (line=) at 
> ../sysdeps/posix/system.c:149
>
> #2  0x7f854c96918d in bd_signal_handler_cb () from 
> /usr/local/lib/libbd-infra.so
>
> #3  0x7f853db8953f in rtb_bd_signal_handler () from 
> /usr/local/lib/libvlib.so.1.0.1
>
> #4  0x7f853db899a2 in unix_signal_handler () from 
> /usr/local/lib/libvlib.so.1.0.1
>
> #5  
>
> #6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
>
> #7  0x7f854c377921 in __GI_abort () at abort.c:79
>
> #8  0x7f853f58a9e3 in os_panic () from /usr/local/lib/librtbvpp.so
>
> #9  0x7f853d35aaa9 in ?? () from /usr/local/lib/libvppinfra.so.1.0.1
>
> #10 0x7f853d35a827 in _clib_error () from 
> /usr/local/lib/libvppinfra.so.1.0.1
>
> #11 0x7f853ecb2d75 in ?? () from /usr/local/lib/libvnet.so.1.0.1
>
> #12 0x7f853ecb302d in fib_path_list_get_n_paths () from 
> /usr/local/lib/libvnet.so.1.0.1
>
> #13 0x7f853e8f0b5c in ?? () from /usr/local/lib/libvnet.so.1.0.1
>
> #14 0x7f853e8ef942 in ?? () from /usr/local/lib/libvnet.so.1.0.1
>
> #15 0x7f853e8ed51a in ?? () from /usr/local/lib/libvnet.so.1.0.1
>
> #16 0x7f853e2e9e1a in ?? () from /usr/local/lib/libvnet.so.1.0.1
>
> #17 0x7f853e2ea0e8 in vnet_sw_interface_set_flags () from 
> /usr/local/lib/libvnet.so.1.0.1
>
> #18 0x7f853e2eb6e4 in vnet_delete_sw_interface () from 
> /usr/local/lib/libvnet.so.1.0.1
>
> #19 0x7f853e2eede9 in vnet_delete_hw_interface () from 
> /usr/local/lib/libvnet.so.1.0.1
>
> #20 0x7f853e8ee368 in vnet_mpls_tunnel_del () from 
> /usr/local/lib/libvnet.so.1.0.1
>
> #21 0x7f853f74535a in rtb_vpp_l2_xconnect_route_del_handle () from 
> /usr/local/lib/librtbvpp.so
>
> #22 0x7f853f7453fa in rtb_vpp_l2_xconnect_route_handle () from 
> /usr/local/lib/librtbvpp.so
>
> #23 0x7f853f69551b in rtb_vpp_route_mapping_process () from 
> /usr/local/lib/librtbvpp.so
>
> #24 0x7f853f696a67 in rtb_vpp_route_adjacency_handle () from 
> /usr/local/lib/librtbvpp.so
>
> #25 0x7f853f696d22 in rtb_vpp_route_api_out_process () from 
> /usr/local/lib/librtbvpp.so
>
> #26 0x7f85406b3975 in fib_route_api_out_del () from 
> /usr/local/lib/libfibd.so
>
> #27 0x7f85406b3a83 in fib_route_api_out_tbl_vpp_wlk () from 
> /usr/local/lib/libfibd.so
>
> #28 0x7f85406a550b in fib_job_tmr_cb () from /usr/local/lib/libfibd.so
>
> #29 0x7f854b27e6c2 in bds_qrunner_dispatch () from 
> /usr/local/lib/libbds.so
>
> #30 0x7f854b27f77c in bds_qrunner_dispatch_type () from 
> /usr/local/lib/libbds.so
>
> #31 0x7f854b27f9aa in bds_qrunner_dispatch_prepare () from 
> /usr/local/lib/libbds.so
>
> #32 0x7f854b27faa8 in bds_qrunner_expire () from /usr/local/lib/libbds.so
>
> #33 0x7f854a67f616 in ?? () from /usr/local/lib/libqb.so
>
> #34 0x7f854a67cfa7 in ?? () from /usr/local/lib/libqb.so
>
> #35 0x7f854a67d797 in qb_loop_run_vpp_wrapper () from 
> /usr/local/lib/libqb.so
>
> #36 0x7f854a6890e8 in lib_qb_service_start_event_wrapper_loop () from 
> /usr/local/lib/libqb.so
>
>

[vpp-dev]: Assert in vnet_mpls_tunnel_del()

2021-09-14 Thread Rajith PR via lists.fd.io
Hi All,

We recently started using VPP's MPLS tunnel constructs for our L2
cross-connect application. In certain test scenarios we are seeing a crash
in the delete path of the MPLS tunnel.
Any pointers to fix the issue would be really helpful.

Version: *20.09*
Call Stack:

Thread 1 (Thread 0x7f854cdd3400 (LWP 14261)):

#0  0x7f854c41b492 in __GI___waitpid (pid=21116,
stat_loc=stat_loc@entry=0x7f84f79abc28, options=options@entry=0) at
../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x7f854c386177 in do_system (line=) at
../sysdeps/posix/system.c:149
#2  0x7f854c96918d in bd_signal_handler_cb () from
/usr/local/lib/libbd-infra.so
#3  0x7f853db8953f in rtb_bd_signal_handler () from
/usr/local/lib/libvlib.so.1.0.1
#4  0x7f853db899a2 in unix_signal_handler () from
/usr/local/lib/libvlib.so.1.0.1
#5  
#6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#7  0x7f854c377921 in __GI_abort () at abort.c:79
#8  0x7f853f58a9e3 in os_panic () from /usr/local/lib/librtbvpp.so
#9  0x7f853d35aaa9 in ?? () from /usr/local/lib/libvppinfra.so.1.0.1
#10 0x7f853d35a827 in _clib_error () from
/usr/local/lib/libvppinfra.so.1.0.1
#11 0x7f853ecb2d75 in ?? () from /usr/local/lib/libvnet.so.1.0.1
#12 0x7f853ecb302d in fib_path_list_get_n_paths () from
/usr/local/lib/libvnet.so.1.0.1
#13 0x7f853e8f0b5c in ?? () from /usr/local/lib/libvnet.so.1.0.1
#14 0x7f853e8ef942 in ?? () from /usr/local/lib/libvnet.so.1.0.1
#15 0x7f853e8ed51a in ?? () from /usr/local/lib/libvnet.so.1.0.1
#16 0x7f853e2e9e1a in ?? () from /usr/local/lib/libvnet.so.1.0.1
#17 0x7f853e2ea0e8 in vnet_sw_interface_set_flags () from
/usr/local/lib/libvnet.so.1.0.1
#18 0x7f853e2eb6e4 in vnet_delete_sw_interface () from
/usr/local/lib/libvnet.so.1.0.1
#19 0x7f853e2eede9 in vnet_delete_hw_interface () from
/usr/local/lib/libvnet.so.1.0.1
#20 0x7f853e8ee368 in vnet_mpls_tunnel_del () from
/usr/local/lib/libvnet.so.1.0.1
#21 0x7f853f74535a in rtb_vpp_l2_xconnect_route_del_handle () from
/usr/local/lib/librtbvpp.so
#22 0x7f853f7453fa in rtb_vpp_l2_xconnect_route_handle () from
/usr/local/lib/librtbvpp.so
#23 0x7f853f69551b in rtb_vpp_route_mapping_process () from
/usr/local/lib/librtbvpp.so
#24 0x7f853f696a67 in rtb_vpp_route_adjacency_handle () from
/usr/local/lib/librtbvpp.so
#25 0x7f853f696d22 in rtb_vpp_route_api_out_process () from
/usr/local/lib/librtbvpp.so
#26 0x7f85406b3975 in fib_route_api_out_del () from
/usr/local/lib/libfibd.so
#27 0x7f85406b3a83 in fib_route_api_out_tbl_vpp_wlk () from
/usr/local/lib/libfibd.so
#28 0x7f85406a550b in fib_job_tmr_cb () from /usr/local/lib/libfibd.so
#29 0x7f854b27e6c2 in bds_qrunner_dispatch () from /usr/local/lib/libbds.so
#30 0x7f854b27f77c in bds_qrunner_dispatch_type () from
/usr/local/lib/libbds.so
#31 0x7f854b27f9aa in bds_qrunner_dispatch_prepare () from
/usr/local/lib/libbds.so
#32 0x7f854b27faa8 in bds_qrunner_expire () from /usr/local/lib/libbds.so
#33 0x7f854a67f616 in ?? () from /usr/local/lib/libqb.so
#34 0x7f854a67cfa7 in ?? () from /usr/local/lib/libqb.so
#35 0x7f854a67d797 in qb_loop_run_vpp_wrapper () from
/usr/local/lib/libqb.so
#36 0x7f854a6890e8 in lib_qb_service_start_event_wrapper_loop ()
from /usr/local/lib/libqb.so
#37 0x7f854c96b262 in bd_event_loop_run_once () from
/usr/local/lib/libbd-infra.so
#38 0x7f85060444f8 in ?? () from
/usr/local/lib/vpp_plugins/rtbrick_plugin.so
#39 0x7f853db058dd in ?? () from /usr/local/lib/libvlib.so.1.0.1
#40 0x7f853d37ec34 in clib_calljmp () from
/usr/local/lib/libvppinfra.so.1.0.1
#41 0x7f85255f7a10 in ?? ()
#42 0x7f853db0531f in ?? () from /usr/local/lib/libvlib.so.1.0.1

Code Snippet for Creating Tunnel (rpath has the out-labels and  nexthop):


  vec_add1(rpaths, rpath);
  tunnel_sw_if_index = vnet_mpls_tunnel_create(1, 0, NULL);
  vnet_mpls_tunnel_path_add(tunnel_sw_if_index, rpaths);
  vnet_sw_interface_admin_up(vnm, tunnel_sw_if_index);


Code Snippet for Deleting Tunnel.

 vec_add1(rpaths, rpath);
 vnet_mpls_tunnel_path_remove(sw_if_index, rpaths);
 vec_free(rpaths);
 vnet_mpls_tunnel_del(sw_if_index);


Thanks,
Rajith


Re: [vpp-dev] : Worker Thread Deadlock Detected from vl_api_clnt_node

2021-07-09 Thread Rajith PR via lists.fd.io
Hi Satya,

We migrated to 20.09 in March 2021. The crash has not been observed after
that. Not sure if some commit went between 20.05 and 20.09  that has fixed
or improved the situation.

Thanks,
Rajith

On Fri, Jul 9, 2021 at 10:19 AM Satya Murthy 
wrote:

> Hi Rajith / Dave,
>
> We are on fdio.2005 version and see this same crash when we are doing
> packet tracing.
> Is there any specific patch/commit that improves the situation of this
> locking.
>
> If so, Can you please let us know the commit info.
>
> --
> Thanks & Regards,
> Murthy
> 
>
>




Re: [vpp-dev]: Unable to run VPP with ASAN enabled

2021-05-27 Thread Rajith PR via lists.fd.io
Hi Ben,

The problem seems to be due to external libraries that we have linked with
VPP. These external libraries have not been compiled with ASAN.
I could see that when those external libraries were suppressed through the
MyASAN.supp file, VPP started running with ASAN enabled.
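
In case it helps someone else, this is roughly the shape of it; the entries
below are placeholders for our own pre-built libraries, and it relies on
the standard ASan suppression mechanism (a suppressions file handed to the
run via ASAN_OPTIONS), so treat it as an illustration rather than the exact
file we use:

  # MyASAN.supp (illustrative entries)
  interceptor_via_lib:libbd-infra.so
  interceptor_via_lib:libbds.so

  # picked up at start-up:
  export ASAN_OPTIONS=suppressions=/path/to/MyASAN.supp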

Thanks,
Rajith

On Wed, May 26, 2021 at 2:25 PM Benoit Ganne (bganne) 
wrote:

> Hi Rajith,
>
> > I was able to proceed further after setting LD_PRELOAD to the asan
> > library. After this i get SIGSEGV crash in asan. These dont seem to be
> > related to our code, as without ASAN they have been perfectly working.
>
> I suspect the opposite  - ASan detects errors we do not detect in
> release or debug mode, esp. out-of-bound access and use-after-free. Look
> carefully at /home/supervisor/libvpp/src/vpp/rtbrick/rtb_vpp_ifp.c:287
>
> Best
> ben
>




Re: [vpp-dev]: Unable to run VPP with ASAN enabled

2021-05-25 Thread Rajith PR via lists.fd.io
 in qb_loop_run_level (level=level@entry=0x611000b0)
at /development/rtbrick-infrastructure/code/qb/lib/loop.c:43
#33 0x766bf54c in qb_loop_run (lp=) at
/development/rtbrick-infrastructure/code/qb/lib/loop.c:210
#34 0x766cb19a in lib_qb_service_start_event_loop () at
/development/rtbrick-infrastructure/code/qb/lib/wrapper/lib_qb_service.c:257
#35 0x502d in main (argc=3, argv=) at
/development/rtbrick-infrastructure/code/bd/src/bd/bd_main.c:136

Thanks,
Rajith



On Tue, May 25, 2021 at 3:10 PM Benoit Ganne (bganne) 
wrote:

> How are you starting VPP? If this is through 'make test' then chances are
> the culprit is the interaction of asan, clang and python [1].
> The easy way to fix is to rebuild with gcc instead of clang, eg.
> ~# make rebuild VPP_EXTRA_CMAKE_ARGS=-DVPP_ENABLE_SANITIZE_ADDR=ON CC=gcc-9
>
> Best
> ben
>
> [1]
> https://gerrit.fd.io/r/c/vpp/+/27268/3/src/vpp-api/python/vpp_papi/vpp_ffi.py.in#1
>
> > -Original Message-
> > From: vpp-dev@lists.fd.io  On Behalf Of Rajith PR
> via
> > lists.fd.io
> > Sent: mardi 25 mai 2021 09:51
> > To: vpp-dev 
> > Subject: [vpp-dev]: Unable to run VPP with ASAN enabled
> >
> > Hi All,
> >
> > I am not able to run VPP with ASAN. Though we have been using VPP for
> > sometime this is the first time we enabled ASAN in the build.
> > I have followed the steps as mentioned in the sanitizer doc, can someone
> > please let me know what is missed here.
> >
> > Run Time Error(Missing symbol):
> >
> > /usr/local/lib/librtbvpp.so:/home/supervisor/libvpp/build-root/install-
> > vpp_debug-native/vpp/lib/libvppinfra.so.1.0.1: undefined symbol:
> > __asan_option_detect_stack_use_after_return
> >
> >
> > VPP Version : 20.09
> >
> >
> > Build Command : make rebuild VPP_EXTRA_CMAKE_ARGS=-
> > DVPP_ENABLE_SANITIZE_ADDR=ON
> >
> >
> > Build Summary :
> >
> > VPP version : 1.0.1-3032~g4b28254fc-dirty
> > VPP library version : 1.0.1
> > GIT toplevel dir: /home/supervisor/libvpp
> > Build type  : debug
> > C flags : -fsanitize=address -DCLIB_SANITIZE_ADDR -Wno-
> > address-of-packed-member -g -fPIC -Werror -Wall -march=corei7 -
> > mtune=corei7-avx -O0 -DCLIB_DEBUG -fstack-protector -DFORTIFY_SOURCE=2 -
> > fno-common
> > Linker flags (apps) : -fsanitize=address
> > Linker flags (libs) : -fsanitize=address
> > Host processor  : x86_64
> > Target processor: x86_64
> > Prefix path :
> > /opt/vpp/external/x86_64;/home/supervisor/libvpp/build-root/install-
> > vpp_debug-native/external
> > Install prefix  : /home/supervisor/libvpp/build-root/install-
> > vpp_debug-native/vpp
> >
> >
> >
> > Reference : https://fd.io/docs/vpp/master/troubleshooting/sanitizer.html
> >
> >
> > Thanks,
> > Rajith
>




[vpp-dev]: Unable to run VPP with ASAN enabled

2021-05-25 Thread Rajith PR via lists.fd.io
Hi All,

I am not able to run VPP with ASAN. Though we have been using VPP for
some time, this is the first time we have enabled ASAN in the build.
I have followed the steps mentioned in the sanitizer doc; can someone
please let me know what is missing here?

*Run Time Error(Missing symbol):*

/usr/local/lib/librtbvpp.so:/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp/lib/libvppinfra.so.1.0.1:
undefined symbol: __asan_option_detect_stack_use_after_return

*VPP Version* : 20.09

*Build Command* : make rebuild VPP_EXTRA_CMAKE_ARGS=
-DVPP_ENABLE_SANITIZE_ADDR=ON

*Build Summary *:

VPP version : 1.0.1-3032~g4b28254fc-dirty
VPP library version : 1.0.1
GIT toplevel dir: /home/supervisor/libvpp
Build type  : debug
C flags : -fsanitize=address -DCLIB_SANITIZE_ADDR
-Wno-address-of-packed-member -g -fPIC -Werror -Wall -march=corei7
-mtune=corei7-avx -O0 -DCLIB_DEBUG -fstack-protector -DFORTIFY_SOURCE=2
-fno-common
Linker flags (apps) : -fsanitize=address
Linker flags (libs) : -fsanitize=address
Host processor  : x86_64
Target processor: x86_64
Prefix path :
/opt/vpp/external/x86_64;/home/supervisor/libvpp/build-root/install-vpp_debug-native/external
Install prefix  :
/home/supervisor/libvpp/build-root/install-vpp_debug-native/vpp

*Reference *: https://fd.io/docs/vpp/master/troubleshooting/sanitizer.html

Thanks,
Rajith




[vpp-dev]: Query: Socket state -1 on 20.09

2021-03-15 Thread Rajith PR via lists.fd.io
Hi All,

We did a VPP version upgrade from 19.08 to 20.09. I am seeing that the
socket state is -1 in 20.09 on one of the devices. When does this happen?

*20.09*
DBGvpp# show threads
ID  Name      Type     LWP  Sched Policy (Priority)  lcore  Core  Socket State
0   vpp_main           254  other (0)                1      1     4294967
1   vpp_wk_0  workers  539  other (0)                2      2     4294967
2   vpp_wk_1  workers  540  other (0)                3      3     4294967
DBGvpp# quit

*19.08*
DBGvpp# show threads
ID  Name      Type     LWP  Sched Policy (Priority)  lcore  Core  Socket State
0   vpp_main           245  other (0)                1      0     0
1   vpp_wk_0  workers  556  other (0)                2      2     0
2   vpp_wk_1  workers  557  other (0)                3      3     0

Thanks,
Rajith




Re: [vpp-dev]: Worker Thread Deadlock Detected from vl_api_clnt_node

2020-12-04 Thread Rajith PR via lists.fd.io
Thanks Dave. Yes, I do see some changes made in the vlibmemory infra in the
latest master. Earlier I had a hard time backporting some fixes for NTP and
fragmentation. I will plan for rebasing to a stable version.

On Thu, 3 Dec 2020 at 6:38 PM,  wrote:

> Looks like a corrupt binary API segment heap to me. Signal 7 in
> mspace_malloc(...) is the root cause. The thread hangs due to recursion on
> the mspace lock trying to print / syslog from the signal handler.
>
>
>
> It is abnormal to allocate memory in vl_msg_api_alloc[_as_if_client] in
> the first place.
>
>
>
> As has been communicated multiple times, 19.08 is no longer supported.
>
>
>
> HTH... Dave
>
>
>
> #13 0x7ffa9f72adf5 in unix_signal_handler (signum=7,
> si=0x7ffa6f6e50f0, uc=0x7ffa6f6e4fc0)
> at /development/libvpp/src/vlib/unix/main.c:127
> #14 
> #15 0x7ffa9f417c03 in mspace_malloc (msp=0x130046010, bytes=77) at
> /development/libvpp/src/vppinfra/dlmalloc.c:4437
> #16 0x7ffa9f416f6f in mspace_get_aligned (msp=0x130046010,
> n_user_data_bytes=77, align=1, align_offset=0)
> at /development/libvpp/src/vppinfra/dlmalloc.c:4186
> #17 0x7ffaa0c7d04f in clib_mem_alloc_aligned_at_offset (size=73,
> align=1, align_offset=0, os_out_of_memory_on_failure=1)
> at /development/libvpp/src/vppinfra/mem.h:139
> #18 0x7ffaa0c7d0a2 in clib_mem_alloc (size=73) at
> /development/libvpp/src/vppinfra/mem.h:155
> #19 0x7ffaa0c7da0a in vl_msg_api_alloc_internal (nbytes=73, pool=0,
> may_return_null=0)
> at /development/libvpp/src/vlibmemory/memory_shared.c:177
> #20 0x7ffaa0c7db6f in vl_msg_api_alloc_as_if_client (nbytes=57) at
> /development/libvpp/src/vlibmemory/memory_shared.c:236
>
>
>
> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Rajith
> PR via lists.fd.io
> *Sent:* Thursday, December 3, 2020 5:55 AM
> *To:* vpp-dev 
> *Subject:* [vpp-dev]: Worker Thread Deadlock Detected from
> vl_api_clnt_node
>
>
>
> Hi All,
>
>
>
> We have hit a VPP Worker Thread Deadlock issue. And from the call stacks
> it looks like the main thread is waiting for workers to come back to their
> main loop( ie has taken the barrier lock) and one of the two workers is on
> spin lock to make an rpc to the main thread.
>
> I believe this lock is held by the main thread.
>
>
>
> We are using *19.08 version* and complete bt is pasted below. Also, Can
> someone explain the purpose of* vl_api_clnt_node*.
>
> 414 /* *INDENT-OFF* */
> 415 VLIB_REGISTER_NODE (vl_api_clnt_node) =
> 416 {
> 417   .function = vl_api_clnt_process,
> 418   .type = VLIB_NODE_TYPE_PROCESS,
> 419   .name = "api-rx-from-ring",
> 420   .state = VLIB_NODE_STATE_DISABLED,
> 421 };
> 422 /* *INDENT-ON* */
>
> *Complete Backtrace:*
>
> Thread 3 (Thread 0x7ffa511c9700 (LWP 448)):
> ---Type  to continue, or q  to quit---
> #0  0x7ffa9f6bc276 in vlib_worker_thread_barrier_check () at 
> /development/libvpp/src/vlib/threads.h:430
> #1  0x7ffa9f6c3f19 in vlib_main_or_worker_loop (vm=0x7ffa8797adc0, 
> is_main=0) at /development/libvpp/src/vlib/main.c:1757
> #2  0x7ffa9f6c4fbd in vlib_worker_loop (vm=0x7ffa8797adc0) at 
> /development/libvpp/src/vlib/main.c:1988
> #3  0x7ffa9f703ff1 in vlib_worker_thread_fn (arg=0x7ffa6ccc8640) at 
> /development/libvpp/src/vlib/threads.c:1803
> #4  0x7ffa9f383560 in clib_calljmp () from 
> /usr/local/lib/libvppinfra.so.1.0.1
&

[vpp-dev]: Worker Thread Deadlock Detected from vl_api_clnt_node

2020-12-03 Thread Rajith PR via lists.fd.io
Hi All,

We have hit a VPP Worker Thread Deadlock issue. From the call stacks it
looks like the main thread is waiting for the workers to come back to their
main loop (i.e. it has taken the barrier lock), and one of the two workers
is spinning on a lock to make an RPC to the main thread.
I believe this lock is held by the main thread.

We are using the *19.08 version* and the complete bt is pasted below. Also,
can someone explain the purpose of *vl_api_clnt_node*?

414 /* *INDENT-OFF* */
415 VLIB_REGISTER_NODE (vl_api_clnt_node) =
416 {
417   .function = vl_api_clnt_process,
418   .type = VLIB_NODE_TYPE_PROCESS,
419   .name = "api-rx-from-ring",
420   .state = VLIB_NODE_STATE_DISABLED,
421 };
422 /* *INDENT-ON* */

*Complete Backtrace:*

Thread 3 (Thread 0x7ffa511c9700 (LWP 448)):
---Type  to continue, or q  to quit---
#0  0x7ffa9f6bc276 in vlib_worker_thread_barrier_check () at
/development/libvpp/src/vlib/threads.h:430
#1  0x7ffa9f6c3f19 in vlib_main_or_worker_loop (vm=0x7ffa8797adc0,
is_main=0) at /development/libvpp/src/vlib/main.c:1757
#2  0x7ffa9f6c4fbd in vlib_worker_loop (vm=0x7ffa8797adc0) at
/development/libvpp/src/vlib/main.c:1988
#3  0x7ffa9f703ff1 in vlib_worker_thread_fn (arg=0x7ffa6ccc8640)
at /development/libvpp/src/vlib/threads.c:1803
#4  0x7ffa9f383560 in clib_calljmp () from
/usr/local/lib/libvppinfra.so.1.0.1
#5  0x7ffa511c8ec0 in ?? ()
#6  0x7ffa9f6fe588 in vlib_worker_thread_bootstrap_fn
(arg=0x7ffa6ccc8640) at /development/libvpp/src/vlib/threads.c:573
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 2 (Thread 0x7ffa519ca700 (LWP 447)):
#0  0x7ffaaae87ef7 in sched_yield () at
../sysdeps/unix/syscall-template.S:78
#1  0x7ffa9f40fb49 in spin_acquire_lock (sl=0x130046384) at
/development/libvpp/src/vppinfra/dlmalloc.c:466
#2  0x7ffa9f4173a4 in mspace_malloc (msp=0x130046010, bytes=66) at
/development/libvpp/src/vppinfra/dlmalloc.c:4347
#3  0x7ffa9f4170de in mspace_get_aligned (msp=0x130046010,
n_user_data_bytes=66, align=16, align_offset=8)
at /development/libvpp/src/vppinfra/dlmalloc.c:4233
#4  0x7ffa9f4036fc in clib_mem_alloc_aligned_at_offset (size=46,
align=8, align_offset=8, os_out_of_memory_on_failure=1)
at /development/libvpp/src/vppinfra/mem.h:139
#5  0x7ffa9f403947 in vec_resize_allocate_memory (v=0x0,
length_increment=38, data_bytes=46, header_bytes=8, data_align=8)
at /development/libvpp/src/vppinfra/vec.c:59
#6  0x7ffa9f370aa5 in _vec_resize_inline (v=0x0,
length_increment=38, data_bytes=38, header_bytes=0, data_align=1)
at /development/libvpp/src/vppinfra/vec.h:147
#7  0x7ffa9f371a93 in do_percent (_s=0x7ffa6f6e4b58,
fmt=0x7ffa9f4244e8 "%s:%d (%s) assertion `%s' fails",
va=0x7ffa6f6e4c08)
at /development/libvpp/src/vppinfra/format.c:341
#8  0x7ffa9f371edb in va_format (s=0x0, fmt=0x7ffa9f4244e8 "%s:%d
(%s) assertion `%s' fails", va=0x7ffa6f6e4c08)
at /development/libvpp/src/vppinfra/format.c:404
#9  0x7ffa9f3629d0 in _clib_error (how_to_die=2,
function_name=0x0, line_number=0, fmt=0x7ffa9f4244e8 "%s:%d (%s)
assertion `%s' fails")
at /development/libvpp/src/vppinfra/error.c:127
#10 0x7ffa9f370a3c in _vec_resize_inline (v=0x7ffa6dd00470,
length_increment=16, data_bytes=16, header_bytes=0, data_align=1)
at /development/libvpp/src/vppinfra/vec.h:136
#11 0x7ffa9f371ea2 in va_format (s=0x7ffa6dd00470 "received signal
SIGABRT, PC 0x7ffaaadc2fb7",
fmt=0x7ffa9f74923b "received signal %U, PC %U", va=0x7ffa6f6e4e08)
at /development/libvpp/src/vppinfra/format.c:403
#12 0x7ffa9f372048 in format (s=0x7ffa6dd00470 "received signal
SIGABRT, PC 0x7ffaaadc2fb7",
fmt=0x7ffa9f74923b "received signal %U, PC %U") at
/development/libvpp/src/vppinfra/format.c:428
#13 0x7ffa9f72adf5 in unix_signal_handler (signum=7,
si=0x7ffa6f6e50f0, uc=0x7ffa6f6e4fc0)
at 

Re: [vpp-dev]: Crash in memclnt_queue_callback().

2020-11-17 Thread Rajith PR via lists.fd.io
Thanks Dave, will check it out.

-Rajith

On Tue, Nov 17, 2020 at 8:40 PM  wrote:

> Let’s be clear: you’re seeing a crash in a *modified fork* of vpp-19.08.
> I’ve never seen such a crash myself, nor has one such been reported by
> anyone else to my knowledge.
>
>
>
> That having been written, all signs point to the volatile int ** vector
> vl_api_queue_cursizes having had an accident:
>
>
>
> static void
>
> memclnt_queue_callback (vlib_main_t * vm)
>
> {
>
> 
>
>   for (i = 0; i < vec_len (vl_api_queue_cursizes); i++)
>
> {
>
>   if (*vl_api_queue_cursizes[i])
>
>   {
>
> vm->queue_signal_pending = 1;
>
> vm->api_queue_nonempty = 1;
>
> vlib_process_signal_event (vm, vl_api_clnt_node.index,
>
>/* event_type */ QUEUE_SIGNAL_EVENT,
>
>/* event_data */ 0);
>
> break;
>
>   }
>
> }
>
> 
>
> }
>
>
>
> Try a debug image. Try capturing “i”, and the value
> vl_api_queue_cursizes[i] before dereferencing as a pointer. Add a couple of
> global variables with names which won’t collide with anything else:
>
>
>
> int oingo_save_i;
>
> volatile int *oingo_save_cursizep;
>
>
>
> In the loop, set:
>
>oingo_save_i = i;
>
>oingo_save_cursizep = vl_api_queue_cursizes[i];
>
>
>
>if(*vl_api_queue_cursizes[i])
>
>  
>
>
>
> Capture a coredump. It should be obvious why the reference blows up. If
> you can, change your custom signal handler so that the faulting virtual
> address is as obvious as possible.
>
>
>
> Beyond that, you’re on your own.
>
>
>
> HTH... Dave
>
>
>
> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Rajith
> PR via lists.fd.io
> *Sent:* Tuesday, November 17, 2020 7:03 AM
> *To:* vpp-dev 
> *Subject:* [vpp-dev]: Crash in memclnt_queue_callback().
>
>
>
> Hi All,
>
>
>
> We are seeing a random crash in *VPP-19.08*. The crash is occurring in 
> memclnt_queue_callback
> and it is in code that we are not using. Any pointers to fix the crash
> would be helpful.
>
>
>
> *Complete Call Stack:*
>
>
>
> Thread 1 (Thread 0x7fe728f43d00 (LWP 189)):
>
> #0  0x7fe728049492 in __GI___waitpid (pid=732, 
> stat_loc=stat_loc@entry=0x7fe6f9ebeed8, options=options@entry=0)
>
> at ../sysdeps/unix/sysv/linux/waitpid.c:30
>
> #1  0x7fe727fb4177 in do_system (line=) at 
> ../sysdeps/posix/system.c:149
>
> #2  0x7fe728ad6457 in bd_signal_handler_cb (signo=11) at 
> /development/librtbrickinfra/bd/src/bd.c:770
>
> #3  0x7fe71c90fbf7 in rtb_bd_signal_handler (signo=11) at 
> /development/libvpp/src/vlib/unix/main.c:80
>
> #4  0x7fe71c90ff92 in unix_signal_handler (signum=11, si=0x7fe6f9ebf7b0, 
> uc=0x7fe6f9ebf680)
>
> at /development/libvpp/src/vlib/unix/main.c:180
>
> #5  
>
> #6  memclnt_queue_callback (vm=0x7fe71cb49e80 ) at 
> /development/libvpp/src/vlibmemory/memory_api.c:96
>
> #7  0x7fe71c8a9258 in vlib_main_or_worker_loop (vm=0x7fe71cb49e80 
> , is_main=1)
>
> at /development/libvpp/src/vlib/main.c:1799
>
> #8  0x7fe71c8a9f9d in vlib_main_loop (vm=0x7fe71cb49e80 
> ) at /development/libvpp/src/vlib/main.c:1982
>
> #9  0x7fe71c8aac7b in vlib_main (vm=0x7fe71cb49e80 , 
> input=0x7fe6f9ebffb0) at /development/libvpp/src/vlib/main.c:2209
>
> #10 0x7fe71c911745 in thread0 (arg=140630595772032) at 
> /development/libvpp/src/vlib/unix/main.c:666
>
> #11 0x7fe71c568560 in clib_calljmp () from 
> /usr/local/lib/libvppinfra.so.1.0.1
>
> #12 0x7ffe85672480 in ?? ()
>
> #13 0x7fe71c911cbb in vlib_unix_main (argc=42, argv=0x563be4aaa5a0) at 
> /development/libvpp/src/vlib/unix/main.c:736
>
> #14 0x7fe71e0bc9eb in rtb_vpp_core_init (argc=42, argv=0x563be4aaa5a0) at 
> /development/libvpp/src/vpp/vnet/main.c:483
>
> #15 0x7fe71e18fba2 in rtb_vpp_main () at 
> /development/libvpp/src/vpp/rtbrick/rtb_vpp_main.c:113
>
> #16 0x7fe728ad5e46 in bd_load_daemon_lib (dmn_lib_cfg=0x7fe728cf2820 
> )
>
> ---Type  to continue, or q  to quit---
>
> at /development/librtbrickinfra/bd/src/bd.c:627
>
> #17 0x7fe728ad5ef1 in bd_load_all_daemon_libs () at 
> /development/librtbrickinfra/bd/src/bd.c:646
>
> #18 0x7fe728ad7362 in bd_start_process () at 
> /development/librtbrickinfra/bd/src/bd.c:1128
>
> #19 0x7fe72583c860 in bds_bd_init () at 
> /development/librtbrickinfra/libbds/code/bds/src/bds.c:657
>
> #20 0x7fe7258c8a30 in pubsub_bd_init_expiry (data=0x0) at 
> /development/librtbri

[vpp-dev]: Crash in memclnt_queue_callback().

2020-11-17 Thread Rajith PR via lists.fd.io
Hi All,

We are seeing a random crash in *VPP-19.08*. The crash is occurring in
memclnt_queue_callback
and it is in code that we are not using. Any pointers to fix the crash
would be helpful.

*Complete Call Stack:*

Thread 1 (Thread 0x7fe728f43d00 (LWP 189)):

#0  0x7fe728049492 in __GI___waitpid (pid=732,
stat_loc=stat_loc@entry=0x7fe6f9ebeed8, options=options@entry=0)
at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x7fe727fb4177 in do_system (line=) at
../sysdeps/posix/system.c:149
#2  0x7fe728ad6457 in bd_signal_handler_cb (signo=11) at
/development/librtbrickinfra/bd/src/bd.c:770
#3  0x7fe71c90fbf7 in rtb_bd_signal_handler (signo=11) at
/development/libvpp/src/vlib/unix/main.c:80
#4  0x7fe71c90ff92 in unix_signal_handler (signum=11,
si=0x7fe6f9ebf7b0, uc=0x7fe6f9ebf680)
at /development/libvpp/src/vlib/unix/main.c:180
#5  
#6  memclnt_queue_callback (vm=0x7fe71cb49e80 ) at
/development/libvpp/src/vlibmemory/memory_api.c:96
#7  0x7fe71c8a9258 in vlib_main_or_worker_loop (vm=0x7fe71cb49e80
, is_main=1)
at /development/libvpp/src/vlib/main.c:1799
#8  0x7fe71c8a9f9d in vlib_main_loop (vm=0x7fe71cb49e80
) at /development/libvpp/src/vlib/main.c:1982
#9  0x7fe71c8aac7b in vlib_main (vm=0x7fe71cb49e80
, input=0x7fe6f9ebffb0) at
/development/libvpp/src/vlib/main.c:2209
#10 0x7fe71c911745 in thread0 (arg=140630595772032) at
/development/libvpp/src/vlib/unix/main.c:666
#11 0x7fe71c568560 in clib_calljmp () from
/usr/local/lib/libvppinfra.so.1.0.1
#12 0x7ffe85672480 in ?? ()
#13 0x7fe71c911cbb in vlib_unix_main (argc=42,
argv=0x563be4aaa5a0) at /development/libvpp/src/vlib/unix/main.c:736
#14 0x7fe71e0bc9eb in rtb_vpp_core_init (argc=42,
argv=0x563be4aaa5a0) at /development/libvpp/src/vpp/vnet/main.c:483
#15 0x7fe71e18fba2 in rtb_vpp_main () at
/development/libvpp/src/vpp/rtbrick/rtb_vpp_main.c:113
#16 0x7fe728ad5e46 in bd_load_daemon_lib
(dmn_lib_cfg=0x7fe728cf2820 )
---Type  to continue, or q  to quit---
at /development/librtbrickinfra/bd/src/bd.c:627
#17 0x7fe728ad5ef1 in bd_load_all_daemon_libs () at
/development/librtbrickinfra/bd/src/bd.c:646
#18 0x7fe728ad7362 in bd_start_process () at
/development/librtbrickinfra/bd/src/bd.c:1128
#19 0x7fe72583c860 in bds_bd_init () at
/development/librtbrickinfra/libbds/code/bds/src/bds.c:657
#20 0x7fe7258c8a30 in pubsub_bd_init_expiry (data=0x0) at
/development/librtbrickinfra/libbds/code/pubsub/src/pubsub_helper.c:1444
#21 0x7fe7285d6640 in timer_dispatch (item=0x563be68209b0,
p=QB_LOOP_HIGH) at
/development/librtbrickinfra/libqb/lib/loop_timerlist.c:56
#22 0x7fe7285d25d6 in qb_loop_run_level (level=0x563be47a17a0) at
/development/librtbrickinfra/libqb/lib/loop.c:43
#23 0x7fe7285d2d4b in qb_loop_run (lp=0x563be47a1730) at
/development/librtbrickinfra/libqb/lib/loop.c:210
#24 0x7fe7285e461e in lib_qb_service_start_event_loop () at
/development/librtbrickinfra/libqb/lib/wrapper/lib_qb_service.c:257
#25 0x563be3d9f153 in main ()
(gdb)


*Code Snippet:*


 94   for (i = 0; i < vec_len (vl_api_queue_cursizes); i++)
 95     {
 96       if (*vl_api_queue_cursizes[i])          /* <- crashed here */
 97         {
 98           vm->queue_signal_pending = 1;
 99           vm->api_queue_nonempty = 1;
100           vlib_process_signal_event (vm, vl_api_clnt_node.index,
101                                      /* event_type */ QUEUE_SIGNAL_EVENT,
102                                      /* event_data */ 0);
103           break;
104         }
105     }


Thanks,

Rajith




[vpp-dev] Lockless queue/ring buffer

2020-09-17 Thread Rajith PR via lists.fd.io
Hi All,

We are integrating a *Linux pthread* with a *vpp thread* and are looking
for a *lockless queue/ring buffer implementation* that can be used.
In vppinfra I could see fifo and ring, but I am not sure whether they can
be used for enqueue/dequeue from a pthread that VPP is not aware of.
Do you have any reference code for such an integration, or any suggestions?
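
In case a concrete shape helps the discussion, something like the following
single-producer/single-consumer ring is what we have in mind (plain C11
atomics, nothing VPP-specific; the names and the size are made up for
illustration), with the external pthread enqueuing and a VPP input node
polling the other end:

  #include <stdatomic.h>
  #include <stdint.h>

  #define RING_SZ 1024                  /* must be a power of two */

  typedef struct
  {
    void *slots[RING_SZ];
    _Atomic uint32_t head;              /* advanced only by the consumer */
    _Atomic uint32_t tail;              /* advanced only by the producer */
  } spsc_ring_t;

  /* producer side: called from the external pthread */
  static inline int
  spsc_enqueue (spsc_ring_t *r, void *p)
  {
    uint32_t tail = atomic_load_explicit (&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit (&r->head, memory_order_acquire);

    if (tail - head == RING_SZ)
      return -1;                        /* ring is full */

    r->slots[tail & (RING_SZ - 1)] = p;
    atomic_store_explicit (&r->tail, tail + 1, memory_order_release);
    return 0;
  }

  /* consumer side: polled from a VPP input node on the vpp thread */
  static inline void *
  spsc_dequeue (spsc_ring_t *r)
  {
    uint32_t head = atomic_load_explicit (&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit (&r->tail, memory_order_acquire);

    if (head == tail)
      return 0;                         /* ring is empty */

    void *p = r->slots[head & (RING_SZ - 1)];
    atomic_store_explicit (&r->head, head + 1, memory_order_release);
    return p;
  }

This of course only holds for exactly one producer and one consumer; with
more writers we would need something stronger.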

Thanks,
Rajith




Re: [vpp-dev]: Crash in Timer wheel infra

2020-09-09 Thread Rajith PR via lists.fd.io
Hi Andreas/Dave,

I did some experiments to debug the crash.

Firstly, I added some profiling code in vlib/main.c. The code is there to
count the timer-wheel slips that can cause such havoc (as mentioned by
Andreas). There are slippages, as you can see from the data collected from
the core file.

Total slips = 21489 out of a total of 98472987 runs.

10 usec for the process timer wheel is something we may not be able to
achieve, as we have a process node in which our solution runs. We would
like to increase 10 usec to 100 usec and observe the behaviour. I tried
increasing the interval from 10 usec to 100 usec, but then the process
nodes were scheduled very slowly. What is the correct way to increase the
interval?

Profiling code added,

   tw_start_time = vlib_time_now (vm);

   if (tw_start_time > tw_last_start_time) {
       interval = tw_start_time - tw_last_start_time;
       if (interval > PROCESS_TW_TIMER_INTERVAL) {
           tw_slips++;
       }
       tw_total_run++;
   }
   tw_last_start_time = tw_start_time;

   nm->data_from_advancing_timing_wheel =
     TW (tw_timer_expire_timers_vec)
     ((TWT (tw_timer_wheel) *) nm->timing_wheel, vlib_time_now (vm),
      nm->data_from_advancing_timing_wheel);

Secondly, during the debugging we got another crash (at line 1904 of
vlib/main.c, shown below).
From gdb we found that vec_len of nm->data_from_advancing_timing_wheel is
1, but nm->data_from_advancing_timing_wheel[0] = ~0.

1896       if (PREDICT_FALSE
1897           (_vec_len (nm->data_from_advancing_timing_wheel) > 0))
1898         {
1899           uword i;
1900
1901           for (i = 0; i < _vec_len (nm->data_from_advancing_timing_wheel);
1902                i++)
1903             {
1904               u32 d = nm->data_from_advancing_timing_wheel[i];
1905               u32 di = vlib_timing_wheel_data_get_index (d);

Thanks,
Rajith

On Wed, Sep 2, 2020 at 8:15 PM Dave Barach (dbarach) 
wrote:

> It looks like vpp is crashing while expiring timers from the main thread
> process timer wheel. That’s not been reported before.
>
>
>
> You might want to dust off .../extras/deprecated/vlib/unix/cj.[ch], and
> make a circular log of timer pool_put operations to work out what’s
> happening.
>
>
>
> D.
>
>
>
> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Rajith
> PR via lists.fd.io
> *Sent:* Wednesday, September 2, 2020 9:42 AM
> *To:* Dave Barach (dbarach) 
> *Cc:* vpp-dev 
> *Subject:* Re: [vpp-dev]: Crash in Timer wheel infra
>
>
>
> Thanks Dave for the quick analysis. Are there some Debug CLIs that I can
> run to analyse?
>
> We are

Re: [vpp-dev]: Crash in Timer wheel infra

2020-09-02 Thread Rajith PR via lists.fd.io
Thanks Dave for the quick analysis. Are there some Debug CLIs that I can
run to analyse?
We are not using the VPP timers as we have our own timer library. In VPP,
we have added a couple of VPP nodes (process, internal and input). Could
these be causing the problem?

Thanks,
Rajith

On Wed, Sep 2, 2020 at 6:43 PM Dave Barach (dbarach) 
wrote:

> Given the amount of soak-time / perf/scale / stress testing which the
> tw_timer code has experienced, it’s reasonably likely that your application
> is responsible.
>
>
>
> Caution is required when dealing with timers other than the timer which
> has expired.
>
>
>
> If you have > 1 timer per object and you manipulate timer B when timer A
> expires, there’s no guarantee that timer B isn’t already on the expired
> timer list. That’s almost always good for trouble.
>
>
>
> HTH... Dave
>
>
>
> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Rajith
> PR via lists.fd.io
> *Sent:* Wednesday, September 2, 2020 12:39 AM
> *To:* vpp-dev 
> *Subject:* [vpp-dev]: Crash in Timer wheel infra
>
>
>
> Hi All,
>
>
>
> We are facing a crash in VPP's Timer wheel INFRA. Please find the details
> below.
>
>
>
> Version : *19.08*
>
> Configuration: *2 workers and the main thread.*
>
> Bactraces: thread apply all bt
>
>
>
> Thread 1 (Thread 0x7ff41d586d00 (LWP 253)):
>
> ---Type  to continue, or q  to quit---
>
> #0  0x7ff41c696722 in __GI___waitpid (pid=707,
>
> stat_loc=stat_loc@entry=0x7ff39f18ca18, options=options@entry=0)
>
> at ../sysdeps/unix/sysv/linux/waitpid.c:30
>
> #1  0x7ff41c601107 in do_system (line=)
>
> at ../sysdeps/posix/system.c:149
>
> #2  0x7ff41d11a76b in bd_signal_handler_cb (signo=6)
>
> at /development/librtbrickinfra/bd/src/bd.c:770
>
> #3  0x7ff410ce907b in rtb_bd_signal_handler (signo=6)
>
> at /development/libvpp/src/vlib/unix/main.c:80
>
> #4  0x7ff410ce9416 in unix_signal_handler (signum=6, si=0x7ff39f18d1f0,
>
> uc=0x7ff39f18d0c0) at /development/libvpp/src/vlib/unix/main.c:180
>
> #5  
>
> #6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
>
> #7  0x7ff41c5f28b1 in __GI_abort () at abort.c:79
>
> #8  0x7ff41248ee66 in os_panic ()
>
> at /development/libvpp/src/vpp/vnet/main.c:559
>
> #9  0x7ff410922825 in debugger ()
>
> at /development/libvpp/src/vppinfra/error.c:84
>
> #10 0x7ff410922bf4 in _clib_error (how_to_die=2, function_name=0x0,
>
> line_number=0, fmt=0x7ff4109e8a78 "%s:%d (%s) assertion `%s' fails")
>
> at /development/libvpp/src/vppinfra/error.c:143
>
> #11 0x7ff4109a64dd in tw_timer_expire_timers_internal_1t_3w_1024sl_ov (
>
> tw=0x7ff39fdf7a40, now=327.5993926951,
>
> ---Type  to continue, or q  to quit---
>
> callback_vector_arg=0x7ff39fdfab00)
>
> at /development/libvpp/src/vppinfra/tw_timer_template.c:753
>
> #12 0x7ff4109a6b36 in tw_timer_expire_timers_vec_1t_3w_1024sl_ov (
>
> tw=0x7ff39fdf7a40, now=327.5993926951, vec=0x7ff39fdfab00)
>
> at /development/libvpp/src/vppinfra/tw_timer_template.c:814
>
> #13 0x7ff410c8321a in vlib_main_or_worker_loop (
>
> vm=0x7ff410f22e40 , is_main=1)
>
> at /development/libvpp/src/vlib/main.c:1859
>
> #14 0x7ff410c83965 in vlib_main_loop (vm=0x7ff410f22e40 
> )
>
> at /development/libvpp/src/vlib/main.c:1930
>
> #15 0x7ff410c8462c in vlib_main (vm=0x7ff410f22e40 ,
>
> input=0x7ff39f18dfb0) at /development/libvpp/src/vlib/main.c:2147
>
> #16 0x7ff410ceabc9 in thread0 (arg=140686233054784)
>
> at /development/libvpp/src/vlib/unix/main.c:666
>
> #17 0x7ff410943600 in clib_calljmp ()
>
>from /usr/local/lib/libvppinfra.so.1.0.1
>
> #18 0x7ffe4d981390 in ?? ()
>
> #19 0x7ff410ceb13f in vlib_unix_main (argc=55, argv=0x556c398eb100)
>
> at /development/libvpp/src/vlib/unix/main.c:736
>
> #20 0x7ff41248e7cb in rtb_vpp_core_init (argc=55, argv=0x556c398eb100)
>
> at /development/libvpp/src/vpp/vnet/main.c:483
>
> #21 0x7ff41256189a in rtb_vpp_main ()
>
> at /development/libvpp/src/vpp/rtbrick/rtb_vpp_main.c:113
>
> ---Type  to continue, or q  to quit---
>
> #22 0x7ff41d11a15a in bd_load_daemon_lib (
>
> dmn_lib_cfg=0x7ff41d337860 )
>
> at /development/librtbrickinfra/bd/src/bd.c:627
>
> #23 0x7ff41d11a205 in bd_load_all_daemon_libs ()
>
> at /development/librtbrickinfra/bd/src/bd.c:646
>
> #24 0x7ff41d11b676 in bd_start_process ()
>
> at /development/librtbrickinfra/bd/src/bd.c:1128
>
> #25 0x000

[vpp-dev]: Crash in Timer wheel infra

2020-09-01 Thread Rajith PR via lists.fd.io
Hi All,

We are facing a crash in VPP's Timer wheel INFRA. Please find the details
below.

Version : *19.08*
Configuration: *2 workers and the main thread.*
Bactraces: thread apply all bt

Thread 1 (Thread 0x7ff41d586d00 (LWP 253)):
---Type  to continue, or q  to quit---
#0  0x7ff41c696722 in __GI___waitpid (pid=707,
stat_loc=stat_loc@entry=0x7ff39f18ca18, options=options@entry=0)
at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x7ff41c601107 in do_system (line=)
at ../sysdeps/posix/system.c:149
#2  0x7ff41d11a76b in bd_signal_handler_cb (signo=6)
at /development/librtbrickinfra/bd/src/bd.c:770
#3  0x7ff410ce907b in rtb_bd_signal_handler (signo=6)
at /development/libvpp/src/vlib/unix/main.c:80
#4  0x7ff410ce9416 in unix_signal_handler (signum=6, si=0x7ff39f18d1f0,
uc=0x7ff39f18d0c0) at /development/libvpp/src/vlib/unix/main.c:180
#5  
#6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#7  0x7ff41c5f28b1 in __GI_abort () at abort.c:79
#8  0x7ff41248ee66 in os_panic ()
at /development/libvpp/src/vpp/vnet/main.c:559
#9  0x7ff410922825 in debugger ()
at /development/libvpp/src/vppinfra/error.c:84
#10 0x7ff410922bf4 in _clib_error (how_to_die=2, function_name=0x0,
line_number=0, fmt=0x7ff4109e8a78 "%s:%d (%s) assertion `%s' fails")
at /development/libvpp/src/vppinfra/error.c:143
#11 0x7ff4109a64dd in tw_timer_expire_timers_internal_1t_3w_1024sl_ov (
tw=0x7ff39fdf7a40, now=327.5993926951,
---Type  to continue, or q  to quit---
callback_vector_arg=0x7ff39fdfab00)
at /development/libvpp/src/vppinfra/tw_timer_template.c:753
#12 0x7ff4109a6b36 in tw_timer_expire_timers_vec_1t_3w_1024sl_ov (
tw=0x7ff39fdf7a40, now=327.5993926951, vec=0x7ff39fdfab00)
at /development/libvpp/src/vppinfra/tw_timer_template.c:814
#13 0x7ff410c8321a in vlib_main_or_worker_loop (
vm=0x7ff410f22e40 , is_main=1)
at /development/libvpp/src/vlib/main.c:1859
#14 0x7ff410c83965 in vlib_main_loop (vm=0x7ff410f22e40 )
at /development/libvpp/src/vlib/main.c:1930
#15 0x7ff410c8462c in vlib_main (vm=0x7ff410f22e40 ,
input=0x7ff39f18dfb0) at /development/libvpp/src/vlib/main.c:2147
#16 0x7ff410ceabc9 in thread0 (arg=140686233054784)
at /development/libvpp/src/vlib/unix/main.c:666
#17 0x7ff410943600 in clib_calljmp ()
   from /usr/local/lib/libvppinfra.so.1.0.1
#18 0x7ffe4d981390 in ?? ()
#19 0x7ff410ceb13f in vlib_unix_main (argc=55, argv=0x556c398eb100)
at /development/libvpp/src/vlib/unix/main.c:736
#20 0x7ff41248e7cb in rtb_vpp_core_init (argc=55, argv=0x556c398eb100)
at /development/libvpp/src/vpp/vnet/main.c:483
#21 0x7ff41256189a in rtb_vpp_main ()
at /development/libvpp/src/vpp/rtbrick/rtb_vpp_main.c:113
---Type  to continue, or q  to quit---
#22 0x7ff41d11a15a in bd_load_daemon_lib (
dmn_lib_cfg=0x7ff41d337860 )
at /development/librtbrickinfra/bd/src/bd.c:627
#23 0x7ff41d11a205 in bd_load_all_daemon_libs ()
at /development/librtbrickinfra/bd/src/bd.c:646
#24 0x7ff41d11b676 in bd_start_process ()
at /development/librtbrickinfra/bd/src/bd.c:1128
#25 0x7ff419e92200 in bds_bd_init ()
at /development/librtbrickinfra/libbds/code/bds/src/bds.c:651
#26 0x7ff419f1aa5d in pubsub_bd_init_expiry (data=0x0)
at /development/librtbrickinfra/libbds/code/pubsub/src/pubsub_helper.c:1412
#27 0x7ff41cc23070 in timer_dispatch (item=0x556c39997cf0, p=QB_LOOP_HIGH)
at /development/librtbrickinfra/libqb/lib/loop_timerlist.c:56
#28 0x7ff41cc1f006 in qb_loop_run_level (level=0x556c366fb3e0)
at /development/librtbrickinfra/libqb/lib/loop.c:43
#29 0x7ff41cc1f77b in qb_loop_run (lp=0x556c366fb370)
at /development/librtbrickinfra/libqb/lib/loop.c:210
#30 0x7ff41cc30b3f in lib_qb_service_start_event_loop ()
at /development/librtbrickinfra/libqb/lib/wrapper/lib_qb_service.c:257
#31 0x556c358c7153 in main ()

Thread 11 (Thread 0x7ff35b622700 (LWP 413)):
#0  rtb_vpp_shm_rx_burst (port_id=3, queue_id=0, burst_size=64 '@')
at /development/libvpp/src/vpp/rtbrick/rtb_vpp_shm_node.c:317
#1  0x7ff4125ee043 in rtb_vpp_shm_device_input (vm=0x7ff39f89ac80,
shmm=0x7ff41285e180 , shmif=0x7ff39f8ad940,
node=0x7ff39d461480, frame=0x0, thread_index=2, queue_id=0)
at /development/libvpp/src/vpp/rtbrick/rtb_vpp_shm_node.c:359
#2  0x7ff4125ee839 in rtb_vpp_shm_input_node_fn (vm=0x7ff39f89ac80,
node=0x7ff39d461480, f=0x0)
at /development/libvpp/src/vpp/rtbrick/rtb_vpp_shm_node.c:452
#3  0x7ff410c80cef in dispatch_node (vm=0x7ff39f89ac80,
node=0x7ff39d461480, type=VLIB_NODE_TYPE_INPUT,
dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0,
last_time_stamp=11572457044265548)
at /development/libvpp/src/vlib/main.c:1207
#4  0x7ff410c82e50 in vlib_main_or_worker_loop (vm=0x7ff39f89ac80,
is_main=0) at /development/libvpp/src/vlib/main.c:1781
#5  0x7ff410c83985 in 

Re: [vpp-dev]: Trouble shooting low bandwidth of memif interface

2020-07-31 Thread Rajith PR via lists.fd.io
Thanks Jerome. I have pinned the VPP threads to different cores (no
isolation though). And yes, after migrating to tapv2 interfaces,
*performance improved significantly (roughly 75x)*.
Do you have any documentation on how to fine-tune multi-core performance
in an lxc container?
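
For what it's worth, the pinning itself on the VPP side is just the cpu
stanza in startup.conf; the core numbers below are only an example and have
to match cores actually available to the container, with any isolation
(e.g. isolcpus) done on the host side:

  cpu {
    main-core 1
    corelist-workers 2-3
  }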

Thanks,
Rajith

On Thu, Jul 30, 2020 at 5:06 PM Jerome Tollet (jtollet) 
wrote:

> Hello Rajith,
>
>1. are you making sure your vpp workers are not sharing same cores and
>are isolated?
>2. host-interfaces are slower than vpp tapv2 interfaces. Maybe you
>should try them.
>
> Jerome
>
>
>
>
>
>
>
> *From: * on behalf of "Rajith PR via lists.fd.io"
> 
> *Reply-To: *"raj...@rtbrick.com" 
> *Date: *Thursday, 30 July 2020 at 08:44
> *To: *vpp-dev 
> *Subject: *Re: [vpp-dev]: Trouble shooting low bandwidth of memif interface
>
>
>
> Looks like the image is not visible. Resending the topology diagram for
> reference.
>
>
>
>
>
>
>
>
>
> On Thu, Jul 30, 2020 at 11:44 AM Rajith PR via lists.fd.io  rtbrick@lists.fd.io> wrote:
>
> Hello Experts,
>
>
>
> I am trying to measure the performance of memif interface and getting a
> very low bandwidth(652Kbytes/sec).  I am new to performance tuning and any
> help on troubleshooting the issue would be very helpful.
>
>
>
> The test topology i am using is as below:
>
>
>
> [image: Image removed by the sender.]
>
>
>
> Basically, I have two lxc containers each hosting an instance of VPP. The
> VPP instances are connected using memif. On lxc-01 i run the iperf3 client
> that generates TCP traffic and on lxc-02 i run the iperf3 server. Linux
> veth pairs are used for interconnecting the iperf tool with VPP.
>
>
>
> *Test Environment:*
>
>
>
> *CPU Details:*
>
>
>
>  *-cpu
>   description: CPU
>   product: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
>   vendor: Intel Corp.
>   physical id: c
>   bus info: cpu@0
>   version: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
>   serial: None
>   slot: U3E1
>   size: 3100MHz
>   capacity: 3100MHz
>   width: 64 bits
>   clock: 100MHz
>   capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae
> mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse
> sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc art arch_perfmon
> pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb
> stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2
> smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt
> xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window
> hwp_epp md_clear flush_l1d cpufreq
>   configuration: cores=2 enabledcores=2 threads=4
>
>
>
> *VPP Configuration:*
>
>
>
> No workers. VPP main thread, iperf client and server are pinned to
> separate cores.
>
>
>
> *Test Results:*
>
>
>
> [11:36][ubuntu:~]$ iperf3 -s -B 200.1.1.1 -f K -A 3
>
> ---
> Server listening on 5201
> ---
> Accepted connection from 100.1.1.1, port 45188
> [  5] local 200.1.1.1 port 5201 connected to 100.1.1.1 port 45190
> [ ID] Interval   Transfer Bandwidth
> [  5]   0.00-1.00   sec   154 KBytes   154 KBytes/sec
> [  5]   1.00-2.00   sec   783 KBytes   784 KBytes/sec
> [  5]   2.00-3.00   sec   782 KBytes   782 KBytes/sec
> [  5]   3.00-4.00   sec   663 KBytes   663 KBytes/sec
> [  5]   4.00-5.00   sec   631 KBytes   631 KBytes/sec
> [  5]   5.00-6.00   sec   677 KBytes   677 KBytes/sec
> [  5]   6.00-7.00   sec   693 KBytes   693 KBytes/sec
> [  5]   7.00-8.00   sec   706 KBytes   706 KBytes/sec
> [  5]   8.00-9.00   sec   672 KBytes   672 KBytes/sec
> [  5]   9.00-10.00  sec   764 KBytes   764 KBytes/sec
> [  5]  10.00-10.04  sec  21.2 KBytes   504 KBytes/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth
> [  5]   0.00-10.04  sec  0.00 Bytes  0.00 KBytes/sec
>  sender
> [  5]   0.00-10.04  sec  6.39 MBytes   652 KBytes/sec
>  receiver
> ---
> Server listening on 5201
> ---
>
>
>
>
>
> [11

Re: [vpp-dev]: Trouble shooting low bandwidth of memif interface

2020-07-30 Thread Rajith PR via lists.fd.io
Looks like the image is not visible. Resending the topology diagram for
reference.


[image: iperf_memif.png]


On Thu, Jul 30, 2020 at 11:44 AM Rajith PR via lists.fd.io  wrote:

> Hello Experts,
>
> I am trying to measure the performance of memif interface and getting a
> very low bandwidth(652Kbytes/sec).  I am new to performance tuning and any
> help on troubleshooting the issue would be very helpful.
>
> The test topology i am using is as below:
>
>
>
> Basically, I have two lxc containers each hosting an instance of VPP. The
> VPP instances are connected using memif. On lxc-01 i run the iperf3 client
> that generates TCP traffic and on lxc-02 i run the iperf3 server. Linux
> veth pairs are used for interconnecting the iperf tool with VPP.
>
> *Test Environment:*
>
> *CPU Details:*
>
>  *-cpu
>   description: CPU
>   product: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
>   vendor: Intel Corp.
>   physical id: c
>   bus info: cpu@0
>   version: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
>   serial: None
>   slot: U3E1
>   size: 3100MHz
>   capacity: 3100MHz
>   width: 64 bits
>   clock: 100MHz
>   capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae
> mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse
> sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc art arch_perfmon
> pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
> lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb
> stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2
> smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt
> xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window
> hwp_epp md_clear flush_l1d cpufreq
>   configuration: cores=2 enabledcores=2 threads=4
>
> *VPP Configuration:*
>
> No workers. VPP main thread, iperf client and server are pinned to
> separate cores.
>
> *Test Results:*
>
> [11:36][ubuntu:~]$ iperf3 -s -B 200.1.1.1 -f K -A 3
> ---
> Server listening on 5201
> ---
> Accepted connection from 100.1.1.1, port 45188
> [  5] local 200.1.1.1 port 5201 connected to 100.1.1.1 port 45190
> [ ID] Interval   Transfer Bandwidth
> [  5]   0.00-1.00   sec   154 KBytes   154 KBytes/sec
> [  5]   1.00-2.00   sec   783 KBytes   784 KBytes/sec
> [  5]   2.00-3.00   sec   782 KBytes   782 KBytes/sec
> [  5]   3.00-4.00   sec   663 KBytes   663 KBytes/sec
> [  5]   4.00-5.00   sec   631 KBytes   631 KBytes/sec
> [  5]   5.00-6.00   sec   677 KBytes   677 KBytes/sec
> [  5]   6.00-7.00   sec   693 KBytes   693 KBytes/sec
> [  5]   7.00-8.00   sec   706 KBytes   706 KBytes/sec
> [  5]   8.00-9.00   sec   672 KBytes   672 KBytes/sec
> [  5]   9.00-10.00  sec   764 KBytes   764 KBytes/sec
> [  5]  10.00-10.04  sec  21.2 KBytes   504 KBytes/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth
> [  5]   0.00-10.04  sec  0.00 Bytes  0.00 KBytes/sec
>  sender
> [  5]   0.00-10.04  sec  6.39 MBytes   652 KBytes/sec
>  receiver
> ---
> Server listening on 5201
> ---
>
>
> [11:36][ubuntu:~]$ sudo iperf3 -c 200.1.1.1  -A 2
> Connecting to host 200.1.1.1, port 5201
> [  4] local 100.1.1.1 port 45190 connected to 200.1.1.1 port 5201
> [ ID] Interval   Transfer Bandwidth   Retr  Cwnd
> [  4]   0.00-1.00   sec   281 KBytes  2.30 Mbits/sec   44   2.83 KBytes
>
> [  4]   1.00-2.00   sec   807 KBytes  6.62 Mbits/sec  124   5.66 KBytes
>
> [  4]   2.00-3.00   sec   737 KBytes  6.04 Mbits/sec  136   5.66 KBytes
>
> [  4]   3.00-4.00   sec   720 KBytes  5.90 Mbits/sec  130   5.66 KBytes
>
> [  4]   4.00-5.00   sec   574 KBytes  4.70 Mbits/sec  134   5.66 KBytes
>
> [  4]   5.00-6.00   sec   720 KBytes  5.90 Mbits/sec  120   7.07 KBytes
>
> [  4]   6.00-7.00   sec   666 KBytes  5.46 Mbits/sec  134   5.66 KBytes
>
> [  4]   7.00-8.00   sec   741 KBytes  6.07 Mbits/sec  124   5.66 KBytes
>
> [  4]   8.00-9.00   sec   660 KBytes  5.41 Mbits/sec  128   4.24 KBytes
>
> [  4]   9.00-10.00  sec   740 KBytes  6.05 Mbits/sec  130   4.24 KBytes
>
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval   Transfer Bandwidth

[vpp-dev]: Trouble shooting low bandwidth of memif interface

2020-07-30 Thread Rajith PR via lists.fd.io
Hello Experts,

I am trying to measure the performance of a memif interface and am getting
very low bandwidth (652 KBytes/sec). I am new to performance tuning and any
help in troubleshooting the issue would be very helpful.

The test topology I am using is as below:



Basically, I have two lxc containers, each hosting an instance of VPP. The
VPP instances are connected using memif. On lxc-01 I run the iperf3 client
that generates TCP traffic, and on lxc-02 I run the iperf3 server. Linux
veth pairs are used for interconnecting the iperf tool with VPP.
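
For completeness, the memif link between the two VPP instances is set up
with something along these lines on each side (one side master, the other
slave; the address is illustrative, and the default memif socket file has
to live on a path both containers can reach):

  create interface memif id 0 master
  set interface state memif0/0 up
  set interface ip address memif0/0 12.1.1.1/24

with "slave" instead of "master" and a peer address on the second instance.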

*Test Environment:*

*CPU Details:*

 *-cpu
  description: CPU
  product: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
  vendor: Intel Corp.
  physical id: c
  bus info: cpu@0
  version: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
  serial: None
  slot: U3E1
  size: 3100MHz
  capacity: 3100MHz
  width: 64 bits
  clock: 100MHz
  capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae
mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse
sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp constant_tsc art arch_perfmon
pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1
sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb
stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2
smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt
xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window
hwp_epp md_clear flush_l1d cpufreq
  configuration: cores=2 enabledcores=2 threads=4

*VPP Configuration:*

No workers. VPP main thread, iperf client and server are pinned to separate
cores.

*Test Results:*

[11:36][ubuntu:~]$ iperf3 -s -B 200.1.1.1 -f K -A 3
---
Server listening on 5201
---
Accepted connection from 100.1.1.1, port 45188
[  5] local 200.1.1.1 port 5201 connected to 100.1.1.1 port 45190
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-1.00   sec   154 KBytes   154 KBytes/sec
[  5]   1.00-2.00   sec   783 KBytes   784 KBytes/sec
[  5]   2.00-3.00   sec   782 KBytes   782 KBytes/sec
[  5]   3.00-4.00   sec   663 KBytes   663 KBytes/sec
[  5]   4.00-5.00   sec   631 KBytes   631 KBytes/sec
[  5]   5.00-6.00   sec   677 KBytes   677 KBytes/sec
[  5]   6.00-7.00   sec   693 KBytes   693 KBytes/sec
[  5]   7.00-8.00   sec   706 KBytes   706 KBytes/sec
[  5]   8.00-9.00   sec   672 KBytes   672 KBytes/sec
[  5]   9.00-10.00  sec   764 KBytes   764 KBytes/sec
[  5]  10.00-10.04  sec  21.2 KBytes   504 KBytes/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 KBytes/sec  sender
[  5]   0.00-10.04  sec  6.39 MBytes   652 KBytes/sec
 receiver
---
Server listening on 5201
---


[11:36][ubuntu:~]$ sudo iperf3 -c 200.1.1.1  -A 2
Connecting to host 200.1.1.1, port 5201
[  4] local 100.1.1.1 port 45190 connected to 200.1.1.1 port 5201
[ ID] Interval   Transfer Bandwidth   Retr  Cwnd
[  4]   0.00-1.00   sec   281 KBytes  2.30 Mbits/sec   44   2.83 KBytes

[  4]   1.00-2.00   sec   807 KBytes  6.62 Mbits/sec  124   5.66 KBytes

[  4]   2.00-3.00   sec   737 KBytes  6.04 Mbits/sec  136   5.66 KBytes

[  4]   3.00-4.00   sec   720 KBytes  5.90 Mbits/sec  130   5.66 KBytes

[  4]   4.00-5.00   sec   574 KBytes  4.70 Mbits/sec  134   5.66 KBytes

[  4]   5.00-6.00   sec   720 KBytes  5.90 Mbits/sec  120   7.07 KBytes

[  4]   6.00-7.00   sec   666 KBytes  5.46 Mbits/sec  134   5.66 KBytes

[  4]   7.00-8.00   sec   741 KBytes  6.07 Mbits/sec  124   5.66 KBytes

[  4]   8.00-9.00   sec   660 KBytes  5.41 Mbits/sec  128   4.24 KBytes

[  4]   9.00-10.00  sec   740 KBytes  6.05 Mbits/sec  130   4.24 KBytes

- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval   Transfer Bandwidth   Retr
[  4]   0.00-10.00  sec  6.49 MBytes  5.44 Mbits/sec  1204
sender
[  4]   0.00-10.00  sec  6.39 MBytes  5.36 Mbits/sec
 receiver

iperf Done.

Thanks,
Rajith
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#17115): https://lists.fd.io/g/vpp-dev/message/17115
Mute This Topic: https://lists.fd.io/mt/75881323/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev]: ASSERT in load_balance_get()

2020-07-17 Thread Rajith PR via lists.fd.io
Hi All,

The crash has occurred again in load_balance_get(). This time
lbi=320017171, which may be an invalid value. Also, this time the flow is
from mpls_lookup_node_fn_avx2().
Any pointers to fix the issue would be very helpful.
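
For reference, the abort itself comes from the ASSERT inside
pool_elt_at_index(), which load_balance_get() uses to fetch the entry. A
minimal stand-alone vppinfra sketch (illustrative only, and it needs a
CLIB_DEBUG build so the ASSERT is compiled in) that reproduces the same
class of failure as the backtrace below:

/* Illustrative only: why pool_elt_at_index() aborts once an index
 * has been freed.  Build against vppinfra with CLIB_DEBUG enabled. */
#include <vppinfra/mem.h>
#include <vppinfra/pool.h>

int
main (int argc, char *argv[])
{
  clib_mem_init (0, 64 << 20);	/* vppinfra heap */

  u32 *pool = 0, *elt;
  pool_get (pool, elt);		/* allocate index 0 */
  *elt = 42;

  u32 index = elt - pool;
  pool_put_index (pool, index);	/* freed, e.g. by the main thread */

  /* A reader still holding 'index' now trips the ASSERT inside
   * pool_elt_at_index(), i.e. the abort in frame #11 below. */
  elt = pool_elt_at_index (pool, index);
  return *elt;
}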

Thread 10 (Thread 0x7fc1c089b700 (LWP 438)):
#0  0x7fc27b1a8722 in __GI___waitpid (pid=5267,
stat_loc=stat_loc@entry=0x7fc1fefc2a18,
options=options@entry=0)
at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x7fc27b113107 in do_system (line=) at
../sysdeps/posix/system.c:149
#2  0x7fc27bc2c76b in bd_signal_handler_cb (signo=6) at
/development/librtbrickinfra/bd/src/bd.c:770
#3  0x7fc26f5670ac in rtb_bd_signal_handler (signo=6) at
/development/libvpp/src/vlib/unix/main.c:80
#4  0x7fc26f567447 in unix_signal_handler (signum=6, si=0x7fc1fefc31f0,
uc=0x7fc1fefc30c0)
at /development/libvpp/src/vlib/unix/main.c:180
#5  
#6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#7  0x7fc27b1048b1 in __GI_abort () at abort.c:79
#8  0x7fc270d09f86 in os_panic () at
/development/libvpp/src/vpp/vnet/main.c:559
#9  0x7fc26f1a0825 in debugger () at
/development/libvpp/src/vppinfra/error.c:84
#10 0x7fc26f1a0bf4 in _clib_error (how_to_die=2, function_name=0x0,
line_number=0, fmt=0x7fc27055f7a8 "%s:%d (%s) assertion `%s' fails")
at /development/libvpp/src/vppinfra/error.c:143
#11 0x7fc26f95e36d in load_balance_get (lbi=320017171) at
/development/libvpp/src/vnet/dpo/load_balance.h:222
#12 0x7fc26f95fdeb in mpls_lookup_node_fn_avx2 (vm=0x7fc1feb95280,
node=0x7fc1fe770300, from_frame=0x7fc1fea9bd80)
at /development/libvpp/src/vnet/mpls/mpls_lookup.c:396
#13 0x7fc26f4fecef in dispatch_node (vm=0x7fc1feb95280,
node=0x7fc1fe770300, type=VLIB_NODE_TYPE_INTERNAL,
dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x7fc1fea9bd80,
last_time_stamp=959252489496170)
at /development/libvpp/src/vlib/main.c:1207
#14 0x7fc26f4ff4aa in dispatch_pending_node (vm=0x7fc1feb95280,
pending_frame_index=2, last_time_stamp=959252489496170)
at /development/libvpp/src/vlib/main.c:1375
#15 0x7fc26f5010ee in vlib_main_or_worker_loop (vm=0x7fc1feb95280,
is_main=0) at /development/libvpp/src/vlib/main.c:1826
#16 0x7fc26f501971 in vlib_worker_loop (vm=0x7fc1feb95280) at
/development/libvpp/src/vlib/main.c:1934
#17 0x7fc26f54069b in vlib_worker_thread_fn (arg=0x7fc1fc320b40) at
/development/libvpp/src/vlib/threads.c:1803
#18 0x7fc26f1c1600 in clib_calljmp () from
/usr/local/lib/libvppinfra.so.1.0.1
#19 0x7fc1c089aec0 in ?? ()
#20 0x7fc26f53ac32 in vlib_worker_thread_bootstrap_fn
(arg=0x7fc1fc320b40) at /development/libvpp/src/vlib/threads.c:573

Thanks,
Rajith

On Tue, Jul 7, 2020 at 6:28 PM Rajith PR via lists.fd.io  wrote:

> Hi Benoit,
>
> I have all those fixes. I had reported this issue (27407); the others I
> found during my tests, and I added barrier protection in all those places.
> This ASSERT does not seem to be due to pool expansion, as what I have seen
> mostly in those cases is an invalid pointer access causing a SIGSEGV.
> Here it's an abort triggered from *pool_elt_at_index* as the index is
> already freed.
>
> Is it not possible that the main thread freed the pool entry while the
> worker thread was holding an index? The backtraces I have attached indicate
> that the main thread was freeing fib entries.
>
> Thanks,
> Rajith
>
>
> On Tue, Jul 7, 2020 at 5:50 PM Benoit Ganne (bganne) 
> wrote:
>
>> Hi Rajith,
>>
>> You are probably missing https://gerrit.fd.io/r/c/vpp/+/27407
>> https://gerrit.fd.io/r/c/vpp/+/27454 and maybe
>> https://gerrit.fd.io/r/c/vpp/+/27448
>>
>> Best
>> ben
>>
>> > -Original Message-
>> > From: vpp-dev@lists.fd.io  On Behalf Of Rajith PR
>> via
>> > lists.fd.io
>> > Sent: mardi 7 juillet 2020 14:11
>> > To: vpp-dev 
>> > Subject: [vpp-dev]: ASSERT in load_balance_get()
>> >
>> > Hi All,
>> >
>> > During our scale testing of routes we have hit an ASSERT in
>> > load_balance_get() .  From the code it looks like the lb_index(148)
>> > referred to is already returned to the pool by the main thread causing
>> the
>> > ASSERT in the worker. The version in 19.08.  We have two workers and a
>> > main thread.
>> >
>> > Any inputs to fix the issue is highly appreciated.
>> >
>> > The complete bt is pasted below:
>> >
>> > Thread 11 (Thread 0x7f988ebe3700 (LWP 398)):
>> > #0  0x7f99464cef54 in vlib_worker_thread_barrier_check () at
>> > /development/libvpp/src/vlib/threads.h:430
>> > #1  0x7f99464d6b9b in vlib_main_or_worker_loop (vm=0x7f98d8ba3540,
>> > is_main=0) at /development/libvpp/src/vlib/main.c:1744
>> > #2  0x7f99

Re: [vpp-dev]: ASSERT in load_balance_get()

2020-07-07 Thread Rajith PR via lists.fd.io
Hi Benoit,

I have all those fixes. I had reported this issue (27407); the others I
found during my tests, and I added barrier protection in all those places.
This ASSERT does not seem to be due to pool expansion, as what I have seen
mostly in those cases is an invalid pointer access causing a SIGSEGV.
Here it's an abort triggered from *pool_elt_at_index* as the index is
already freed.

Is it not possible that the main thread freed the pool entry while the
worker thread was holding an index? The backtraces I have attached indicate
that the main thread was freeing fib entries.
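
Schematically, the barrier protection added in those places follows the
sketch below; the pool and element type are placeholders, the barrier calls
are the standard vlib ones, and the free is assumed to run on the main
thread:

/* Schematic sketch only, not the actual code. */
#include <vlib/vlib.h>
#include <vppinfra/pool.h>

typedef struct { u32 value; } my_elt_t;	/* placeholder element type */
static my_elt_t *my_pool;		/* placeholder pool shared with workers */

static void
my_pool_free_entry (vlib_main_t * vm, u32 index)
{
  /* Park the workers outside their dispatch loops ... */
  vlib_worker_thread_barrier_sync (vm);

  /* ... so none of them is in the middle of dereferencing this index
   * while the element is returned to the pool. */
  pool_put_index (my_pool, index);

  vlib_worker_thread_barrier_release (vm);
}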

Thanks,
Rajith


On Tue, Jul 7, 2020 at 5:50 PM Benoit Ganne (bganne) 
wrote:

> Hi Rajith,
>
> You are probably missing https://gerrit.fd.io/r/c/vpp/+/27407
> https://gerrit.fd.io/r/c/vpp/+/27454 and maybe
> https://gerrit.fd.io/r/c/vpp/+/27448
>
> Best
> ben
>
> > -Original Message-
> > From: vpp-dev@lists.fd.io  On Behalf Of Rajith PR
> via
> > lists.fd.io
> > Sent: mardi 7 juillet 2020 14:11
> > To: vpp-dev 
> > Subject: [vpp-dev]: ASSERT in load_balance_get()
> >
> > Hi All,
> >
> > During our scale testing of routes we have hit an ASSERT in
> > load_balance_get() .  From the code it looks like the lb_index(148)
> > referred to is already returned to the pool by the main thread causing
> the
> > ASSERT in the worker. The version in 19.08.  We have two workers and a
> > main thread.
> >
> > Any inputs to fix the issue is highly appreciated.
> >
> > The complete bt is pasted below:
> >
> > Thread 11 (Thread 0x7f988ebe3700 (LWP 398)):
> > #0  0x7f99464cef54 in vlib_worker_thread_barrier_check () at
> > /development/libvpp/src/vlib/threads.h:430
> > #1  0x7f99464d6b9b in vlib_main_or_worker_loop (vm=0x7f98d8ba3540,
> > is_main=0) at /development/libvpp/src/vlib/main.c:1744
> > #2  0x7f99464d7971 in vlib_worker_loop (vm=0x7f98d8ba3540) at
> > /development/libvpp/src/vlib/main.c:1934
> > #3  0x7f994651669b in vlib_worker_thread_fn (arg=0x7f98d6191a40) at
> > /development/libvpp/src/vlib/threads.c:1803
> > #4  0x7f9946197600 in clib_calljmp () from
> > /usr/local/lib/libvppinfra.so.1.0.1
> > #5  0x7f988ebe2ec0 in ?? ()
> > #6  0x7f9946510c32 in vlib_worker_thread_bootstrap_fn
> > (arg=0x7f98d6191a40) at /development/libvpp/src/vlib/threads.c:573
> > Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> >
> > Thread 10 (Thread 0x7f988f3e4700 (LWP 397)):
> > #0  0x7f995217d722 in __GI___waitpid (pid=2595,
> > stat_loc=stat_loc@entry=0x7f98d8fd0118, options=options@entry=0) at
> > ../sysdeps/unix/sysv/linux/waitpid.c:30
> > #1  0x7f99520e8107 in do_system (line=) at
> > ../sysdeps/posix/system.c:149
> > #2  0x7f9952c016ca in bd_signal_handler_cb (signo=6) at
> > /development/librtbrickinfra/bd/src/bd.c:770
> > #3  0x7f994653d0ac in rtb_bd_signal_handler (signo=6) at
> > /development/libvpp/src/vlib/unix/main.c:80
> > #4  0x7f994653d447 in unix_signal_handler (signum=6,
> > si=0x7f98d8fd08f0, uc=0x7f98d8fd07c0) at
> > /development/libvpp/src/vlib/unix/main.c:180
> > #5  
> > #6  __GI_raise (sig=sig@entry=6) at
> ../sysdeps/unix/sysv/linux/raise.c:51
> > #7  0x7f99520d98b1 in __GI_abort () at abort.c:79
> > #8  0x7f9947cdec86 in os_panic () at
> > /development/libvpp/src/vpp/vnet/main.c:559
> > #9  0x7f9946176825 in debugger () at
> > /development/libvpp/src/vppinfra/error.c:84
> > #10 0x7f9946176bf4 in _clib_error (how_to_die=2, function_name=0x0,
> > line_number=0, fmt=0x7f99475271b8 "%s:%d (%s) assertion `%s' fails")
> > at /development/libvpp/src/vppinfra/error.c:143
> > #11 0x7f99468d046c in load_balance_get (lbi=148) at
> > /development/libvpp/src/vnet/dpo/load_balance.h:222
> > #12 0x7f99468d4d44 in ip4_local_check_src (b=0x1002535e00,
> > ip0=0x1002535f52, last_check=0x7f98d8fd1234, error0=0x7f98d8fd11e8
> > "\016\r")
> > at /development/libvpp/src/vnet/ip/ip4_forward.c:1583
> > #13 0x7f99468d58e1 in ip4_local_inline (vm=0x7f98d8ba2e40,
> > node=0x7f98d8711f40, frame=0x7f98d9585bc0, head_of_feature_arc=1)
> > at /development/libvpp/src/vnet/ip/ip4_forward.c:1870
> > #14 0x7f99468d5a08 in ip4_local_node_fn_avx2 (vm=0x7f98d8ba2e40,
> > node=0x7f98d8711f40, frame=0x7f98d9585bc0) at
> > /development/libvpp/src/vnet/ip/ip4_forward.c:1889
> > #15 0x7f99464d4cef in dispatch_node (vm=0x7f98d8ba2e40,
> > node=0x7f98d8711f40, type=VLIB_NODE_TYPE_INTERNAL,
> > dispatch_state=VLIB_NODE_STATE_POLLING,
> > frame=0x7f98d9585bc0, last_time

[vpp-dev]: ASSERT in load_balance_get()

2020-07-07 Thread Rajith PR via lists.fd.io
Hi All,

During our scale testing of routes we have hit an ASSERT in
*load_balance_get()*. From the code it looks like the lb_index (148)
referred to has already been returned to the pool by the main thread,
causing the ASSERT in the worker. The version is *19.08*. We have two
workers and a main thread.

Any inputs to fix the issue are highly appreciated.

The complete bt is pasted below:

Thread 11 (Thread 0x7f988ebe3700 (LWP 398)):
#0  0x7f99464cef54 in vlib_worker_thread_barrier_check () at
/development/libvpp/src/vlib/threads.h:430
#1  0x7f99464d6b9b in vlib_main_or_worker_loop (vm=0x7f98d8ba3540,
is_main=0) at /development/libvpp/src/vlib/main.c:1744
#2  0x7f99464d7971 in vlib_worker_loop (vm=0x7f98d8ba3540) at
/development/libvpp/src/vlib/main.c:1934
#3  0x7f994651669b in vlib_worker_thread_fn (arg=0x7f98d6191a40) at
/development/libvpp/src/vlib/threads.c:1803
#4  0x7f9946197600 in clib_calljmp () from
/usr/local/lib/libvppinfra.so.1.0.1
#5  0x7f988ebe2ec0 in ?? ()
#6  0x7f9946510c32 in vlib_worker_thread_bootstrap_fn
(arg=0x7f98d6191a40) at /development/libvpp/src/vlib/threads.c:573
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 10 (Thread 0x7f988f3e4700 (LWP 397)):
#0  0x7f995217d722 in __GI___waitpid (pid=2595,
stat_loc=stat_loc@entry=0x7f98d8fd0118,
options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x7f99520e8107 in do_system (line=) at
../sysdeps/posix/system.c:149
#2  0x7f9952c016ca in bd_signal_handler_cb (signo=6) at
/development/librtbrickinfra/bd/src/bd.c:770
#3  0x7f994653d0ac in rtb_bd_signal_handler (signo=6) at
/development/libvpp/src/vlib/unix/main.c:80
#4  0x7f994653d447 in unix_signal_handler (signum=6, si=0x7f98d8fd08f0,
uc=0x7f98d8fd07c0) at /development/libvpp/src/vlib/unix/main.c:180
#5  
#6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#7  0x7f99520d98b1 in __GI_abort () at abort.c:79
#8  0x7f9947cdec86 in os_panic () at
/development/libvpp/src/vpp/vnet/main.c:559
#9  0x7f9946176825 in debugger () at
/development/libvpp/src/vppinfra/error.c:84
#10 0x7f9946176bf4 in _clib_error (how_to_die=2, function_name=0x0,
line_number=0, fmt=0x7f99475271b8 "%s:%d (%s) assertion `%s' fails")
at /development/libvpp/src/vppinfra/error.c:143
*#11 0x7f99468d046c in load_balance_get (lbi=148) at
/development/libvpp/src/vnet/dpo/load_balance.h:222*
#12 0x7f99468d4d44 in ip4_local_check_src (b=0x1002535e00,
ip0=0x1002535f52, last_check=0x7f98d8fd1234, error0=0x7f98d8fd11e8 "\016\r")
at /development/libvpp/src/vnet/ip/ip4_forward.c:1583
#13 0x7f99468d58e1 in ip4_local_inline (vm=0x7f98d8ba2e40,
node=0x7f98d8711f40, frame=0x7f98d9585bc0, head_of_feature_arc=1)
at /development/libvpp/src/vnet/ip/ip4_forward.c:1870
#14 0x7f99468d5a08 in ip4_local_node_fn_avx2 (vm=0x7f98d8ba2e40,
node=0x7f98d8711f40, frame=0x7f98d9585bc0) at
/development/libvpp/src/vnet/ip/ip4_forward.c:1889
#15 0x7f99464d4cef in dispatch_node (vm=0x7f98d8ba2e40,
node=0x7f98d8711f40, type=VLIB_NODE_TYPE_INTERNAL,
dispatch_state=VLIB_NODE_STATE_POLLING,
frame=0x7f98d9585bc0, last_time_stamp=531887125605324) at
/development/libvpp/src/vlib/main.c:1207
#16 0x7f99464d54aa in dispatch_pending_node (vm=0x7f98d8ba2e40,
pending_frame_index=3, last_time_stamp=531887125605324) at
/development/libvpp/src/vlib/main.c:1375
#17 0x7f99464d70ee in vlib_main_or_worker_loop (vm=0x7f98d8ba2e40,
is_main=0) at /development/libvpp/src/vlib/main.c:1826
#18 0x7f99464d7971 in vlib_worker_loop (vm=0x7f98d8ba2e40) at
/development/libvpp/src/vlib/main.c:1934
#19 0x7f994651669b in vlib_worker_thread_fn (arg=0x7f98d6191940) at
/development/libvpp/src/vlib/threads.c:1803
#20 0x7f9946197600 in clib_calljmp () from
/usr/local/lib/libvppinfra.so.1.0.1
#21 0x7f988f3e3ec0 in ?? ()
#22 0x7f9946510c32 in vlib_worker_thread_bootstrap_fn
(arg=0x7f98d6191940) at /development/libvpp/src/vlib/threads.c:573
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 1 (Thread 0x7f995306c740 (LWP 249)):
#0  0x7f994650dd9e in clib_time_now (c=0x7f9946776e40
) at /development/libvpp/src/vppinfra/time.h:217
#1  0x7f994650de74 in vlib_time_now (vm=0x7f9946776e40
) at /development/libvpp/src/vlib/main.h:268
#2  0x7f99465159ab in vlib_worker_thread_barrier_sync_int
(vm=0x7f9946776e40 ,
func_name=0x7f994764e2d0 <__FUNCTION__.42472> "adj_last_lock_gone") at
/development/libvpp/src/vlib/threads.c:1486
#3  0x7f994743fb9f in adj_last_lock_gone (adj=0x7f98d5f0fc00) at
/development/libvpp/src/vnet/adj/adj.c:256
#4  0x7f994744062e in adj_node_last_lock_gone (node=0x7f98d5f0fc00) at
/development/libvpp/src/vnet/adj/adj.c:546
#5  0x7f99473e74be in fib_node_unlock (node=0x7f98d5f0fc00) at
/development/libvpp/src/vnet/fib/fib_node.c:215
#6  0x7f9947440186 in adj_unlock (adj_index=37) at
/development/libvpp/src/vnet/adj/adj.c:346
#7  0x7f99474277ba 

[vpp-dev] ASSERT in arp_mk_reply

2020-06-28 Thread Rajith PR via lists.fd.io
Hi All,

We are seeing *ASSERT (vec_len (hw_if0->hw_address) == 6);* being hit in
*arp_mk_reply()*. This is happening on *19.08*.
We have worker threads and a main thread.

As such, the hw_if0 appears to be valid (both the pointer and the content),
*but the length of the vector is 15*.
I have attached some information from the core file.
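
For reference, vec_len() reads the length stored in the vector header just
in front of the data, so a stale or clobbered hw_address pointer can carry
plausible-looking bytes while still reporting a bogus length. A tiny
stand-alone illustration (not our code) of the check that fires:

/* Illustrative only: the same vec_len() check that arp_mk_reply() makes,
 * on a vppinfra vector built the normal way. */
#include <vppinfra/mem.h>
#include <vppinfra/vec.h>
#include <vppinfra/error.h>

int
main (int argc, char *argv[])
{
  clib_mem_init (0, 16 << 20);

  u8 *mac = 0;
  u8 bytes[6] = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 };
  vec_add (mac, bytes, 6);	/* the length lives in the header before mac[0] */

  ASSERT (vec_len (mac) == 6);	/* the assertion from arp_mk_reply() */
  return 0;
}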

Thread 11 (Thread 0x7f085f7fe700 (LWP 261)):
#0  vlib_get_trace_count (vm=0x7f08bc310480, rt=0x7f08b9a3d0c0) at
/development/libvpp/src/vlib/trace_funcs.h:177
#1  0x7f092b98082c in rtb_vpp_shm_device_input (vm=0x7f08bc310480,
shmm=0x7f092bbe9980 , shmif=0x7f08bbdf93c0,
node=0x7f08b9a3d0c0,
frame=0x0, thread_index=2, queue_id=0) at
/development/libvpp/src/vpp/rtbrick/rtb_vpp_shm_node.c:341
#2  0x7f092b98102e in rtb_vpp_shm_input_node_fn (vm=0x7f08bc310480,
node=0x7f08b9a3d0c0, f=0x0) at
/development/libvpp/src/vpp/rtbrick/rtb_vpp_shm_node.c:434
#3  0x7f092a020c4f in dispatch_node (vm=0x7f08bc310480,
node=0x7f08b9a3d0c0, type=VLIB_NODE_TYPE_INPUT,
dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0,
last_time_stamp=3193893401554218) at
/development/libvpp/src/vlib/main.c:1207
#4  0x7f092a022d9c in vlib_main_or_worker_loop (vm=0x7f08bc310480,
is_main=0) at /development/libvpp/src/vlib/main.c:1779
#5  0x7f092a0238d1 in vlib_worker_loop (vm=0x7f08bc310480) at
/development/libvpp/src/vlib/main.c:1934
#6  0x7f092a062306 in vlib_worker_thread_fn (arg=0x7f08b9749140) at
/development/libvpp/src/vlib/threads.c:1754
#7  0x7f0929ce3600 in clib_calljmp () from
/usr/local/lib/libvppinfra.so.1.0.1
#8  0x7f085f7fdec0 in ?? ()
#9  0x7f092a05cb92 in vlib_worker_thread_bootstrap_fn
(arg=0x7f08b9749140) at /development/libvpp/src/vlib/threads.c:573
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 10 (Thread 0x7f085700 (LWP 260)):
#0  0x7f0935cb86c2 in __GI___waitpid (pid=7605,
stat_loc=stat_loc@entry=0x7f08bc73db18,
options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x7f0935c23067 in do_system (line=) at
../sysdeps/posix/system.c:149
#2  0x7f093673c21a in bd_signal_handler_cb (signo=6) at
/development/librtbrickinfra/bd/src/bd.c:770
#3  0x7f092a088d17 in rtb_bd_signal_handler (signo=6) at
/development/libvpp/src/vlib/unix/main.c:80
#4  0x7f092a0890b2 in unix_signal_handler (signum=6, si=0x7f08bc73e2f0,
uc=0x7f08bc73e1c0) at /development/libvpp/src/vlib/unix/main.c:180
#5  
#6  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#7  0x7f0935c14801 in __GI_abort () at abort.c:79
#8  0x7f092b827476 in os_panic () at
/development/libvpp/src/vpp/vnet/main.c:559
#9  0x7f0929cc2825 in debugger () at
/development/libvpp/src/vppinfra/error.c:84
#10 0x7f0929cc2bf4 in _clib_error (how_to_die=2, function_name=0x0,
line_number=0, fmt=0x7f092b1156b8 "%s:%d (%s) assertion `%s' fails")
at /development/libvpp/src/vppinfra/error.c:143
#11 0x7f092aa096bf in arp_mk_reply (vnm=0x7f092b5a05a0 ,
p0=0x1001e349c0, sw_if_index0=8, if_addr0=0x7f08bbe399f4,
arp0=0x1001e34b0e,
eth_rx=0x1001e34b00) at /development/libvpp/src/vnet/ethernet/arp.c:1206
#12 0x7f092aa09e76 in arp_reply (vm=0x7f08bc30fd40,
node=0x7f08bcc4dc80, frame=0x7f08be1835c0) at
/development/libvpp/src/vnet/ethernet/arp.c:1514
#13 0x7f092a020c4f in dispatch_node (vm=0x7f08bc30fd40,
node=0x7f08bcc4dc80, type=VLIB_NODE_TYPE_INTERNAL,
dispatch_state=VLIB_NODE_STATE_POLLING,
frame=0x7f08be1835c0, last_time_stamp=3193892880085078) at
/development/libvpp/src/vlib/main.c:1207
#14 0x7f092a02140a in dispatch_pending_node (vm=0x7f08bc30fd40,
pending_frame_index=2, last_time_stamp=3193892880085078)
at /development/libvpp/src/vlib/main.c:1375
#15 0x7f092a02304e in vlib_main_or_worker_loop (vm=0x7f08bc30fd40,
is_main=0) at /development/libvpp/src/vlib/main.c:1826
#16 0x7f092a0238d1 in vlib_worker_loop (vm=0x7f08bc30fd40) at
/development/libvpp/src/vlib/main.c:1934
#17 0x7f092a062306 in vlib_worker_thread_fn (arg=0x7f08b9749040) at
/development/libvpp/src/vlib/threads.c:1754
#18 0x7f0929ce3600 in clib_calljmp () from
/usr/local/lib/libvppinfra.so.1.0.1
#19 0x7f085fffeec0 in ?? ()
#20 0x7f092a05cb92 in vlib_worker_thread_bootstrap_fn
(arg=0x7f08b9749040) at /development/libvpp/src/vlib/threads.c:573
---Type  to continue, or q  to quit---q
Quit
(gdb) thread 10
[Switching to thread 10 (Thread 0x7f085700 (LWP 260))]
#0  0x7f0935cb86c2 in __GI___waitpid (pid=7605,
stat_loc=stat_loc@entry=0x7f08bc73db18,
options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
30 ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
(gdb) fr 11
#11 0x7f092aa096bf in arp_mk_reply (vnm=0x7f092b5a05a0 ,
p0=0x1001e349c0, sw_if_index0=8, if_addr0=0x7f08bbe399f4,
arp0=0x1001e34b0e,
eth_rx=0x1001e34b00) at /development/libvpp/src/vnet/ethernet/arp.c:1206
1206 /development/libvpp/src/vnet/ethernet/arp.c: No such file or directory.
(gdb) info locals
hw_if0 = 

Re: [vpp-dev] VPP_Main Thread Gets Stuck

2020-06-19 Thread Rajith PR via lists.fd.io
The version is *19.08*, and we suspect one of our own process nodes is in a
tight loop doing route download. However, show run and show run max do not
indicate any high clock time on them.
Is there any other way to detect the problem node?

Thanks,
Rajith

On Fri, Jun 19, 2020 at 5:26 PM Dave Barach (dbarach) 
wrote:

> Vpp version? Configuration? Backtraces from other threads? The timer wheel
> code is not likely to be directly responsible.
>
>
>
> Earlier this year, we addressed a number of issues in vppinfra/time.[ch]
> having to do with NTP and/or manual time changes which could lead to
> symptoms like this.
>
>
>
> If you don’t have those patches, it would be best to acquire them at your
> earliest convenience. T=131 seconds is within the plausible range for an
> NTP timebase earthquake.
>
>
>
> HTH... Dave
>
>
>
> Please refer to
> https://fd.io/docs/vpp/master/troubleshooting/reportingissues/reportingissues.html#
> <https://fd.io/docs/vpp/master/troubleshooting/reportingissues/reportingissues.html>
>
>
>
> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Rajith
> PR via lists.fd.io
> *Sent:* Friday, June 19, 2020 12:30 AM
> *To:* vpp-dev 
> *Subject:* [vpp-dev] VPP_Main Thread Gets Stuck
>
>
>
> Hi All,
>
>
>
> While during scale tests with large numbers of routes, we occasionally hit
> a strange issue in our container. The *vpp process became unresponsive*,
> after attaching the process to gdb we could see the *vpp_main thread is
> stuck on a specific function*. Any pointer to debug such issues would be
> of great help.
>
>
>
> *Back Trace:*
>
>
>
> #0  0x7f6895f1bc56 in clib_bitmap_get (ai=0x7f683ad339c0, i=826) at /development/libvpp/src/vppinfra/bitmap.h:201
> #1  0x7f6895f20357 in tw_timer_expire_timers_internal_1t_3w_1024sl_ov (tw=0x7f683ad3, now=131.6111045732342, callback_vector_arg=0x7f683ad330c0) at /development/libvpp/src/vppinfra/tw_timer_template.c:744
> #2  0x7f6895f20b36 in tw_timer_expire_timers_vec_1t_3w_1024sl_ov (tw=0x7f683ad3, now=131.6111045732342, vec=0x7f683ad330c0) at /development/libvpp/src/vppinfra/tw_timer_template.c:814
> #3  0x7f68961fd166 in vlib_main_or_worker_loop (vm=0x7f689649ce00 , is_main=1) at /development/libvpp/src/vlib/main.c:1857
> #4  0x7f68961fd8b1 in vlib_main_loop (vm=0x7f689649ce00 ) at /development/libvpp/src/vlib/main.c:1928
> #5  0x7f68961fe578 in vlib_main (vm=0x7f689649ce00 , input=0x7f683a60ffb0) at /development/libvpp/src/vlib/main.c:2145
> #6  0x7f6896264865 in thread0 (arg=140087174745600) at /development/libvpp/src/vlib/unix/main.c:666
> #7  0x7f6895ebd600 in clib_calljmp () from /usr/local/lib/libvppinfra.so.1.0.1
> #8  0x7fff47e2f760 in ?? ()
> #9  0x7f6896264ddb in vlib_unix_main (argc=21, argv=0x563cecf5f900) at /development/libvpp/src/vlib/unix/main.c:736
>
>
>
> Thanks,
>
> Rajith
>
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16764): https://lists.fd.io/g/vpp-dev/message/16764
Mute This Topic: https://lists.fd.io/mt/74973962/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] VPP_Main Thread Gets Stuck

2020-06-18 Thread Rajith PR via lists.fd.io
Hi All,

During scale tests with large numbers of routes, we occasionally hit
a strange issue in our container. The *vpp process became unresponsive*;
after attaching the process to gdb we could see that the *vpp_main thread is
stuck in a specific function*. Any pointer to debug such issues would be of
great help.

*Back Trace:*

#0  0x7f6895f1bc56 in clib_bitmap_get (ai=0x7f683ad339c0, i=826) at /development/libvpp/src/vppinfra/bitmap.h:201
#1  0x7f6895f20357 in tw_timer_expire_timers_internal_1t_3w_1024sl_ov (tw=0x7f683ad3, now=131.6111045732342, callback_vector_arg=0x7f683ad330c0) at /development/libvpp/src/vppinfra/tw_timer_template.c:744
#2  0x7f6895f20b36 in tw_timer_expire_timers_vec_1t_3w_1024sl_ov (tw=0x7f683ad3, now=131.6111045732342, vec=0x7f683ad330c0) at /development/libvpp/src/vppinfra/tw_timer_template.c:814
#3  0x7f68961fd166 in vlib_main_or_worker_loop (vm=0x7f689649ce00 , is_main=1) at /development/libvpp/src/vlib/main.c:1857
#4  0x7f68961fd8b1 in vlib_main_loop (vm=0x7f689649ce00 ) at /development/libvpp/src/vlib/main.c:1928
#5  0x7f68961fe578 in vlib_main (vm=0x7f689649ce00 , input=0x7f683a60ffb0) at /development/libvpp/src/vlib/main.c:2145
#6  0x7f6896264865 in thread0 (arg=140087174745600) at /development/libvpp/src/vlib/unix/main.c:666
#7  0x7f6895ebd600 in clib_calljmp () from /usr/local/lib/libvppinfra.so.1.0.1
#8  0x7fff47e2f760 in ?? ()
#9  0x7f6896264ddb in vlib_unix_main (argc=21, argv=0x563cecf5f900) at /development/libvpp/src/vlib/unix/main.c:736

Thanks,
Rajith
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16759): https://lists.fd.io/g/vpp-dev/message/16759
Mute This Topic: https://lists.fd.io/mt/74973962/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

2020-06-10 Thread Rajith PR via lists.fd.io
Hi Dave,

We ran a good number of scale tests with the fix. We didn't hit this crash.

Thanks a lot for the fix.
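
For anyone else hitting this: the pattern Dave describes below, taking the
worker barrier only when the allocation would actually expand the pool,
looks roughly like the sketch here. The pool and element names are
placeholders, not the code from the gerrit change:

/* Rough sketch of "barrier only on expansion"; assumes it runs on the
 * main thread.  my_pool / my_elt_t are placeholders. */
#include <vlib/vlib.h>
#include <vppinfra/pool.h>

typedef struct { u32 value; } my_elt_t;
static my_elt_t *my_pool;	/* placeholder pool read by the workers */

static my_elt_t *
my_pool_alloc (vlib_main_t * vm)
{
  my_elt_t *e;
  int will_expand = 0;

  pool_get_aligned_will_expand (my_pool, will_expand, CLIB_CACHE_LINE_BYTES);
  if (will_expand)
    {
      /* Expansion can move the pool and its free bitmap, so park the
       * workers for the duration of this one allocation. */
      vlib_worker_thread_barrier_sync (vm);
      pool_get_aligned (my_pool, e, CLIB_CACHE_LINE_BYTES);
      vlib_worker_thread_barrier_release (vm);
    }
  else
    pool_get_aligned (my_pool, e, CLIB_CACHE_LINE_BYTES);

  return e;
}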

Regards,
Rajith



On Wed, Jun 3, 2020 at 5:40 PM Dave Barach (dbarach) 
wrote:

> Please test https://gerrit.fd.io/r/c/vpp/+/27407 and report results.
>
> -Original Message-
> From: vpp-dev@lists.fd.io  On Behalf Of Dave Barach
> via lists.fd.io
> Sent: Wednesday, June 3, 2020 7:08 AM
> To: Benoit Ganne (bganne) ; raj...@rtbrick.com
> Cc: vpp-dev ; Neale Ranns (nranns) 
> Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
>
> +1, can't tell which poison pattern is involved without a scorecard.
>
> load_balance_alloc_i (...) is clearly not thread-safe due to calls to
> pool_get_aligned (...) and vlib_validate_combined_counter(...).
>
> Judicious use of pool_get_aligned_will_expand(...),
> _vec_resize_will_expand(...) and a manual barrier sync will fix this
> problem without resorting to draconian measures.
>
> It'd sure be nice to hear from Neale before we code something like that.
>
> D.
>
> -Original Message-
> From: Benoit Ganne (bganne) 
> Sent: Wednesday, June 3, 2020 3:17 AM
> To: raj...@rtbrick.com; Dave Barach (dbarach) 
> Cc: vpp-dev ; Neale Ranns (nranns) 
> Subject: RE: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
>
> Neale is away and might be slow to react.
> I suspect the issue is when creating new load balance entry through
> load_blance_create(), which will get a new element from the load balance
> pool. This in turn will update the pool free bitmap, which can grow. As it
> is backed by a vector, it can be reallocated somewhere else to fit the new
> size.
> If it is done concurrently with dataplane processing, bad things happen.
> The pattern 0x131313 is filled by dlmalloc free() and will happen in that
> case. I think the same could happen to the pool itself, not only the bitmap.
> If I am correct, I am not sure how we should fix that: fib update API is
> marked as mp_safe, so we could create a fixed-size load balance pool to
> prevent runtime reallocation, but it would waste memory and impose a
> maximum size.
>
> ben
>
> > -Original Message-
> > From: vpp-dev@lists.fd.io  On Behalf Of Rajith PR
> > via lists.fd.io
> > Sent: mercredi 3 juin 2020 05:46
> > To: Dave Barach (dbarach) 
> > Cc: vpp-dev ; Neale Ranns (nranns)
> > 
> > Subject: Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
> >
> > Hi Dave/Neal,
> >
> > The adj_poison seems to be a filling pattern - - 0xfefe. Am I looking
> > into the right code or I have interpreted it incorrectly?
> >
> > Thanks,
> > Rajith
> >
> > On Tue, Jun 2, 2020 at 7:44 PM Dave Barach (dbarach)
> > mailto:dbar...@cisco.com> > wrote:
> >
> >
> >   The code manages to access a poisoned adjacency – 0x131313 fill
> > pattern – copying Neale for an opinion.
> >
> >
> >
> >   D.
> >
> >
> >
> >   From: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io>   > d...@lists.fd.io <mailto:vpp-dev@lists.fd.io> > On Behalf Of Rajith PR
> > via lists.fd.io <http://lists.fd.io>
> >   Sent: Tuesday, June 2, 2020 10:00 AM
> >   To: vpp-dev mailto:vpp-dev@lists.fd.io> >
> >   Subject: [vpp-dev] SEGMENTATION FAULT in load_balance_get()
> >
> >
> >
> >   Hello All,
> >
> >
> >
> >   In 19.08 VPP version we are seeing a crash while accessing the
> > load_balance_pool  in load_balanc_get() function. This is happening
> > after enabling worker threads.
> >
> >   As such the FIB programming is happening in the main thread and in
> > one of the worker threads we see this crash.
> >
> >   Also, this is seen when we scale to 300K+ ipv4 routes.
> >
> >
> >
> >   Here is the complete stack,
> >
> >
> >
> >   Thread 10 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
> >
> >   [Switching to Thread 0x7fbe4aa8e700 (LWP 333)]
> >   0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313,
> i=61)
> > at /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
> >   201  return i0 < vec_len (ai) && 0 != ((ai[i0] >> i1) & 1);
> >
> >
> >
> >   Thread 10 (Thread 0x7fbe4aa8e700 (LWP 333)):
> >   #0  0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313,
> > i=61) at /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
> >   #1  0x7fbef10676a8 in load_balance_get (lbi=61) at
> > /home/ubuntu/Scale/libvpp/src/vnet/dpo/load_b

Re: [vpp-dev] SEGMENTATION FAULT in load_balance_get()

2020-06-02 Thread Rajith PR via lists.fd.io
Hi Dave/Neal,

The adj_poison seems to be a fill pattern, 0xfefe. Am I looking at the
right code, or have I interpreted it incorrectly?

Thanks,
Rajith

On Tue, Jun 2, 2020 at 7:44 PM Dave Barach (dbarach) 
wrote:

> The code manages to access a poisoned adjacency – 0x131313 fill pattern –
> copying Neale for an opinion.
>
>
>
> D.
>
>
>
> *From:* vpp-dev@lists.fd.io  *On Behalf Of *Rajith
> PR via lists.fd.io
> *Sent:* Tuesday, June 2, 2020 10:00 AM
> *To:* vpp-dev 
> *Subject:* [vpp-dev] SEGMENTATION FAULT in load_balance_get()
>
>
>
> Hello All,
>
>
>
> In *19.08 VPP version* we are seeing a crash while accessing the
> *load_balance_pool*  in *load_balanc_get*() function. This is happening
> after *enabling worker threads*.
>
> As such the FIB programming is happening in the main thread and in one of
> the worker threads we see this crash.
>
> Also, this is seen when we *scale to 300K+ ipv4 routes.*
>
>
>
> Here is the complete stack,
>
>
>
> Thread 10 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.
>
> [Switching to Thread 0x7fbe4aa8e700 (LWP 333)]
> 0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313, i=61) at
> /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
> 201  return i0 < vec_len (ai) && 0 != ((ai[i0] >> i1) & 1);
>
>
>
> Thread 10 (Thread 0x7fbe4aa8e700 (LWP 333)):
> #0  0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313, i=61) at
> /home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
> #1  0x7fbef10676a8 in load_balance_get (lbi=61) at
> /home/ubuntu/Scale/libvpp/src/vnet/dpo/load_balance.h:222
> #2  0x7fbef106890c in ip4_lookup_inline (vm=0x7fbe8a5aa080,
> node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40) at
> /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.h:369
> #3  0x7fbef1068ead in ip4_lookup_node_fn_avx2 (vm=0x7fbe8a5aa080,
> node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40)
> at /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.c:95
> #4  0x7fbef0c6afec in dispatch_node (vm=0x7fbe8a5aa080,
> node=0x7fbe8b3fd380, type=VLIB_NODE_TYPE_INTERNAL,
> dispatch_state=VLIB_NODE_STATE_POLLING,
> frame=0x7fbe8a5edb40, last_time_stamp=381215594286358) at
> /home/ubuntu/Scale/libvpp/src/vlib/main.c:1207
> #5  0x7fbef0c6b7ad in dispatch_pending_node (vm=0x7fbe8a5aa080,
> pending_frame_index=2, last_time_stamp=381215594286358)
> at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1375
> #6  0x7fbef0c6d3f0 in vlib_main_or_worker_loop (vm=0x7fbe8a5aa080,
> is_main=0) at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1826
> #7  0x7fbef0c6dc73 in vlib_worker_loop (vm=0x7fbe8a5aa080) at
> /home/ubuntu/Scale/libvpp/src/vlib/main.c:1934
> #8  0x7fbef0cac791 in vlib_worker_thread_fn (arg=0x7fbe8de2a340) at
> /home/ubuntu/Scale/libvpp/src/vlib/threads.c:1754
> #9  0x7fbef092da48 in clib_calljmp () from
> /home/ubuntu/Scale/libvpp/build-root/install-vpp_debug-native/vpp/lib/libvppinfra.so.1.0.1
> #10 0x7fbe4aa8dec0 in ?? ()
> #11 0x7fbef0ca700c in vlib_worker_thread_bootstrap_fn
> (arg=0x7fbe8de2a340) at /home/ubuntu/Scale/libvpp/src/vlib/threads.c:573
>
> Thanks in Advance,
>
> Rajith
>
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16626): https://lists.fd.io/g/vpp-dev/message/16626
Mute This Topic: https://lists.fd.io/mt/74627827/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] SEGMENTATION FAULT in load_balance_get()

2020-06-02 Thread Rajith PR via lists.fd.io
Hello All,

In *VPP version 19.08* we are seeing a crash while accessing the
*load_balance_pool* in the *load_balance_get()* function. This is happening
after *enabling worker threads*.
As such, the FIB programming happens in the main thread, and we see this
crash in one of the worker threads.
Also, this is seen when we *scale to 300K+ IPv4 routes*.

Here is the complete stack,

Thread 10 "vpp_wk_0" received signal SIGSEGV, Segmentation fault.

[Switching to Thread 0x7fbe4aa8e700 (LWP 333)]
0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313, i=61) at
/home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
201  return i0 < vec_len (ai) && 0 != ((ai[i0] >> i1) & 1);

Thread 10 (Thread 0x7fbe4aa8e700 (LWP 333)):
#0  0x7fbef10636f8 in clib_bitmap_get (ai=0x1313131313131313, i=61) at
/home/ubuntu/Scale/libvpp/src/vppinfra/bitmap.h:201
#1  0x7fbef10676a8 in load_balance_get (lbi=61) at
/home/ubuntu/Scale/libvpp/src/vnet/dpo/load_balance.h:222
#2  0x7fbef106890c in ip4_lookup_inline (vm=0x7fbe8a5aa080,
node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40) at
/home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.h:369
#3  0x7fbef1068ead in ip4_lookup_node_fn_avx2 (vm=0x7fbe8a5aa080,
node=0x7fbe8b3fd380, frame=0x7fbe8a5edb40)
at /home/ubuntu/Scale/libvpp/src/vnet/ip/ip4_forward.c:95
#4  0x7fbef0c6afec in dispatch_node (vm=0x7fbe8a5aa080,
node=0x7fbe8b3fd380, type=VLIB_NODE_TYPE_INTERNAL,
dispatch_state=VLIB_NODE_STATE_POLLING,
frame=0x7fbe8a5edb40, last_time_stamp=381215594286358) at
/home/ubuntu/Scale/libvpp/src/vlib/main.c:1207
#5  0x7fbef0c6b7ad in dispatch_pending_node (vm=0x7fbe8a5aa080,
pending_frame_index=2, last_time_stamp=381215594286358)
at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1375
#6  0x7fbef0c6d3f0 in vlib_main_or_worker_loop (vm=0x7fbe8a5aa080,
is_main=0) at /home/ubuntu/Scale/libvpp/src/vlib/main.c:1826
#7  0x7fbef0c6dc73 in vlib_worker_loop (vm=0x7fbe8a5aa080) at
/home/ubuntu/Scale/libvpp/src/vlib/main.c:1934
#8  0x7fbef0cac791 in vlib_worker_thread_fn (arg=0x7fbe8de2a340) at
/home/ubuntu/Scale/libvpp/src/vlib/threads.c:1754
#9  0x7fbef092da48 in clib_calljmp () from
/home/ubuntu/Scale/libvpp/build-root/install-vpp_debug-native/vpp/lib/libvppinfra.so.1.0.1
#10 0x7fbe4aa8dec0 in ?? ()
#11 0x7fbef0ca700c in vlib_worker_thread_bootstrap_fn
(arg=0x7fbe8de2a340) at /home/ubuntu/Scale/libvpp/src/vlib/threads.c:573

Thanks in Advance,
Rajith
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16617): https://lists.fd.io/g/vpp-dev/message/16617
Mute This Topic: https://lists.fd.io/mt/74627827/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] How to match a specific packet to the outbound direction of a specified interface #vpp

2020-05-10 Thread Rajith PR via lists.fd.io
Another solution is to redirect the traffic from the punt node to your own
feature node. There you can match on packets of interest and send them to
the interface-output node.

Thanks,
Rajith

On Sat 9 May, 2020, 3:43 PM Mrityunjay Kumar,  wrote:

> which vpp version are you heading? If you r using 19.05 or less, you can
> create ipsec tunnel, and route your packet to ipsec0 interface,
>
> create ipsec tunnel local-ip  local-spi  remote-ip 
> remote-spi 
> set interface ipsec key ipsec0 local crypto aes-gcm-128
> 2b7e151628aed2a6abf7158809cf4f3d
> set interface ipsec key ipsec0 remote crypto aes-gcm-128
> 2b7e151628aed2a6abf7158809cf4f3d
> set interface state ipsec0 up
> set interface unnumbered ipsec0 use 
> ip route add 192.168.200.10/24 via ipsec0
>
> if your are using >= 19.08, best practice, you can create policy based
> tunnel.
>
> ipsec policy add spd 1 priority 100 inbound action bypass protocol 50
> ipsec policy add spd 1 priority 100 outbound action bypass protocol 50
> ipsec policy add spd 1 outbound action bypass local-ip-range
> 10.168.4.0-10.168.4.255 remote-ip-range 10.168.2.0-10.168.2.255
> ipsec sa add 10 spi 3391172682 esp crypto-alg aes-gcm-256 crypto-key
> 523a88fa4ad8c0325d75c933d9e567c23879ea701355207551bc2cf7d963c3dac8dcdca2
> tunnel-src 10.168.2.4 tunnel-dst 10.168.4.11
> ipsec sa add 20 spi 3443809241 esp crypto-alg aes-gcm-256 crypto-key
> 6062e3e9a9d578f58527242e9fbd48aeef7a0f8b4adc4569e7a84cda19c14ae21aa0a2b4
> tunnel-src 10.168.4.11 tunnel-dst 10.168.2.4
> ipsec policy add spd 1 priority 10  inbound action protect sa 10
> local-ip-range 10.168.3.11 - 10.168.3.11 remote-ip-range 10.168.2.4 -
> 10.168.2.4
> ipsec policy add spd 1 priority 10 outbound action protect sa 20
> local-ip-range 10.168.3.11 - 10.168.3.11 remote-ip-range 10.168.2.4 -
> 10.168.2.4
>
>
>
> cheers!   enjoy
> //MJ
>
>
>
> *Regards*,
> Mrityunjay Kumar.
> Mobile: +91 - 9731528504
>
>
>
> On Sat, May 9, 2020 at 12:16 PM  wrote:
>
>> Hi VPP hackers,
>> My program and vpp communicate through the memif interface.
>> I want to make vpp match specific packets(such as ospf packet), and then
>> redirect to the outbound direction of the memif interface.
>>
>> I don't know how to match a specific packet to the outbound direction of
>> a specified interface.
>>
>> Can someone provide an example of configuration.
>> Thanks in advance!
>>
>> 
>
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16294): https://lists.fd.io/g/vpp-dev/message/16294
Mute This Topic: https://lists.fd.io/mt/74091305/21656
Mute #vpp: https://lists.fd.io/mk?hashtag=vpp=1480452
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] : High memory usage by vpp_main

2020-03-06 Thread Rajith PR
Hello Team,

After moving from VPP version 17.04 to 19.01 we are observing a huge
increase in the memory requirement (VIRT, SHR) of the vpp_main process. Is
this expected?

  PID USER  PR  NI    VIRT    RES    SHR  S %CPU %MEM   TIME+  COMMAND
  138 root  20   0  19.251g 1.461g 283400 R 14.6  4.7  0:38.85 vpp_main   <-- 19.01
24278 root  20   0  7726748 2.968g 194584 S  6.7 19.0  6:27.99 vpp_main   <-- 17.04

Thanks,
Rajith
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#15695): https://lists.fd.io/g/vpp-dev/message/15695
Mute This Topic: https://lists.fd.io/mt/71768505/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] Issues in VNET infra

2020-01-15 Thread Rajith PR
Hello Team,

During our integration with the VPP stack we have found a couple of problems
in the VNET infra and would like to seek your help in resolving these:

1. Is there any way to disable a hardware interface (e.g. a memif interface
or a host interface)? Neither vnet_hw_interface_t nor
vnet_hw_interface_flags_t seems to have an attribute or state for admin
enable/disable.
2. Is there any way to disable an untagged software interface? It seems that
in VPP, the untagged software interface that gets created is also the parent
port, and disabling it has the implication of disabling all the
sub-interfaces under that hardware interface.
3. With regard to the memif interface: in a single VPP instance we are not
able to create two memif interfaces with one as master and another as slave.
Can someone let us know how this can be done? (A sketch of what we are after
follows below.)
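
For illustration, the kind of configuration we are trying to get working
inside one instance (the socket ids and filenames are placeholders):

create memif socket id 1 filename /run/vpp/memif-a.sock
create memif socket id 2 filename /run/vpp/memif-b.sock
create interface memif id 0 socket-id 1 master
create interface memif id 0 socket-id 2 slave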

Thanks,
Rajith
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#15184): https://lists.fd.io/g/vpp-dev/message/15184
Mute This Topic: https://lists.fd.io/mt/69718725/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] Ipv6 neighbor not getting discovered

2019-12-03 Thread Rajith PR
Hello Team,

During integration of our software with VPP 19.08 we have found that an
IPv6 neighbor does not get discovered on the first sw_if_index on which
IPv6 is enabled.
On further analysis we found that it is due to radv_info->mcast_adj_index
being checked against "0" in the following code:

Function:

static_always_inline uword icmp6_router_solicitation (vlib_main_t * vm,
vlib_node_runtime_t * node, vlib_frame_t * frame)

  else
    {
      adj_index0 = radv_info->mcast_adj_index;

      if (adj_index0 == 0)
        error0 = ICMP6_ERROR_DST_LOOKUP_MISS;
      else
        {
          next0 =
            is_dropped ? next0 :
            ICMP6_ROUTER_SOLICITATION_NEXT_REPLY_RW;
          vnet_buffer (p0)->ip.adj_index[VLIB_TX] = adj_index0;
        }
    }

Based on our understanding, "0" is a valid adjacency index. After
changing the code as below the problem seems to have been solved.

  else
    {
      adj_index0 = radv_info->mcast_adj_index;

      if (adj_index0 == ADJ_INDEX_INVALID)
        error0 = ICMP6_ERROR_DST_LOOKUP_MISS;
      else
        {
          next0 =
            is_dropped ? next0 :
            ICMP6_ROUTER_SOLICITATION_NEXT_REPLY_RW;
          vnet_buffer (p0)->ip.adj_index[VLIB_TX] = adj_index0;
        }
    }

Is this fix correct? If yes, can this be fixed in the master branch, please?

Thanks,

Rajith
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#14768): https://lists.fd.io/g/vpp-dev/message/14768
Mute This Topic: https://lists.fd.io/mt/65768746/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-