Re: [vpp-dev] VPP-556 - vpp crashing in an openstack odl stack

2017-01-18 Thread Damjan Marion (damarion)

I also have two other questions:
• What's the difference between a regular image and a TAG=vpp_debug image?

vpp_debug_TAG_CFLAGS = -g -O0 -DCLIB_DEBUG -DFORTIFY_SOURCE=2 -march=$(MARCH) \
-fstack-protector-all -fPIC -Werror
vpp_debug_TAG_LDFLAGS = -g -O0 -DCLIB_DEBUG -DFORTIFY_SOURCE=2 -march=$(MARCH) \
-fstack-protector-all -fPIC -Werror

vpp_TAG_CFLAGS = -g -O2 -DFORTIFY_SOURCE=2 -march=$(MARCH) -mtune=$(MTUNE) \
-fstack-protector -fPIC -Werror
vpp_TAG_LDFLAGS = -g -O2 -DFORTIFY_SOURCE=2 -march=$(MARCH) -mtune=$(MTUNE) \
-fstack-protector -fPIC -Werror
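
In short: the debug TAG builds -O0 with CLIB_DEBUG defined, so backtraces are accurate and the internal ASSERT/consistency checks are compiled in, while the regular TAG builds -O2 without CLIB_DEBUG. A rough sketch of how each is selected at build time - this just mirrors the install-rpm invocation you used, exact targets may differ per branch:

make V=0 PLATFORM=vpp TAG=vpp_debug install-rpm   # debug image
make V=0 PLATFORM=vpp TAG=vpp install-rpm         # regular (optimized) image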

• I've tried configuring core files in a number of different ways, but nothing seems to be working - the core files are just not being created. Is there a guide on how to set this up for CentOS 7? For reference, here's one of the guides <https://www.unixmen.com/how-to-enable-core-dumps-in-rhel6/> that I used.

No idea, never used redhat. Works pretty well on ubuntu.

Thanks,

Damjan

Re: [vpp-dev] VPP-556 - vpp crashing in an openstack odl stack

2017-01-17 Thread Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco)
Hi Dave, John,

I've tried building the latest 17.01 vpp (using "make V=0 PLATFORM=vpp TAG=vpp_debug install-rpm" - I understand that's what the TAG=vpp_debug refers to) and the issue is no longer present there, but there is something else - now vpp crashes when I delete a vhost-user port.

I've looked at patches submitted for master that could solve this and I've found https://gerrit.fd.io/r/#/c/4619/, but that didn't help. I've attached post-mortem API traces and a backtrace. Pierre, could you please take a look?
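
For anyone reproducing this: the post-mortem API trace is (as far as I understand) the file vpp dumps to /tmp when it crashes, and it can be decoded and replayed with the vpp API trace facility. A rough sketch below - command names follow the vppctl usage further down this thread, the file name is a placeholder:

vppctl api trace custom-dump /tmp/api_post_mortem.<pid>   # decode the trace into SCRIPT: lines
vppctl api trace replay /tmp/api_post_mortem.<pid>        # replay it against a debug image
vppctl api trace save my_trace                            # or save a trace manually on a live vpp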

I also have two other questions:

* What's the difference between a regular image and a TAG=vpp_debug image?

* I've tried configuring core files in a number of different ways, but nothing seems to be working - the core files are just not being created. Is there a guide on how to set this up for CentOS 7? For reference, here's one of the guides <https://www.unixmen.com/how-to-enable-core-dumps-in-rhel6/> that I used.

And the last thing: Honeycomb should now work with vpp 17.04, so I'm going to try that one as well.

Thanks,
Juraj

From: Dave Barach (dbarach)
Sent: Wednesday, 11 January, 2017 23:43
To: John Lo (loj) <l...@cisco.com>; Juraj Linkes -X (jlinkes - PANTHEON 
TECHNOLOGIES at Cisco) <jlin...@cisco.com>; vpp-dev@lists.fd.io
Subject: RE: VPP-556 - vpp crashing in an openstack odl stack

+1... Hey John, thanks a lot for the detailed analysis...

Dave

From: John Lo (loj)
Sent: Wednesday, January 11, 2017 5:40 PM
To: Dave Barach (dbarach) <dbar...@cisco.com>; Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco) <jlin...@cisco.com>; vpp-dev@lists.fd.io
Subject: RE: VPP-556 - vpp crashing in an openstack odl stack

Hi Juraj,

I looked at the custom-dump of the API trace and noticed this "interesting" 
sequence:
SCRIPT: vxlan_add_del_tunnel src 192.168.11.22 dst 192.168.11.20 decap-next -1 
vni 1
SCRIPT: sw_interface_set_flags sw_if_index 4 admin-up link-up
SCRIPT: sw_interface_set_l2_bridge sw_if_index 4 bd_id 1 shg 1  enable
SCRIPT: sw_interface_set_l2_bridge sw_if_index 2 disable
SCRIPT: bridge_domain_add_del bd_id 1 del

Any idea why BD 1 is deleted while the VXLAN tunnel (sw_if_index 4) is still in the BD? Maybe this is what is causing the crash. From your vppctl output capture for "compute_that_crashed.txt", I do see BD 1 present with vxlan_tunnel0 on it:
[root@overcloud-novacompute-1 ~]# vppctl show bridge-domain
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term   BVI-Intf
   0     0        off        off        off        off        off      local0
   1     1        on         on         on         on         off      N/A
[root@overcloud-novacompute-1 ~]# vppctl show bridge-domain 1 detail
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term   BVI-Intf
   1     1        on         on         on         on         off      N/A

     Interface        Index  SHG  BVI   TxFlood   VLAN-Tag-Rewrite
   vxlan_tunnel0        3     1    -       *           none

I did install a vpp 1701 image on my server and performed an api trace replay of your api_post_mortem. Thereafter, I do not see BD 1 present, while vxlan_tunnel1 is still configured as being in BD 1:
DBGvpp# show bridge
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term   BVI-Intf
   0     0        off        off        off        off        off      local0
DBGvpp# sho vxlan tunnel
[1] src 192.168.11.22 dst 192.168.11.20 vni 1 sw_if_index 4 encap_fib_index 0 
fib_entry_index 12 decap_next l2
DBGvpp# sho int addr
GigabitEthernet2/3/0 (dn):
VirtualEthernet0/0/0 (up):
local0 (dn):
vxlan_tunnel0 (dn):
vxlan_tunnel1 (up):
  l2 bridge bd_id 1 shg 1
DBGvpp# show int
              Name               Idx       State          Counter          Count
GigabitEthernet2/3/0              1        down
VirtualEthernet0/0/0              2         up
local0                            0        down
vxlan_tunnel0                     3        down
vxlan_tunnel1                     4         up
DBGvpp#

With the system in this state, I can easily imagine that a packet received by vxlan_tunnel1 and forwarded in a non-existent BD causes the VPP crash. I will look into the VPP code from this angle. In general, however, there is really no need to create and delete BDs on VPP. Adding an interface/tunnel to a BD will cause it to be created. Deleting a BD without removing all the ports in it can cause problems, which may well be the cause here. If a BD is no longer to be used, all the ports on it should be removed. If a BD is to be reused, just add ports to it.
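
In other words, the safer teardown order is to pull every port out of the BD first and only then (optionally) delete the BD - in terms of the same API calls as the trace above, something like:

SCRIPT: sw_interface_set_l2_bridge sw_if_index 4 disable   <- take the tunnel out of BD 1 first
SCRIPT: sw_interface_set_l2_bridge sw_if_index 2 disable
SCRIPT: bridge_domain_add_del bd_id 1 del                  <- now safe, the BD is empty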

As mentioned by Dave, please test using a known good image like 1701, preferably built with debug enabled (with TAG=vpp_debug), so it is easier to find any issues.

Regards,
John

From: Dave Barach (dbarach)
Sent: Wednesday,

Re: [vpp-dev] VPP-556 - vpp crashing in an openstack odl stack

2017-01-11 Thread Dave Barach (dbarach)
+1... Hey John, thanks a lot for the detailed analysis...

Dave

From: John Lo (loj)
Sent: Wednesday, January 11, 2017 5:40 PM
To: Dave Barach (dbarach) <dbar...@cisco.com>; Juraj Linkes -X (jlinkes - 
PANTHEON TECHNOLOGIES at Cisco) <jlin...@cisco.com>; vpp-dev@lists.fd.io
Subject: RE: VPP-556 - vpp crashing in an openstack odl stack

Hi Juraj,

I looked at the custom-dump of the API trace and noticed this "interesting" 
sequence:
SCRIPT: vxlan_add_del_tunnel src 192.168.11.22 dst 192.168.11.20 decap-next -1 
vni 1
SCRIPT: sw_interface_set_flags sw_if_index 4 admin-up link-up
SCRIPT: sw_interface_set_l2_bridge sw_if_index 4 bd_id 1 shg 1  enable
SCRIPT: sw_interface_set_l2_bridge sw_if_index 2 disable
SCRIPT: bridge_domain_add_del bd_id 1 del

Any idea why BD 1 is deleted while the VXLAN tunnel (sw_if_index 4) is still in the BD? Maybe this is what is causing the crash. From your vppctl output capture for "compute_that_crashed.txt", I do see BD 1 present with vxlan_tunnel0 on it:
[root@overcloud-novacompute-1 ~]# vppctl show bridge-domain
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term   BVI-Intf
   0     0        off        off        off        off        off      local0
   1     1        on         on         on         on         off      N/A
[root@overcloud-novacompute-1 ~]# vppctl show bridge-domain 1 detail
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term   BVI-Intf
   1     1        on         on         on         on         off      N/A

     Interface        Index  SHG  BVI   TxFlood   VLAN-Tag-Rewrite
   vxlan_tunnel0        3     1    -       *           none

I did install a vpp 1701 image on my server and performed an api trace replay of your api_post_mortem. Thereafter, I do not see BD 1 present, while vxlan_tunnel1 is still configured as being in BD 1:
DBGvpp# show bridge
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term   BVI-Intf
   0     0        off        off        off        off        off      local0
DBGvpp# sho vxlan tunnel
[1] src 192.168.11.22 dst 192.168.11.20 vni 1 sw_if_index 4 encap_fib_index 0 
fib_entry_index 12 decap_next l2
DBGvpp# sho int addr
GigabitEthernet2/3/0 (dn):
VirtualEthernet0/0/0 (up):
local0 (dn):
vxlan_tunnel0 (dn):
vxlan_tunnel1 (up):
  l2 bridge bd_id 1 shg 1
DBGvpp# show int
              Name               Idx       State          Counter          Count
GigabitEthernet2/3/0              1        down
VirtualEthernet0/0/0              2         up
local0                            0        down
vxlan_tunnel0                     3        down
vxlan_tunnel1                     4         up
DBGvpp#

With the system in this state, I can easily imagine that a packet received by vxlan_tunnel1 and forwarded in a non-existent BD causes the VPP crash. I will look into the VPP code from this angle. In general, however, there is really no need to create and delete BDs on VPP. Adding an interface/tunnel to a BD will cause it to be created. Deleting a BD without removing all the ports in it can cause problems, which may well be the cause here. If a BD is no longer to be used, all the ports on it should be removed. If a BD is to be reused, just add ports to it.

As mentioned by Dave, please test using a known good image like 1701, preferably built with debug enabled (with TAG=vpp_debug), so it is easier to find any issues.

Regards,
John

From: Dave Barach (dbarach)
Sent: Wednesday, January 11, 2017 9:01 AM
To: Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco) <jlin...@cisco.com>; vpp-dev@lists.fd.io; John Lo (loj) <l...@cisco.com>
Subject: RE: VPP-556 - vpp crashing in an openstack odl stack

Dear Juraj,

I took a look. It appears that the last operation in the post-mortem API trace 
was to kill a vxlan tunnel. Is there a reasonable chance that other interfaces 
in the bridge group containing the tunnel were still admin-up? Was the tunnel 
interface removed from the bridge group prior to killing it?

The image involved is not stable/1701/LATEST. It's missing at least 20 fixes 
considered critical enough to justify merging them into the release throttle:

[root@overcloud-novacompute-1 ~]# vppctl show version verbose
Version:  v17.01-rc0~242-gabd98b2~b1576
Compiled by:  jenkins
Compile host: centos-7-a8b
Compile date: Mon Dec 12 18:55:56 UTC 2016

Please re-test with stable/1701/LATEST. Please use a TAG=vpp_debug image. If 
the problem is reproducible, we'll need a core file to make further progress.
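
A generic sketch of what usually has to be in place to get a core on a systemd-based system like CentOS 7 (paths, the service name and the drop-in file are assumptions - adjust for your install; also note that abrt, if present, redirects kernel.core_pattern to its own handler):

ulimit -c unlimited                             # core size limit for processes started from this shell
sysctl -w kernel.core_pattern=/tmp/core-%e-%p   # write cores to a predictable location

# if vpp runs under systemd, the unit needs its own limit, e.g. in a drop-in
# such as /etc/systemd/system/vpp.service.d/coredump.conf:
[Service]
LimitCORE=infinity
# then: systemctl daemon-reload && systemctl restart vpp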

Copying John Lo ("Dr. Vxlan") for any further thoughts he might have...

Thanks... Dave

From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On Behalf Of Juraj Linkes -X (jlinkes -
PANTHEON TECHNOLOGIES at Cisco)
Se

Re: [vpp-dev] VPP-556 - vpp crashing in an openstack odl stack

2017-01-11 Thread John Lo (loj)
Hi Juraj,

I looked at the custom-dump of the API trace and noticed this "interesting" 
sequence:
SCRIPT: vxlan_add_del_tunnel src 192.168.11.22 dst 192.168.11.20 decap-next -1 
vni 1
SCRIPT: sw_interface_set_flags sw_if_index 4 admin-up link-up
SCRIPT: sw_interface_set_l2_bridge sw_if_index 4 bd_id 1 shg 1  enable
SCRIPT: sw_interface_set_l2_bridge sw_if_index 2 disable
SCRIPT: bridge_domain_add_del bd_id 1 del

Any idea why BD 1 is deleted while the VXLAN tunnel (sw_if_index 4) is still in the BD? Maybe this is what is causing the crash. From your vppctl output capture for "compute_that_crashed.txt", I do see BD 1 present with vxlan_tunnel0 on it:
[root@overcloud-novacompute-1 ~]# vppctl show bridge-domain
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term   BVI-Intf
   0     0        off        off        off        off        off      local0
   1     1        on         on         on         on         off      N/A
[root@overcloud-novacompute-1 ~]# vppctl show bridge-domain 1 detail
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term   BVI-Intf
   1     1        on         on         on         on         off      N/A

     Interface        Index  SHG  BVI   TxFlood   VLAN-Tag-Rewrite
   vxlan_tunnel0        3     1    -       *           none

I did install a vpp 1701 image on my server and performed an api trace replay of your api_post_mortem. Thereafter, I do not see BD 1 present, while vxlan_tunnel1 is still configured as being in BD 1:
DBGvpp# show bridge
  ID   Index   Learning   U-Forwrd   UU-Flood   Flooding   ARP-Term   BVI-Intf
   0     0        off        off        off        off        off      local0
DBGvpp# sho vxlan tunnel
[1] src 192.168.11.22 dst 192.168.11.20 vni 1 sw_if_index 4 encap_fib_index 0 
fib_entry_index 12 decap_next l2
DBGvpp# sho int addr
GigabitEthernet2/3/0 (dn):
VirtualEthernet0/0/0 (up):
local0 (dn):
vxlan_tunnel0 (dn):
vxlan_tunnel1 (up):
  l2 bridge bd_id 1 shg 1
DBGvpp# show int
              Name               Idx       State          Counter          Count
GigabitEthernet2/3/0              1        down
VirtualEthernet0/0/0              2         up
local0                            0        down
vxlan_tunnel0                     3        down
vxlan_tunnel1                     4         up
DBGvpp#

With the system in this state, I can easily imagine that a packet received by vxlan_tunnel1 and forwarded in a non-existent BD causes the VPP crash. I will look into the VPP code from this angle. In general, however, there is really no need to create and delete BDs on VPP. Adding an interface/tunnel to a BD will cause it to be created. Deleting a BD without removing all the ports in it can cause problems, which may well be the cause here. If a BD is no longer to be used, all the ports on it should be removed. If a BD is to be reused, just add ports to it.

As mentioned by Dave, please test using a known good image like 1701, preferably built with debug enabled (with TAG=vpp_debug), so it is easier to find any issues.

Regards,
John

From: Dave Barach (dbarach)
Sent: Wednesday, January 11, 2017 9:01 AM
To: Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco) 
; vpp-dev@lists.fd.io; John Lo (loj) 
Subject: RE: VPP-556 - vpp crashing in an openstack odl stack

Dear Juraj,

I took a look. It appears that the last operation in the post-mortem API trace 
was to kill a vxlan tunnel. Is there a reasonable chance that other interfaces 
in the bridge group containing the tunnel were still admin-up? Was the tunnel 
interface removed from the bridge group prior to killing it?

The image involved is not stable/1701/LATEST. It's missing at least 20 fixes 
considered critical enough to justify merging them into the release throttle:

[root@overcloud-novacompute-1 ~]# vppctl show version verbose
Version:  v17.01-rc0~242-gabd98b2~b1576
Compiled by:  jenkins
Compile host: centos-7-a8b
Compile date: Mon Dec 12 18:55:56 UTC 2016

Please re-test with stable/1701/LATEST. Please use a TAG=vpp_debug image. If 
the problem is reproducible, we'll need a core file to make further progress.

Copying John Lo ("Dr. Vxlan") for any further thoughts he might have...

Thanks... Dave

From: vpp-dev-boun...@lists.fd.io 
[mailto:vpp-dev-boun...@lists.fd.io] On Behalf Of Juraj Linkes -X (jlinkes - 
PANTHEON TECHNOLOGIES at Cisco)
Sent: Wednesday, January 11, 2017 3:47 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP-556 - vpp crashing in an openstack odl stack

Hi vpp-dev,

I just wanted to ask whether anyone has taken a look at VPP-556? There might not be enough logs; I collected just a backtrace from gdb. If we need anything more, please give me a bit of guidance on what could help and how to get it.

This is one of the last few issues we're facing

Re: [vpp-dev] VPP-556 - vpp crashing in an openstack odl stack

2017-01-11 Thread Dave Barach (dbarach)
Dear Juraj,

I took a look. It appears that the last operation in the post-mortem API trace 
was to kill a vxlan tunnel. Is there a reasonable chance that other interfaces 
in the bridge group containing the tunnel were still admin-up? Was the tunnel 
interface removed from the bridge group prior to killing it?

The image involved is not stable/1701/LATEST. It's missing at least 20 fixes 
considered critical enough to justify merging them into the release throttle:

[root@overcloud-novacompute-1 ~]# vppctl show version verbose
Version:  v17.01-rc0~242-gabd98b2~b1576
Compiled by:  jenkins
Compile host: centos-7-a8b
Compile date: Mon Dec 12 18:55:56 UTC 2016

Please re-test with stable/1701/LATEST. Please use a TAG=vpp_debug image. If 
the problem is reproducible, we'll need a core file to make further progress.

Copying John Lo ("Dr. Vxlan") for any further thoughts he might have...

Thanks... Dave

From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On 
Behalf Of Juraj Linkes -X (jlinkes - PANTHEON TECHNOLOGIES at Cisco)
Sent: Wednesday, January 11, 2017 3:47 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP-556 - vpp crashing in an openstack odl stack

Hi vpp-dev,

I just wanted to ask whether anyone has taken a look at VPP-556? There might not be enough logs; I collected just a backtrace from gdb. If we need anything more, please give me a bit of guidance on what could help and how to get it.

This is one of the last few issues we're facing with the openstack odl scenario, where we use vpp just for l2, and it's been there for a while.

Thanks,
Juraj