[Yahoo-eng-team] [Bug 1820744] [NEW] conntrack v1.4.4 (conntrack-tools): `0' unsupported protocol

2019-03-18 Thread Gaëtan Trellu
Public bug reported:

Hi,

In neutron-openvswitch-agent.log I got some errors related to conntrack.
I'm not sure when the _delete_conntrack_state function is triggered.

From the code it seems to be related to a port update.

Does it have something to do with the fact that during security group
rule creation we can sometimes set the protocol as a number [1]?

2019-03-18 17:15:45.700 7 ERROR neutron.agent.linux.utils [-] Exit code: 2; 
Stdin: ; Stdout: ; Stderr: conntrack v1.4.4 (conntrack-tools): `0' unsupported 
protocol
Try `conntrack -h' or 'conntrack --help' for more information.

2019-03-18 17:15:45.701 7 ERROR neutron.agent.linux.ip_conntrack [-] Failed 
execute conntrack command ('conntrack', '-D', '-p', '0', '-f', 'ipv4', '-d', 
'192.168.3.25', '-w', 4591): ProcessExecutionError: Exit code: 2; Stdin: ; 
Stdout: ; Stderr: conntrack v1.4.4 (conntrack-tools): `0' unsupported protocol
Try `conntrack -h' or 'conntrack --help' for more information.
2019-03-18 17:15:45.701 7 ERROR neutron.agent.linux.ip_conntrack Traceback 
(most recent call last):
2019-03-18 17:15:45.701 7 ERROR neutron.agent.linux.ip_conntrack   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/ip_conntrack.py",
 line 165, in _delete_conntrack_state
2019-03-18 17:15:45.701 7 ERROR neutron.agent.linux.ip_conntrack 
extra_ok_codes=[1])
2019-03-18 17:15:45.701 7 ERROR neutron.agent.linux.ip_conntrack   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py",
 line 147, in execute
2019-03-18 17:15:45.701 7 ERROR neutron.agent.linux.ip_conntrack 
returncode=returncode)
2019-03-18 17:15:45.701 7 ERROR neutron.agent.linux.ip_conntrack 
ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: conntrack 
v1.4.4 (conntrack-tools): `0' unsupported protocol
2019-03-18 17:15:45.701 7 ERROR neutron.agent.linux.ip_conntrack Try `conntrack 
-h' or 'conntrack --help' for more information.
2019-03-18 17:15:45.701 7 ERROR neutron.agent.linux.ip_conntrack 
2019-03-18 17:15:45.701 7 ERROR neutron.agent.linux.ip_conntrack 

This happens on the Rocky release.

[1] https://github.com/openstack/python-openstackclient/blob/4bde9af89251431791fc8d69fe09d5e17a8fba8f/openstackclient/network/v2/security_group_rule.py#L155-L164
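
As an illustration of where a numeric protocol could leak into the conntrack
call, a guard like the sketch below (purely hypothetical, not Neutron's actual
code; the helper name and the mapping are assumptions) could translate
well-known protocol numbers to names and skip values such as 0 that conntrack
v1.4.4 rejects:

# Hypothetical sketch: normalize a numeric IP protocol before building a
# conntrack command such as the one from the log above:
#   conntrack -D -p 0 -f ipv4 -d 192.168.3.25 -w 4591
# conntrack v1.4.4 rejects `0', so unmapped numbers are skipped entirely.
IP_PROTOCOL_NAMES = {1: 'icmp', 6: 'tcp', 17: 'udp', 58: 'icmpv6'}  # assumed subset

def conntrack_protocol_arg(protocol):
    """Return a value usable with `conntrack -p`, or None to skip the call."""
    try:
        number = int(protocol)
    except (TypeError, ValueError):
        return protocol  # already a name such as 'tcp'
    return IP_PROTOCOL_NAMES.get(number)

# conntrack_protocol_arg('0')   -> None  (no conntrack call is made)
# conntrack_protocol_arg('6')   -> 'tcp'
# conntrack_protocol_arg('udp') -> 'udp'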

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1820744

Title:
  conntrack v1.4.4 (conntrack-tools): `0' unsupported protocol

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1820744

[Yahoo-eng-team] [Bug 1794991] [NEW] Inconsistent flows with DVR l2pop VxLAN on br-tun

2018-09-28 Thread Gaëtan Trellu
Public bug reported:

We are using Neutron (Pike) configured as DVR with l2pop, ARP responder,
and VXLAN. For a few weeks we have been experiencing unexpected
behaviors:

- [1] Some instances are not able to get a DHCP address
- [2] Instances are not able to ping instances on a different compute node

This is totally random: sometimes everything works as expected and sometimes
we see the behaviors described above.

After checking the flows between the network and compute nodes, we found
that behavior [1] is due to missing flows on the compute nodes pointing to
the DHCP agent on the network node.

Behavior [2] is related to missing flows too: some compute nodes are
missing the output to other compute nodes (vxlan-xx), which prevents an
instance on compute 1 from communicating with an instance on compute 2.
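
For anyone hitting the same symptoms, a quick way to spot behavior [2] is to
compare the outputs referenced by br-tun's flood flows with the vxlan-xx
tunnel ports actually present on the bridge. The helper below is only a
sketch of such a check (it is not part of Neutron, and table 22 /
FLOOD_TO_TUN is an assumption based on the standard Neutron OVS agent
pipeline):

# Hypothetical diagnostic helper: list the output ports referenced by
# br-tun's flood flows so they can be compared with the vxlan-xx tunnel
# ports on the bridge. Run it on each compute node and compare the results.
import re
import subprocess

def check_flood_flows(bridge='br-tun', flood_table=22):
    flows = subprocess.check_output(
        ['ovs-ofctl', 'dump-flows', bridge, 'table=%d' % flood_table],
        universal_newlines=True)
    flooded_ofports = sorted(set(re.findall(r'output:(\d+)', flows)), key=int)

    ports = subprocess.check_output(
        ['ovs-vsctl', 'list-ports', bridge], universal_newlines=True).split()
    tunnels = sorted(p for p in ports if p.startswith('vxlan-'))

    # Map a tunnel name to its ofport with: ovs-vsctl get Interface <name> ofport
    print('ofports flooded to: %s' % flooded_ofports)
    print('vxlan ports on %s: %s' % (bridge, tunnels))

check_flood_flows()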

When we add the missing flows for [1] and [2] manually, the issues are
fixed, but if we restart neutron-openvswitch-agent the flows are missing
again.

For [1], sometimes simply disabling/enabling the port on the network nodes
related to each DHCP agent solves the problem, and sometimes it does not.

For [2], the only way we found to fix the flows without adding them
manually is to remove all instances of a network from the compute node and
create a new instance on that network, which sends a notification message
to all compute and network nodes; but again, when neutron-openvswitch-agent
restarts, the flows vanish.

We cherry-picked these commits but nothing changed:
  - https://review.openstack.org/#/c/600151/
  - https://review.openstack.org/#/c/573785/

Any ideas?

** Affects: neutron
 Importance: Undecided
 Status: New


[Yahoo-eng-team] [Bug 1774257] [NEW] neutron-openvswitch-agent RuntimeError: Switch connection timeout

2018-05-30 Thread Gaëtan Trellu
Public bug reported:

In neutron-openvswitch-agent.log I see a lot of timeout messages.

  RuntimeError: Switch connection timeout

This timeout sometimes prevents neutron-openvswitch-agent from coming UP.
We are running Pike and we have ~1000 ports in Open vSwitch.

I'm able to run ovs-vsctl, ovs-ofctl, etc. commands, which means that
Open vSwitch (vswitchd and ovsdb) is working fine.
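
A possible mitigation (an assumption, not something verified in this
environment) is to raise the native OpenFlow (Ryu) timeouts of the OVS
agent, since the defaults can be tight on hosts with this many ports. The
values below are only illustrative:

# openvswitch_agent.ini - timeouts used by the native OpenFlow driver
[ovs]
of_connect_timeout = 60
of_request_timeout = 30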

This is the full trace from the neutron-openvswitch-agent log:

2018-05-30 19:22:42.353 7 WARNING ovsdbapp.backend.ovs_idl.vlog [-] 
tcp:127.0.0.1:6640: receive error: Connection reset by peer
2018-05-30 19:22:42.358 7 WARNING ovsdbapp.backend.ovs_idl.vlog [-] 
tcp:127.0.0.1:6640: connection dropped (Connection reset by peer)
2018-05-30 19:24:17.626 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch 
[req-3c335d47-9b3e-4f18-994b-afca7d7d70be - - - - -] Switch connection timeout: 
RuntimeError: Switch connection timeout
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
[req-3c335d47-9b3e-4f18-994b-afca7d7d70be - - - - -] Error while processing VIF 
ports: RuntimeError: Switch connection timeout
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent Traceback (most 
recent call last):
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2066, in rpc_loop
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
ofport_changed_ports = self.update_stale_ofport_rules()
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/osprofiler/profiler.py", 
line 153, in wrapper
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent return 
f(*args, **kwargs)
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 1210, in update_stale_ofport_rules
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
self.int_br.delete_arp_spoofing_protection(port=ofport)
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/br_int.py",
 line 255, in delete_arp_spoofing_protection
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent match=match)
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py",
 line 111, in uninstall_flows
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent (dp, ofp, 
ofpp) = self._get_dp()
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_bridge.py",
 line 67, in _get_dp
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
self._cached_dpid = new_dpid
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", 
line 220, in __exit__
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
self.force_reraise()
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_utils/excutils.py", 
line 196, in force_reraise
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
six.reraise(self.type_, self.value, self.tb)
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_bridge.py",
 line 50, in _get_dp
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent dp = 
self._get_dp_by_dpid(self._cached_dpid)
2018-05-30 19:24:17.628 7 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent   File 
"/var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py",
 line 69, in _get_dp_by_dpi