Public bug reported:

Environment description:

- Deployment using RDO Trunk repo from master.
- Neutron based on commit c430e9b

If neutron-ovs-agent is started before neutron-server starts, it exits
with return code 0, which systemd does not treat as a failure, so the
agent is not restarted.

The following errors appear in /var/log/neutron/openvswitch-agent.log:

2017-05-30 17:38:48.692 29042 DEBUG neutron.api.rpc.handlers.resources_rpc 
[req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] 
neutron.api.rpc.handlers.resources_rpc.ResourcesPullRpcApi method bulk_pull 
called with arguments (<neutron_lib.context.Context object at 
0x75ff950>, 'Port') {} wrapper 
/usr/lib/python2.7/site-packages/oslo_log/helpers.py:47
2017-05-30 17:38:49.298 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] 
on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202

....
2017-05-30 17:40:26.506 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] 
on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202
2017-05-30 17:40:27.530 29042 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp 
[req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] Agent main thread died of 
an exception
...
2017-05-30 17:40:27.530 29042 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp     
'to message ID %s' % msg_id)
2017-05-30 17:40:27.530 29042 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp 
MessagingTimeout: Timed out waiting for a reply to message ID 
3874905892f543e0be9984e6504644bb
2017-05-30 17:40:27.530 29042 ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp 
2017-05-30 17:40:27.624 29042 INFO oslo_rootwrap.client [-] Stopping rootwrap 
daemon process with pid=29502

From the systemd side, the following status is reported:

[root@weirdo1 neutron]# systemctl status neutron-openvswitch-agent
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
   Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; 
enabled; vendor preset: disabled)
   Active: inactive (dead) since Tue 2017-05-30 17:40:27 UTC; 5min ago
 Main PID: 29042 (code=exited, status=0/SUCCESS)

May 30 17:38:44 weirdo1 systemd[1]: Starting OpenStack Neutron Open vSwitch 
Agent...
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: 
net.bridge.bridge-nf-call-arptables = 1
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: 
net.bridge.bridge-nf-call-iptables = 1
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: 
net.bridge.bridge-nf-call-ip6tables = 1
May 30 17:38:44 weirdo1 systemd[1]: Started OpenStack Neutron Open vSwitch 
Agent.
May 30 17:38:45 weirdo1 neutron-openvswitch-agent[29042]: Guru meditation now 
registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 
will no longer be reg...te reports.
May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Option 
"notification_driver" from group "DEFAULT" is deprecated. Use option "driver" 
from group "oslo_messaging_notifications".
May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Could not load 
neutron.openstack.common.notifier.rpc_notifier


Note the (code=exited, status=0/SUCCESS)
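
Why that status matters: under a Restart=on-failure policy, a process that
exits with status 0 counts as a clean exit and is not restarted. The sketch
below is a simplified model of that decision, not systemd's implementation.
It assumes only plain exit codes are involved (signals, timeouts and watchdog
events are ignored) and that SuccessExitStatus= is unset; the function name is
made up for illustration.

```python
def restarts_on_exit_code(policy, exit_code):
    """Simplified model of systemd's Restart= for a process that exits by itself.

    Assumptions: only plain exit codes are modelled, and SuccessExitStatus=
    is not set, so 0 is the only "clean" exit code.
    """
    clean = exit_code == 0
    if policy == "always":
        return True
    if policy == "on-success":
        return clean
    if policy == "on-failure":
        return not clean
    # "no", "on-abnormal", "on-abort" and "on-watchdog" do not react
    # to a plain exit code, clean or not.
    return False
```

With this model, restarts_on_exit_code("on-failure", 0) is False, which
matches the "inactive (dead)" state shown above.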


An easy way to reproduce this is:

1. Stop neutron-server
2. Start neutron-openvswitch-agent manually:

# /usr/bin/neutron-openvswitch-agent --config-file 
/usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf  
--config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir 
/etc/neutron/conf.d/common --config-dir 
/etc/neutron/conf.d/neutron-openvswitch-agent
Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward 
compatibility. SIGUSR1 will no longer be registered in a future release, so 
please use SIGUSR2 to generate reports.
Option "notification_driver" from group "DEFAULT" is deprecated. Use option 
"driver" from group "oslo_messaging_notifications".
Could not load neutron.openstack.common.notifier.rpc_notifier
[root@weirdo1 neutron]# echo $?
0

Note that the return code is 0.


I'd say this is a bug in the OVS agent, which should exit with rc != 0 so that 
systemd restarts the service under the current "Restart=on-failure" policy. 
Otherwise we should change the systemd restart policy.
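
To illustrate the first option, here is a minimal sketch of an entry point
that turns an unhandled exception in the main loop into a non-zero exit code.
This is not the actual neutron agent code: run_agent and unreachable_server
are hypothetical names, and the real fix would have to be made wherever the
agent currently logs "Agent main thread died of an exception".

```python
import sys


def run_agent(main_loop):
    """Run main_loop and map its outcome to a process exit code.

    Returns 0 if main_loop returns normally and 1 if it raises, so that
    a supervisor using Restart=on-failure sees the failure.
    """
    try:
        main_loop()
    except Exception as exc:
        sys.stderr.write("Agent main thread died of an exception: %r\n" % (exc,))
        return 1
    return 0


def unreachable_server():
    # Hypothetical stand-in for the agent loop timing out on an RPC call.
    raise RuntimeError("Timed out waiting for a reply")


# Typical use in a console script (agent_main is a placeholder):
#     sys.exit(run_agent(agent_main))
```

The point of the design is that the exit code is decided in one place, so any
exception that kills the main loop is reported to systemd as a failure.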

** Affects: neutron
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1694505

Title:
  neutron-ovs-agent dies with return code 0 when neutron-server is down

Status in neutron:
  New

