Public bug reported:

Environment description:
- Deployment using RDO Trunk repo from master.
- Neutron based on commit c430e9b

If neutron-ovs-agent is started before neutron-server starts, it exits with return code 0, which systemd does not identify as a failure, so the agent is not restarted.

The following errors appear in /var/log/neutron/openvswitch-agent.log:

2017-05-30 17:38:48.692 29042 DEBUG neutron.api.rpc.handlers.resources_rpc [req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] neutron.api.rpc.handlers.resources_rpc.ResourcesPullRpcApi method bulk_pull called with arguments (<neutron_lib.context.Context object at 0x75ff950>, 'Port') {} wrapper /usr/lib/python2.7/site-packages/oslo_log/helpers.py:47
2017-05-30 17:38:49.298 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202
....
2017-05-30 17:40:26.506 29042 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 12 __log_wakeup /usr/lib/python2.7/site-packages/ovs/poller.py:202
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp [req-b5a96471-f0e2-4b24-938c-27ed4d8502c9 - - - - -] Agent main thread died of an exception
...
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp 'to message ID %s' % msg_id)
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp MessagingTimeout: Timed out waiting for a reply to message ID 3874905892f543e0be9984e6504644bb
2017-05-30 17:40:27.530 29042 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_ryuapp
2017-05-30 17:40:27.624 29042 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=29502

From the systemd side, the following status is reported:

[root@weirdo1 neutron]# systemctl status neutron-openvswitch-agent
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
   Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Tue 2017-05-30 17:40:27 UTC; 5min ago
 Main PID: 29042 (code=exited, status=0/SUCCESS)

May 30 17:38:44 weirdo1 systemd[1]: Starting OpenStack Neutron Open vSwitch Agent...
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-arptables = 1
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-iptables = 1
May 30 17:38:44 weirdo1 neutron-enable-bridge-firewall.sh[29032]: net.bridge.bridge-nf-call-ip6tables = 1
May 30 17:38:44 weirdo1 systemd[1]: Started OpenStack Neutron Open vSwitch Agent.
May 30 17:38:45 weirdo1 neutron-openvswitch-agent[29042]: Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be reg...te reports.
May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
May 30 17:38:46 weirdo1 neutron-openvswitch-agent[29042]: Could not load neutron.openstack.common.notifier.rpc_notifier

Note the (code=exited, status=0/SUCCESS).

An easy way to reproduce this is:

1. Stop neutron-server
2. Manually start neutron-openvswitch-agent:

# /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-openvswitch-agent
Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
Option "notification_driver" from group "DEFAULT" is deprecated. Use option "driver" from group "oslo_messaging_notifications".
Could not load neutron.openstack.common.notifier.rpc_notifier
[root@weirdo1 neutron]# echo $?
0

Note that the return code is 0.

I'd say this is a bug in the OVS agent, which should exit with rc != 0 so that systemd restarts the service based on the current "Restart=on-failure" policy. Otherwise we should change the systemd restart policy.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1694505

Title:
  neutron-ovs-agent dies with return code 0 when neutron-server is down

Status in neutron:
  New
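
To illustrate the point about the return code, here is a minimal Python sketch of the two behaviours. It is not the actual neutron code: MessagingTimeout is a local stand-in for the oslo.messaging exception, and agent_main_loop, run_swallowing and run_propagating are hypothetical names invented for this example. It only assumes that the agent catches the exception, logs "Agent main thread died of an exception" and then returns normally, which would explain the exit status of 0.

```python
import logging

LOG = logging.getLogger(__name__)


class MessagingTimeout(Exception):
    """Local stand-in for the RPC timeout seen in the log (hypothetical)."""


def agent_main_loop():
    # Hypothetical stand-in for the agent's main loop: with neutron-server
    # down, the first RPC call never gets a reply and times out.
    raise MessagingTimeout("Timed out waiting for a reply")


def run_swallowing():
    """Assumed current behaviour: log the exception, then return normally."""
    try:
        agent_main_loop()
    except Exception:
        LOG.exception("Agent main thread died of an exception")
    # Falling through here yields exit status 0, so a systemd unit with
    # Restart=on-failure does not restart the service.
    return 0


def run_propagating():
    """Suggested behaviour: report the failure through the return code."""
    try:
        agent_main_loop()
    except Exception:
        LOG.exception("Agent main thread died of an exception")
        # A non-zero status is treated by systemd as a failure.
        return 1
    return 0
```

An entry point would pass the result to sys.exit(), for example sys.exit(run_propagating()), so that the process ends with status 1 and "Restart=on-failure" takes effect.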