[Yahoo-eng-team] [Bug 1830295] Re: devstack py3 get_link_devices() KeyError: 'index'
Reinstalling oslo.privsep seems to have "fixed" it on the 16.04 box too. Still don't understand why - must have been some bad cache or something. Will put this down to "gremlins" unless it resurfaces :/ ** Changed in: neutron Status: New => Invalid ** Changed in: oslo.privsep Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1830295 Title: devstack py3 get_link_devices() KeyError: 'index' Status in neutron: Invalid Status in oslo.privsep: Invalid Bug description: devstack master with py3. openvswitch agent has suddenly stopped working, with no change in config or environment (other than rebuilding devstack). Stack trace below. For some reason (yet undetermined), privileged.get_link_devices() now seems to be returning byte arrays instead of strings as the dict keys: >>> from neutron.privileged.agent.linux import ip_lib as privileged >>> privileged.get_link_devices(None)[0].keys() dict_keys([b'index', b'family', b'__align', b'header', b'flags', b'ifi_type', b'event', b'change', b'attrs']) >>> From agent startup: neutron-openvswitch-agent[42936]: ERROR neutron Traceback (most recent call last): neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/bin/neutron-openvswitch-agent", line 10, in neutron-openvswitch-agent[42936]: ERROR neutron sys.exit(main()) neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main neutron-openvswitch-agent[42936]: ERROR neutron agent_main.main() neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 47, in main neutron-openvswitch-agent[42936]: ERROR neutron mod.main() neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main neutron-openvswitch-agent[42936]: ERROR neutron 'neutron.plugins.ml2.drivers.openvswitch.agent.' 
neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/base/app_manager.py", line 375, in run_apps neutron-openvswitch-agent[42936]: ERROR neutron hub.joinall(services) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 102, in joinall neutron-openvswitch-agent[42936]: ERROR neutron t.wait() neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 180, in wait neutron-openvswitch-agent[42936]: ERROR neutron return self._exit_event.wait() neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/eventlet/event.py", line 132, in wait neutron-openvswitch-agent[42936]: ERROR neutron current.throw(*self._exc) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 219, in main neutron-openvswitch-agent[42936]: ERROR neutron result = function(*args, **kwargs) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 64, in _launch neutron-openvswitch-agent[42936]: ERROR neutron raise e neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 59, in _launch neutron-openvswitch-agent[42936]: ERROR neutron return func(*args, **kwargs) neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py", line 43, in agent_main_wrapper neutron-openvswitch-agent[42936]: ERROR neutron LOG.exception("Agent main thread died of an exception") neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ neutron-openvswitch-agent[42936]: ERROR neutron self.force_reraise() neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise neutron-openvswitch-agent[42936]: ERROR neutron six.reraise(self.type_, self.value, self.tb) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise neutron-openvswitch-agent[42936]: ERROR neutron raise value neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py", line 40, in agent_main_wrapper neutron-openvswitch-agent[42936]: ERROR neutron ovs_agent.main(bridge_classes)
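On an affected host, a stopgap on the consumer side is to normalize the byte keys coming back across the privsep boundary before looking anything up. A minimal sketch, assuming the dicts look like the REPL output above (the helper is hypothetical, not part of neutron):

    # Hypothetical workaround: decode bytes dict keys recursively so lookups
    # like device['index'] keep working regardless of oslo.privsep behaviour.
    def normalize_keys(obj):
        if isinstance(obj, dict):
            return {(k.decode('utf-8') if isinstance(k, bytes) else k):
                    normalize_keys(v) for k, v in obj.items()}
        if isinstance(obj, (list, tuple)):
            return [normalize_keys(item) for item in obj]
        return obj

    # devices = [normalize_keys(d) for d in privileged.get_link_devices(None)]
    # devices[0]['index'] then works even when the keys arrive as bytes.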
[Yahoo-eng-team] [Bug 1830295] Re: devstack py3 get_link_devices() KeyError: 'index'
Yeah... downgrading oslo.privsep from 1.33.0 to 1.32.1 makes the problem go away. ** Also affects: oslo.privsep Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1830295 Title: devstack py3 get_link_devices() KeyError: 'index' Status in neutron: New Status in oslo.privsep: New Bug description: devstack master with py3. openvswitch agent has suddenly stopped working, with no change in config or environment (other than rebuilding devstack). Stack trace below. For some reason (yet undetermined), privileged.get_link_devices() now seems to be returning byte arrays instead of strings as the dict keys: >>> from neutron.privileged.agent.linux import ip_lib as privileged >>> privileged.get_link_devices(None)[0].keys() dict_keys([b'index', b'family', b'__align', b'header', b'flags', b'ifi_type', b'event', b'change', b'attrs']) >>> From agent startup: neutron-openvswitch-agent[42936]: ERROR neutron Traceback (most recent call last): neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/bin/neutron-openvswitch-agent", line 10, in neutron-openvswitch-agent[42936]: ERROR neutron sys.exit(main()) neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main neutron-openvswitch-agent[42936]: ERROR neutron agent_main.main() neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 47, in main neutron-openvswitch-agent[42936]: ERROR neutron mod.main() neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main neutron-openvswitch-agent[42936]: ERROR neutron 'neutron.plugins.ml2.drivers.openvswitch.agent.' 
neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/base/app_manager.py", line 375, in run_apps neutron-openvswitch-agent[42936]: ERROR neutron hub.joinall(services) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 102, in joinall neutron-openvswitch-agent[42936]: ERROR neutron t.wait() neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 180, in wait neutron-openvswitch-agent[42936]: ERROR neutron return self._exit_event.wait() neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/eventlet/event.py", line 132, in wait neutron-openvswitch-agent[42936]: ERROR neutron current.throw(*self._exc) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 219, in main neutron-openvswitch-agent[42936]: ERROR neutron result = function(*args, **kwargs) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 64, in _launch neutron-openvswitch-agent[42936]: ERROR neutron raise e neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 59, in _launch neutron-openvswitch-agent[42936]: ERROR neutron return func(*args, **kwargs) neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py", line 43, in agent_main_wrapper neutron-openvswitch-agent[42936]: ERROR neutron LOG.exception("Agent main thread died of an exception") neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ neutron-openvswitch-agent[42936]: ERROR neutron self.force_reraise() neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise neutron-openvswitch-agent[42936]: ERROR neutron six.reraise(self.type_, self.value, self.tb) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise neutron-openvswitch-agent[42936]: ERROR neutron raise value neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py", line 40, in agent_main_wrapper neutron-openvswitch-agent[42936]: ERROR neutron ovs_agent.main(bridge_classes) neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2393, in main
[Yahoo-eng-team] [Bug 1830295] [NEW] devstack py3 get_link_devices() KeyError: 'index'
Public bug reported: devstack master with py3. openvswitch agent has suddenly stopped working, with no change in config or environment (other than rebuilding devstack). Stack trace below. For some reason (yet undetermined), privileged.get_link_devices() now seems to be returning byte arrays instead of strings as the dict keys: >>> from neutron.privileged.agent.linux import ip_lib as privileged >>> privileged.get_link_devices(None)[0].keys() dict_keys([b'index', b'family', b'__align', b'header', b'flags', b'ifi_type', b'event', b'change', b'attrs']) >>> >From agent startup: neutron-openvswitch-agent[42936]: ERROR neutron Traceback (most recent call last): neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/bin/neutron-openvswitch-agent", line 10, in neutron-openvswitch-agent[42936]: ERROR neutron sys.exit(main()) neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main neutron-openvswitch-agent[42936]: ERROR neutron agent_main.main() neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 47, in main neutron-openvswitch-agent[42936]: ERROR neutron mod.main() neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main neutron-openvswitch-agent[42936]: ERROR neutron 'neutron.plugins.ml2.drivers.openvswitch.agent.' neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/base/app_manager.py", line 375, in run_apps neutron-openvswitch-agent[42936]: ERROR neutron hub.joinall(services) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 102, in joinall neutron-openvswitch-agent[42936]: ERROR neutron t.wait() neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 180, in wait neutron-openvswitch-agent[42936]: ERROR neutron return self._exit_event.wait() neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/eventlet/event.py", line 132, in wait neutron-openvswitch-agent[42936]: ERROR neutron current.throw(*self._exc) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 219, in main neutron-openvswitch-agent[42936]: ERROR neutron result = function(*args, **kwargs) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 64, in _launch neutron-openvswitch-agent[42936]: ERROR neutron raise e neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 59, in _launch neutron-openvswitch-agent[42936]: ERROR neutron return func(*args, **kwargs) neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py", line 43, in agent_main_wrapper neutron-openvswitch-agent[42936]: ERROR neutron LOG.exception("Agent main thread died of an exception") neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in __exit__ neutron-openvswitch-agent[42936]: ERROR neutron self.force_reraise() neutron-openvswitch-agent[42936]: ERROR neutron File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise neutron-openvswitch-agent[42936]: ERROR neutron six.reraise(self.type_, self.value, self.tb) neutron-openvswitch-agent[42936]: ERROR neutron File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise neutron-openvswitch-agent[42936]: ERROR neutron raise value neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py", line 40, in agent_main_wrapper neutron-openvswitch-agent[42936]: ERROR neutron ovs_agent.main(bridge_classes) neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2393, in main neutron-openvswitch-agent[42936]: ERROR neutron validate_tunnel_config(cfg.CONF.AGENT.tunnel_types, cfg.CONF.OVS.local_ip) neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2362, in validate_tunnel_config neutron-openvswitch-agent[42936]: ERROR neutron validate_local_ip(local_ip) neutron-openvswitch-agent[42936]: ERROR neutron File "/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2350, in
[Yahoo-eng-team] [Bug 1825584] [NEW] eventlet monkey-patching breaks AMQP heartbeat on uWSGI
Public bug reported: Stein nova-api running under uWSGI presents an AMQP issue. The first API call that requires RPC creates an AMQP connection and successfully completes. Normally regular heartbeats would be sent from this point on, to maintain the connection. This is not happening. After a few minutes, the AMQP server (rabbitmq, in my case) notices that there have been no heartbeats, and drops the connection. A later nova API call that requires RPC tries to use the old connection, and throws a "connection reset by peer" exception and the API call fails. A mailing-list response suggests that this is affecting mod_wsgi also: http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005310.html I've discovered that this problem seems to be caused by eventlet monkey-patching, which was introduced in: https://github.com/openstack/nova/commit/23ba1c690652832c655d57476630f02c268c87ae It was later rearranged in: https://github.com/openstack/nova/commit/3c5e2b0e9fac985294a949852bb8c83d4ed77e04 but this problem remains. If I comment out the import of nova.monkey_patch in nova/api/openstack/__init__.py the problem goes away. Seems that eventlet monkey-patching and uWSGI are not getting along for some reason... ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1825584 Title: eventlet monkey-patching breaks AMQP heartbeat on uWSGI Status in OpenStack Compute (nova): New Bug description: Stein nova-api running under uWSGI presents an AMQP issue. The first API call that requires RPC creates an AMQP connection and successfully completes. Normally regular heartbeats would be sent from this point on, to maintain the connection. This is not happening. After a few minutes, the AMQP server (rabbitmq, in my case) notices that there have been no heartbeats, and drops the connection. A later nova API call that requires RPC tries to use the old connection, and throws a "connection reset by peer" exception and the API call fails. A mailing-list response suggests that this is affecting mod_wsgi also: http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005310.html I've discovered that this problem seems to be caused by eventlet monkey-patching, which was introduced in: https://github.com/openstack/nova/commit/23ba1c690652832c655d57476630f02c268c87ae It was later rearranged in: https://github.com/openstack/nova/commit/3c5e2b0e9fac985294a949852bb8c83d4ed77e04 but this problem remains. If I comment out the import of nova.monkey_patch in nova/api/openstack/__init__.py the problem goes away. Seems that eventlet monkey-patching and uWSGI are not getting along for some reason... To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1825584/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
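A quick way to confirm what eventlet has actually patched inside a given uWSGI or mod_wsgi worker is to ask eventlet directly. A small diagnostic sketch (where to call it from is up to you; it only reports state, it does not change behaviour):

    # Diagnostic: report which stdlib modules eventlet has monkey-patched in
    # this process, e.g. from a debug log line early in the WSGI application.
    import eventlet.patcher

    for name in ('os', 'select', 'socket', 'thread', 'time'):
        print('%s monkey-patched: %s'
              % (name, eventlet.patcher.is_monkey_patched(name)))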
[Yahoo-eng-team] [Bug 1824435] [NEW] fill_virtual_interface_list migration fails on second attempt
Public bug reported: On attempting to run online_data_migrations on Stein for the second time (and beyond), fill_virtual_interface_list fails as below. I find two rows in the security_groups table which have name='default' and project_id NULL. 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage [req-6cb533e2-58b5-41db-a455-29dae8efef31 - - - - -] Error attempting to run : TypeError: 'NoneType' object has no attribute '__getitem__' 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage Traceback (most recent call last): 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 686, in _run_migration 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage found, done = migration_meth(ctxt, count) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1012, in wrapper 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return fn(*args, **kwargs) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/objects/virtual_interface.py", line 279, in fill_virtual_interface_list 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage _set_or_delete_marker_for_migrate_instances(cctxt, marker) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(context, *args, **kwargs) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/objects/virtual_interface.py", line 305, in _set_or_delete_marker_for_migrate_instances 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage instance.create() 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return fn(self, *args, **kwargs) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 600, in create 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage db_inst = db.instance_create(self._context, updates) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 748, in instance_create 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return IMPL.instance_create(context, values) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 170, in wrapper 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(*args, **kwargs) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 154, in wrapper 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage ectxt.value = e.inner_exc 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage self.force_reraise() 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage six.reraise(self.type_, self.value, self.tb) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 142, in wrapper 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(*args, **kwargs) 2019-04-11 
03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(context, *args, **kwargs) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 1728, in instance_create 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage security_group_ensure_default(context) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 4039, in security_group_ensure_default 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return _security_group_ensure_default(context) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(context, *args, **kwargs) 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 4050, in _security_group_ensure_default 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage default_group = _security_group_get_by_names(context, ['default'])[0] 2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage
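A quick check for whether a given deployment has the same suspect rows is to run the equivalent query against the nova database directly. A hedged sketch; the connection URL and credentials are placeholders:

    # Hypothetical diagnostic: list 'default' security groups with a NULL
    # project_id, which this report ties to the failing marker-instance
    # creation in _set_or_delete_marker_for_migrate_instances().
    from sqlalchemy import create_engine, text

    engine = create_engine('mysql+pymysql://nova:password@localhost/nova')
    with engine.connect() as conn:
        rows = conn.execute(text(
            "SELECT id, name, project_id, deleted FROM security_groups "
            "WHERE name = 'default' AND project_id IS NULL"))
        for row in rows:
            print(row)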
[Yahoo-eng-team] [Bug 1816831] [NEW] DOC: typo in add_initial_allocation_ratio releasenote
Public bug reported: https://github.com/openstack/nova/blob/master/releasenotes/notes/add_initial_allocation_ratio-2d2666d62426a4bf.yaml - initial_cpu_allocation_ratio with default value 16.0 - initial_ram_allocation_ratio with default value 1.5 - initial_ram_allocation_ratio with default value 1.0 The third one should be "initial_disk_allocation_ratio". ** Affects: nova Importance: Undecided Status: New ** Tags: doc releasenote -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1816831 Title: DOC: typo in add_initial_allocation_ratio releasenote Status in OpenStack Compute (nova): New Bug description: https://github.com/openstack/nova/blob/master/releasenotes/notes/add_initial_allocation_ratio-2d2666d62426a4bf.yaml - initial_cpu_allocation_ratio with default value 16.0 - initial_ram_allocation_ratio with default value 1.5 - initial_ram_allocation_ratio with default value 1.0 The third one should be "initial_disk_allocation_ratio". To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1816831/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1804075] [NEW] [doc] multistore incorrect options group for default_backend
Public bug reported: https://docs.openstack.org/glance/latest/admin/multistores.html describes the default_backend option as belonging in the [glance_store] options group, but the example incorrectly shows it in the [DEFAULT] options group. ** Affects: glance Importance: Undecided Assignee: iain MacDonnell (imacdonn) Status: In Progress ** Changed in: glance Assignee: (unassigned) => iain MacDonnell (imacdonn) ** Changed in: glance Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1804075 Title: [doc] multistore incorrect options group for default_backend Status in Glance: In Progress Bug description: https://docs.openstack.org/glance/latest/admin/multistores.html describes the default_backend option as belonging in the [glance_store] options group, but the example incorrectly shows it in the [DEFAULT] options group. To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1804075/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
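For reference, the placement the documentation text describes would look roughly like this in glance-api.conf (the backend names are illustrative, not taken from the docs page):

    [DEFAULT]
    enabled_backends = fast:rbd, cheap:file

    [glance_store]
    # default_backend belongs here, not under [DEFAULT]
    default_backend = fast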
[Yahoo-eng-team] [Bug 1794364] Re: 'nova-manage db online_data_migrations' count fail
** Also affects: cinder Importance: Undecided Status: New ** Changed in: cinder Assignee: (unassigned) => iain MacDonnell (imacdonn) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1794364 Title: 'nova-manage db online_data_migrations' count fail Status in Cinder: New Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) ocata series: Fix Committed Status in OpenStack Compute (nova) pike series: Fix Committed Status in OpenStack Compute (nova) queens series: Fix Committed Status in OpenStack Compute (nova) rocky series: Fix Committed Bug description: 'nova-manage db online_data_migrations' attempts to display summary counts of migrations "Needed" and "Completed" in a pretty table at the end, but fails to accumulate the totals between successive invocations of _run_migration(), and ends up reporting zeroes. # nova-manage db online_data_migrations /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:332: NotSupportedWarning: Configuration option(s) ['use_tpool'] not supported exception.NotSupportedWarning Running batches of 50 until complete /usr/lib/python2.7/site-packages/pymysql/cursors.py:166: Warning: (3090, u"Changing sql mode 'NO_AUTO_CREATE_USER' is deprecated. It will be removed in a future release.") result = self._query(query) 2 rows matched query migrate_instances_add_request_spec, 0 migrated 13 rows matched query migrate_quota_limits_to_api_db, 13 migrated 37 rows matched query populate_uuids, 37 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 
migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 21 rows matched query populate_uuids, 21 migrated +-+--+---+ | Migration | Total Needed | Completed | +-+--+---+ | delete_build_requests_with_no_instance_uuid | 0 | 0 | |migrate_aggregate_reset_autoincrement| 0 | 0 | | migrate_aggregates | 0 | 0 | | migrate_instance_groups_to_api_db | 0 | 0 | | migrate_instances_add_request_spec | 0 | 0 | | migrate_keypairs_to_api_db | 0 | 0 | | migrate_quota_classes_to_api_db | 0 | 0 | |
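The reported zeroes are consistent with the per-migration counts being reset on each batch rather than accumulated before the summary table is printed. A minimal sketch of the accumulation the table needs (illustrative only, not the actual nova-manage code):

    # Sketch: keep running totals across repeated _run_migration() batches so
    # the final table can show non-zero "Total Needed" / "Completed" counts.
    import collections

    totals = collections.defaultdict(lambda: {'needed': 0, 'completed': 0})

    def run_batches(migrations, ctxt, batch_size=50):
        while True:
            progress = False
            for name, migration_meth in migrations.items():
                found, done = migration_meth(ctxt, batch_size)
                totals[name]['needed'] += found      # accumulate, don't overwrite
                totals[name]['completed'] += done
                progress = progress or bool(done)
            if not progress:
                return totals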
[Yahoo-eng-team] [Bug 1796192] [NEW] online_data_migrations exceptions quietly masked
Public bug reported: When online_data_migrations raise exceptions, nova/cinder-manage catches the exception, prints a fairly useless "something didn't work" message, and moves on. Two issues: 1) The user(/admin) has no way to see what actually failed (exception is not logged) 2) The command returns exit status 0, as if all possible migrations have been completed successfully - this can cause failures to get missed, especially if automated ** Affects: cinder Importance: Undecided Status: New ** Affects: nova Importance: Undecided Assignee: iain MacDonnell (imacdonn) Status: In Progress ** Also affects: cinder Importance: Undecided Status: New ** Description changed: When online_data_migrations raise exceptions, nova/cinder-manage catches the exception, prints a fairly useless "something didn't work" message, and moves on. Two issues: 1) The user(/admin) has no way to see what actually failed (exception is not logged) - 2) The command returns exit status 0, as if all possible migrations have been completed successfully + 2) The command returns exit status 0, as if all possible migrations have been completed successfully - this can cause failures to get missed, especially if automated ** Changed in: nova Assignee: (unassigned) => iain MacDonnell (imacdonn) ** Changed in: nova Status: New => In Progress -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1796192 Title: online_data_migrations exceptions quietly masked Status in Cinder: New Status in OpenStack Compute (nova): In Progress Bug description: When online_data_migrations raise exceptions, nova/cinder-manage catches the exception, prints a fairly useless "something didn't work" message, and moves on. Two issues: 1) The user(/admin) has no way to see what actually failed (exception is not logged) 2) The command returns exit status 0, as if all possible migrations have been completed successfully - this can cause failures to get missed, especially if automated To manage notifications about this bug go to: https://bugs.launchpad.net/cinder/+bug/1796192/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
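The behaviour being asked for is roughly the following. A minimal sketch, not the actual nova/cinder-manage code:

    # Sketch of the requested behaviour: log the real traceback and remember
    # that something failed so the command can exit non-zero.
    import logging

    LOG = logging.getLogger(__name__)

    def run_migrations(migration_methods, ctxt, count=50):
        had_errors = False
        for meth in migration_methods:
            try:
                meth(ctxt, count)
            except Exception:
                LOG.exception("Migration %s raised an exception", meth.__name__)
                had_errors = True
        return 1 if had_errors else 0

    # e.g. in the command's entry point: sys.exit(run_migrations(...))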
[Yahoo-eng-team] [Bug 1794364] [NEW] 'nova-manage db online_data_migrations' count fail
Public bug reported: 'nova-manage db online_data_migrations' attempts to display summary counts of migrations "Needed" and "Completed" in a pretty table at the end, but fails to accumulate the totals between successive invocations of _run_migration(), and ends up reporting zeroes. # nova-manage db online_data_migrations /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:332: NotSupportedWarning: Configuration option(s) ['use_tpool'] not supported exception.NotSupportedWarning Running batches of 50 until complete /usr/lib/python2.7/site-packages/pymysql/cursors.py:166: Warning: (3090, u"Changing sql mode 'NO_AUTO_CREATE_USER' is deprecated. It will be removed in a future release.") result = self._query(query) 2 rows matched query migrate_instances_add_request_spec, 0 migrated 13 rows matched query migrate_quota_limits_to_api_db, 13 migrated 37 rows matched query populate_uuids, 37 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 50 rows matched query populate_uuids, 50 migrated 21 rows matched query populate_uuids, 21 migrated +-+--+---+ | Migration | Total Needed | Completed | +-+--+---+ | delete_build_requests_with_no_instance_uuid | 0 | 0 | |migrate_aggregate_reset_autoincrement| 0 | 0 | | migrate_aggregates | 0 | 0 | | migrate_instance_groups_to_api_db | 
0 | 0 | | migrate_instances_add_request_spec | 0 | 0 | | migrate_keypairs_to_api_db | 0 | 0 | | migrate_quota_classes_to_api_db | 0 | 0 | |migrate_quota_limits_to_api_db | 0 | 0 | | migration_migrate_to_uuid | 0 | 0 | |populate_uuids | 0 | 0 | | service_uuids_online_data_migration | 0 | 0 | +-+--+---+ ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1794364 Title: 'nova-manage db online_data_migrations' count fail Status in OpenStack Compute (nova): New Bug description: 'nova-manage db online_data_migrations' attempts to display summary counts of migrations "Needed" and "Completed" in a pretty table at the end, but fails to
[Yahoo-eng-team] [Bug 1782448] Re: memcache client super() fail
Need to apply the same fix as in https://review.openstack.org/#/c/175291/ ** Project changed: keystone => keystonemiddleware ** Changed in: keystonemiddleware Assignee: (unassigned) => iain MacDonnell (imacdonn) ** Description changed: + Applies to Pike, but not later releases... + When configured to use the memcache connection pool, clients (e.g. neutron-server) fail with: 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors Traceback (most recent call last): 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/oslo_middleware/catch_errors.py", line 40, in __call__ 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors response = req.get_response(self.application) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/webob/request.py", line 1316, in send 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors application, catch_exc_info=False) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/webob/request.py", line 1280, in call_application 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors app_iter = application(self.environ, start_response) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/webob/dec.py", line 131, in __call__ 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors resp = self.call_func(req, *args, **self.kwargs) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/webob/dec.py", line 196, in call_func 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors return self.func(req, *args, **kwargs) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 331, in __call__ 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors response = self.process_request(req) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 622, in process_request 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors resp = super(AuthProtocol, self).process_request(request) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 404, in process_request 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors allow_expired=allow_expired) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 434, in _do_fetch_token 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors data = self.fetch_token(token, **kwargs) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 736, in fetch_token 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors cached = self._cache_get_hashes(token_hashes) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 719, in _cache_get_hashes 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors cached = self._token_cache.get(token) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_cache.py", line 214, in get 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors with self._cache_pool.reserve() as cache: 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__ 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors return self.gen.next() 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_cache.py", line 98, in reserve 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors with self._pool.get() as client: 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/eventlet/queue.py", line 295, in get 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors return self._get() 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_memcache_pool.py", line 149, in _get 2018-07-18
[Yahoo-eng-team] [Bug 1782448] [NEW] memcache client super() fail
4810 ERROR oslo_middleware.catch_errors conn = self._create_connection() 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_memcache_pool.py", line 143, in _create_connection 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors socket_timeout=self._socket_timeout) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/memcache.py", line 223, in __init__ 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors super(Client, self).__init__() 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors TypeError: super(type, obj): obj must be an instance or subtype of type ** Affects: keystonemiddleware Importance: Undecided Assignee: iain MacDonnell (imacdonn) Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone). https://bugs.launchpad.net/bugs/1782448 Title: memcache client super() fail Status in keystonemiddleware: New Bug description: Applies to Pike, but not later releases... When configured to use the memcache connection pool, clients (e.g. neutron-server) fail with: 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors Traceback (most recent call last): 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/oslo_middleware/catch_errors.py", line 40, in __call__ 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors response = req.get_response(self.application) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/webob/request.py", line 1316, in send 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors application, catch_exc_info=False) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/webob/request.py", line 1280, in call_application 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors app_iter = application(self.environ, start_response) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/webob/dec.py", line 131, in __call__ 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors resp = self.call_func(req, *args, **self.kwargs) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/webob/dec.py", line 196, in call_func 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors return self.func(req, *args, **kwargs) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 331, in __call__ 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors response = self.process_request(req) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 622, in process_request 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors resp = super(AuthProtocol, self).process_request(request) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 404, in process_request 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors allow_expired=allow_expired) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 434, in _do_fetch_token 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors data = self.fetch_token(token, **kwargs) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 736, in fetch_token 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors cached = self._cache_get_hashes(token_hashes) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", line 719, in _cache_get_hashes 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors cached = self._token_cache.get(token) 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors File "/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_cache.py", line 214, in get 2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors with self._cache_pool.reserve() as cache: 2018-07-18 21:00:32.568 14810 ERROR oslo_midd
[Yahoo-eng-team] [Bug 1757207] Re: compute resource providers not equal to compute nodes in deployment
Shouldn't the upgrade check simply discount deleted compute_nodes ? +---+ | Check: Resource Providers | | Result: Warning | | Details: There are 34 compute resource providers and 40 compute nodes | | in the deployment. Ideally the number of compute resource | | providers should equal the number of enabled compute nodes | | otherwise the cloud may be underutilized. See | | https://docs.openstack.org/nova/latest/user/placement.html | | for more details. | +---+ mysql> select count(*) from compute_nodes; +--+ | count(*) | +--+ | 40 | +--+ 1 row in set (0.00 sec) mysql> select count(*) from compute_nodes where deleted=0; +--+ | count(*) | +--+ | 34 | +--+ 1 row in set (0.00 sec) ** Changed in: nova Status: Expired => Incomplete -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1757207 Title: compute resource providers not equal to compute nodes in deployment Status in OpenStack Compute (nova): Incomplete Bug description: Description === When I execute the command `nova-status upgrade check`, output: nova-status upgrade check +--+ | Upgrade Check Results | +--+ | Check: Cells v2 | | Result: Success | | Details: None | +--+ | Check: Placement API | | Result: Success | | Details: None | +--+ | Check: Resource Providers | | Result: Warning | | Details: There are 4 compute resource providers and 15 compute nodes | | in the deployment. Ideally the number of compute resource | | providers should equal the number of enabled compute nodes | | otherwise the cloud may be underutilized. See | | http://docs.openstack.org/developer/nova/placement.html | | for more details. | +--+ Steps to reproduce == How to replicate this? Remove the hosts from the openstack controller: nova hypervisor-list nova service-delete {id} Then run: su -s /bin/sh -c "nova-manage cell_v2 discover_hosts --verbose" nova The deleted compute node will be added again as a new node. run: nova-status upgrade check Expected result === No warning when you run: nova-status upgrade check Actual result = You can find the warning. This causes issue with placement of new VM's. The compute host which was deleted and added again will not be considered during VM scheduling and placement. Environment === OpenStack Pike release Neutron Networking which is default. Logs and Configs Config as the Openstack documentation. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1757207/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1756465] [NEW] Need glance v2 way to register image by location
Public bug reported: The release notes for Queens state "With the introduction of the web-download import method, we consider the Image Service v2 API to have reached feature parity with the DEPRECATED v1 API in all important respects.", but v2 does NOT provide any way to register an image by location. Before v1 gets removed, there needs to be something similar to web-download, but referencing the remote URL, rather than copying from it - web-location?. This may be considered insecure on the internet, but for a private cloud on a protected intranet, it's acceptable, and more efficient when multiple glance instances need to make use of images that are already published on an internal HTTP server. See also bug 1750892, which could enable a workaround, but it's cumbersome, and requires the location to be exposed, which could present other security issues (even on an intranet). ** Affects: glance Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1756465 Title: Need glance v2 way to register image by location Status in Glance: New Bug description: The release notes for Queens state "With the introduction of the web-download import method, we consider the Image Service v2 API to have reached feature parity with the DEPRECATED v1 API in all important respects.", but v2 does NOT provide any way to register an image by location. Before v1 gets removed, there needs to be something similar to web-download, but referencing the remote URL, rather than copying from it - web-location?. This may be considered insecure on the internet, but for a private cloud on a protected intranet, it's acceptable, and more efficient when multiple glance instances need to make use of images that are already published on an internal HTTP server. See also bug 1750892, which could enable a workaround, but it's cumbersome, and requires the location to be exposed, which could present other security issues (even on an intranet). To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1756465/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
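For completeness, the cumbersome workaround alluded to (via bug 1750892) can be scripted with python-glanceclient's location calls, where the deployment's policy allows locations to be set at all. A hedged sketch; the endpoint, token and image URL are placeholders reused from the report below:

    # Hypothetical workaround: create an empty image record, then point it at
    # an already-published HTTP location instead of uploading the bits.
    from glanceclient import Client

    glance = Client('2', endpoint='http://my_glance_api_endpoint:9292',
                    token='xxx')
    image = glance.images.create(name='cirros', disk_format='raw',
                                 container_format='bare')
    glance.images.add_location(image.id,
                               'http://my_http_server/cirros.img', {})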
[Yahoo-eng-team] [Bug 1750892] [NEW] Image remains in queued status after location set via PATCH
Public bug reported: Pike release, with show_image_direct_url and show_multiple_locations enabled. Attempting to create an image using the HTTP backend with the glance v2 API. I create a new/blank image (goes into "queued" status), then set the location with: curl -g -i -X PATCH -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'User-Agent: imacdonn-getting-dangerous' -H 'X-Auth-Token: xxx' -H 'Content-Type: application/openstack-images-v2.1-json-patch' -d '[{"op":"replace", "path": "/locations", "value": [{"url": "http://my_http_server/cirros.img", "metadata": {}}]}]' http://my_glance_api_endpoint:9292/v2/images/e5581f14-2d05-4ae7-8d78-9da42731a37e This results in the direct_url getting set correctly, and the size of the image is correctly determined, but the image remains in "queued" status. It should become "active". ** Affects: glance Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to Glance. https://bugs.launchpad.net/bugs/1750892 Title: Image remains in queued status after location set via PATCH Status in Glance: New Bug description: Pike release, with show_image_direct_url and show_multiple_locations enabled. Attempting to create an image using the HTTP backend with the glance v2 API. I create a new/blank image (goes into "queued" status), then set the location with: curl -g -i -X PATCH -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' -H 'User-Agent: imacdonn-getting-dangerous' -H 'X-Auth-Token: xxx' -H 'Content-Type: application/openstack-images-v2.1-json-patch' -d '[{"op":"replace", "path": "/locations", "value": [{"url": "http://my_http_server/cirros.img", "metadata": {}}]}]' http://my_glance_api_endpoint:9292/v2/images/e5581f14-2d05-4ae7-8d78-9da42731a37e This results in the direct_url getting set correctly, and the size of the image is correctly determined, but the image remains in "queued" status. It should become "active". To manage notifications about this bug go to: https://bugs.launchpad.net/glance/+bug/1750892/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1749838] [NEW] Rescheduled instance with pre-existing port fails with PortInUse exception
*** This bug is a duplicate of bug 1597596 *** https://bugs.launchpad.net/bugs/1597596 Public bug reported: Attempting to create an instance that uses an existing neutron port, when the instance creation fails on the first compute node, and gets rescheduled to another compute node, the rescheduled attempt fails with a PortInUse exception. In case it matters, I'm using neutron ML2 with linuxbridge and the port is on a VLAN provider network. Steps to reproduce (starting with an AZ/aggregate with two functional compute nodes up and running): 1. Create a neutron port, and make a note of the ID (os port create --network XXX myport) 2. Inject a failure on the first node - e.g. by renaming the qemu binary 3. Create an instance, using the port created earlier (openstack server create --nic port-id=XXX --image cirros --flavor m1.tiny myvm) The instance will fail on the first node, and get rescheduled on the second, where it will fail with: 2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager Traceback (most recent call last): 2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1415, in _allocate_network_async 2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager bind_host_id=bind_host_id) 2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 855, in allocate_for_instance 2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager context, instance, neutron, requested_networks) 2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager File "/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 573, in _validate_requested_port_ids 2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager raise exception.PortInUse(port_id=request.port_id) 2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager PortInUse: Port 9fd24371-e906-4af2-898a-eaef223abca9 is still in use. I've reproduced this on both Ocata and Pike. It does not seem to happen if the port is created by nova (i.e. openstack server create --nic net-id=XXX ...) This looks a bit like https://bugs.launchpad.net/nova/+bug/1308405 , but that's supposed to have been fixed long ago. ** Affects: nova Importance: Undecided Status: New ** Tags: compute neutron -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1749838 Title: Rescheduled instance with pre-existing port fails with PortInUse exception Status in OpenStack Compute (nova): New Bug description: Attempting to create an instance that uses an existing neutron port, when the instance creation fails on the first compute node, and gets rescheduled to another compute node, the rescheduled attempt fails with a PortInUse exception. In case it matters, I'm using neutron ML2 with linuxbridge and the port is on a VLAN provider network. Steps to reproduce (starting with an AZ/aggregate with two functional compute nodes up and running): 1. Create a neutron port, and make a note of the ID (os port create --network XXX myport) 2. Inject a failure on the first node - e.g. by renaming the qemu binary 3. 
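A plausible reading of the traceback (a sketch of the check only, not the actual nova source): the validation rejects any requested port whose device_id is already set, and after the failed first attempt the port may still carry that device_id when the reschedule runs:

    # Hypothetical, simplified version of the check named in the traceback
    # (nova/network/neutronv2/api.py, _validate_requested_port_ids). The real
    # nova code differs, but the observed failure has this shape.
    class PortInUse(Exception):
        def __init__(self, port_id):
            super().__init__("Port %s is still in use." % port_id)


    def validate_requested_port(port):
        # A requested port that already carries a device_id is treated as
        # "in use". On a reschedule, the port can still hold the device_id
        # written during the failed first boot attempt, so this check fires.
        if port.get("device_id"):
            raise PortInUse(port_id=port["id"])


    # What the pre-existing port can look like by the time the second
    # compute node validates it:
    try:
        validate_requested_port({"id": "9fd24371-e906-4af2-898a-eaef223abca9",
                                 "device_id": "<uuid of the failed instance>"})
    except PortInUse as exc:
        print(exc)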
[Yahoo-eng-team] [Bug 1683972] Re: Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.
Resolved by Change-Id: I146a74f9f79c68a89677b9b26a324e06a35886f2

** No longer affects: nova
** Changed in: os-brick
   Status: New => Fix Released

https://bugs.launchpad.net/bugs/1683972
Title: Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.
Status in os-brick: Fix Released

Bug description:

This is fairly easy to reproduce by simultaneously launching and terminating several boot-from-volume instances on the same compute node, with a cinder back-end that takes some time to complete connection termination (e.g. ZFSSA). The initial symptom is failed multipath maps and kernel errors writing to SCSI devices. Later symptoms include failure to launch volume-backed instances due to multipath command errors.

The issue is caused by a race condition between the unplumbing of a volume being detached/disconnected and the plumbing of another volume being attached to a different instance. For example, when an instance is terminated, compute.manager._shutdown_instance() calls driver.destroy(), then it calls volume_api.terminate_connection() for the volume(s). driver.destroy() is responsible for cleaning up devices on the compute node - in my case, the libvirt driver calls disconnect_volume() in os_brick.initiator.connectors.iscsi, which removes the multipath map and SCSI device(s) associated with each volume. volume_api.terminate_connection() then instructs cinder to stop presenting the volume to the connector (which translates to disassociating a LUN from the compute node's initiator on the back-end storage device (iSCSI target)).

The problem occurs when another thread is attaching a volume to another instance on the same compute node at the same time. That calls connect_volume() in os_brick.initiator.connectors.iscsi, which does an iSCSI rescan. If the cinder back-end has not yet removed the LUN/initiator association, the rescan picks it back up, and recreates the SCSI device and the multipath map on the compute node. Shortly thereafter, that SCSI device becomes unresponsive, but it (and the multipath map) never go away. To make matters worse, the cinder back-end may use the same LUN number for another volume in the future, but that LUN number (plus portal address) is still associated with the broken SCSI device and multipath map on the compute node, so the wrong multipath map may be picked up by a future volume attachment attempt.

There is locking around the connect_volume() and disconnect_volume() functions in os_brick, but this is insufficient, because it doesn't extend over the cinder connection termination. I've been able to hack around this with a rudimentary lock on the parts of compute.manager that deal with volume detachment and connection termination, and the connect_volume() function in virt.libvirt.volume.iscsi. That has gotten me by on Icehouse for the last two years. I was surprised to find that the problem is still present in Ocata. The same workaround seems to be effective. I'm fairly sure that the way I've implemented it is not completely correct, though, so it should be implemented properly by someone more intimately familiar with all of the code-paths.
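The workaround described above (one lock spanning disconnect_volume plus terminate_connection, shared with connect_volume) could look roughly like the following. This is a sketch only, using oslo.concurrency and invented helper names; it is not the fix that merged under the Change-Id above:

    from oslo_concurrency import lockutils

    # Hypothetical helpers; the real call sites live in nova's compute manager
    # and libvirt volume driver. The point is that detach (disconnect_volume +
    # terminate_connection) and attach (connect_volume) serialize on one lock,
    # so a rescan during attach cannot re-discover a LUN that cinder is still
    # in the middle of un-exporting.
    ISCSI_LOCK = "iscsi-connection-cleanup"


    @lockutils.synchronized(ISCSI_LOCK)
    def detach_volume(connector, connection_info, volume_api, context, volume_id):
        # Remove the SCSI devices / multipath map on the compute node...
        connector.disconnect_volume(connection_info, None)
        # ...and only release the lock once cinder has stopped exporting the LUN.
        volume_api.terminate_connection(context, volume_id, connector)


    @lockutils.synchronized(ISCSI_LOCK)
    def attach_volume(connector, connection_info):
        # The iSCSI rescan inside connect_volume() now cannot overlap with an
        # in-flight termination for some other volume on this host.
        return connector.connect_volume(connection_info)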
[Yahoo-eng-team] [Bug 1683972] Re: Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.
** Also affects: os-brick
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1683972
Title: Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.
Status in OpenStack Compute (nova): New
Status in os-brick: New

(Bug description as quoted above.)
[Yahoo-eng-team] [Bug 1684326] [NEW] MTU not set on nova instance's vif_type=bridge tap interface
Public bug reported:

Using linuxbridge with VLAN networks with MTU != 1500, the nova instance's VIF's tap interface's MTU needs to be set to that of the network it's being plugged into. Otherwise the first instance on a compute node gets a tap interface (and bridge) with MTU 1500, but the VM tries to use MTU 9000, and frames get dropped.

The sequence on first instance launch goes like:

* os_vif creates the bridge (with initial MTU 1500)
* libvirt creates the domain, which creates the tap interface and adds it to the bridge. The tap interface inherits the bridge's MTU of 1500
* The L2 agent notices that a new tap interface showed up, and ensures that the VLAN interface gets added to the bridge - the VLAN interface has MTU 9000 (inherited from the physical interface), but the bridge MTU remains at 1500 - the lowest amongst its member ports (i.e. the tap interface)

If that instance is then destroyed, the tap interface goes away, and the bridge updates its MTU to the lowest amongst its members, which is now the VLAN interface - i.e. 9000. A second instance launch then picks up the bridge's MTU of 9000 and works fine.

This was previously solved in the l2 agent under https://bugs.launchpad.net/networking-cisco/+bug/1443607, but the solution was reverted in https://git.openstack.org/cgit/openstack/neutron/commit/?id=d352661c56d5f03713e615b7e0c2c9c8688e0132

A re-implementation should probably get the MTU from the neutron network, rather than from the VLAN interface (see the sketch below).

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1684326
Title: MTU not set on nova instance's vif_type=bridge tap interface
Status in neutron: New
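A minimal sketch of that direction - look up the MTU on the neutron network and apply it to the freshly plugged tap device - assuming pyroute2 for the netlink call and a hypothetical get_network_mtu() helper. This is neither the reverted patch nor an actual neutron change, and it needs root privileges to run:

    from pyroute2 import IPRoute


    def get_network_mtu(network):
        # Hypothetical helper: in a real agent this would come from the
        # neutron network's 'mtu' attribute (via RPC or the API),
        # not from inspecting the VLAN interface.
        return network.get("mtu", 1500)


    def set_tap_mtu(tap_name, network):
        """Set the tap device's MTU to the neutron network's MTU."""
        mtu = get_network_mtu(network)
        ip = IPRoute()
        try:
            index = ip.link_lookup(ifname=tap_name)[0]
            ip.link("set", index=index, mtu=mtu)
        finally:
            ip.close()


    # Example: a 9000-byte VLAN provider network and a placeholder tap name
    set_tap_mtu("tapXXXXXXXX-XX", {"mtu": 9000})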
[Yahoo-eng-team] [Bug 1683972] [NEW] Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.
Public bug reported:

(Original report; the full description is quoted in the "Re:" message above - overlapping detach/attach of iSCSI volumes races against cinder connection termination and leaves broken SCSI devices and multipath maps behind.)

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1683972
Title: Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.
Status in OpenStack Compute (nova): New
[Yahoo-eng-team] [Bug 1633249] [NEW] Boot volume creation leaves secondary volume attached to broken server
Public bug reported:

Attempt to boot a server with a block device mapping that includes a boot volume created from an image, plus an existing data volume. If the boot-volume creation fails, the data volume is left in state "in-use", attached to the server, which is now in "error" state. The user can't detach the volume because of the server's error state. They can delete the server, which then leaves the volume apparently attached to a server that no longer exists. The only way out of this is to ask an administrator to reset the state of the data volume (this option is not available to regular users by default policy).

The easiest way to reproduce this is to attempt to create the boot volume from a qcow2 image where the volume size is less than the image's (virtual) size.

~$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| 2e733722-8b19-4bff-bd8d-bb770554582a | available | data | 1    | -           | false    |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+

~$ nova boot --flavor m1.large --availability-zone=imot04-1 \
     --block-device 'id=9e122d18-d7a4-406d-b8f2-446cfddaa7c7,source=image,dest=volume,device=vda,size=5,bootindex=0' \
     --block-device 'id=2e733722-8b19-4bff-bd8d-bb770554582a,source=volume,dest=volume,device=vdb,size=1,bootindex=1' ol4
+--------------------------------------+--------------------------------------------------+
| Property                             | Value                                            |
+--------------------------------------+--------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                           |
| OS-EXT-AZ:availability_zone          | imot04-1                                         |
| OS-EXT-SRV-ATTR:host                 | -                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | -                                                |
| OS-EXT-SRV-ATTR:instance_name        |                                                  |
| OS-EXT-STS:power_state               | 0                                                |
| OS-EXT-STS:task_state                | scheduling                                       |
| OS-EXT-STS:vm_state                  | building                                         |
| OS-SRV-USG:launched_at               | -                                                |
| OS-SRV-USG:terminated_at             | -                                                |
| accessIPv4                           |                                                  |
| accessIPv6                           |                                                  |
| adminPass                            | DNTr8MG3kVmC                                     |
| config_drive                         |                                                  |
| created                              | 2016-10-13T21:54:08Z                             |
| flavor                               | m1.large (4)                                     |
| hostId                               |                                                  |
| id                                   | 9541b63c-e003-4bcc-bcb8-5c0461522387             |
| image                                | Attempt to boot from volume - no image supplied  |
| key_name                             | -                                                |
| metadata                             | {}                                               |
| name                                 | ol4                                              |
| os-extended-volumes:volumes_attached | [{"id": "2e733722-8b19-4bff-bd8d-bb770554582a"}] |
| progress                             | 0                                                |
| security_groups                      | default                                          |
| status                               | BUILD                                            |
| tenant_id                            | 66234fea2ccc42398a1ae5300c594d49                 |
| updated                              | 2016-10-13T21:54:08Z                             |
| user_id                              | b2ae6b7bdac142ddb708a3550f61d998                 |
+--------------------------------------+--------------------------------------------------+

~$ cinder list
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
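The same reproduction expressed through python-novaclient, as a sketch: the keystone session wiring is a placeholder (it is not part of the report), while the volume/image IDs, flavor, and sizes are taken from the output above.

    from keystoneauth1 import loading, session
    from novaclient import client

    # Placeholder auth wiring; any working keystone session will do.
    loader = loading.get_plugin_loader("password")
    auth = loader.load_from_options(auth_url="http://keystone:5000/v3",
                                    username="demo", password="secret",
                                    project_name="demo",
                                    user_domain_id="default",
                                    project_domain_id="default")
    nova = client.Client("2", session=session.Session(auth=auth))

    # Two block devices: a boot volume built from an image (sized smaller than
    # the image's virtual size to force the failure), plus the pre-existing
    # "data" volume from the report.
    bdm = [
        {"boot_index": 0, "uuid": "9e122d18-d7a4-406d-b8f2-446cfddaa7c7",
         "source_type": "image", "destination_type": "volume",
         "volume_size": 5, "device_name": "vda"},
        {"boot_index": 1, "uuid": "2e733722-8b19-4bff-bd8d-bb770554582a",
         "source_type": "volume", "destination_type": "volume",
         "volume_size": 1, "device_name": "vdb"},
    ]

    # Per the report, if the boot-volume creation fails, the server lands in
    # ERROR while the data volume stays "in-use" and attached to it.
    server = nova.servers.create(name="ol4", image=None, flavor="4",
                                 availability_zone="imot04-1",
                                 block_device_mapping_v2=bdm)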
[Yahoo-eng-team] [Bug 1293693] Re: libvirt OVS VLAN tag not set
Per the problem description, it is the nova VIF driver that sets the external-ids on the OVS port. Neutron later picks up that information to set the VLAN tag. It is nova that's not doing its part.

** Also affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1293693
Title: libvirt OVS VLAN tag not set
Status in OpenStack Neutron (virtual network service): New
Status in OpenStack Compute (Nova): New

Bug description:

Trying to use icehouse, libvirt-Xen, OpenVswitch 1.11.0, with VLAN tagging. The problem is that networking is non-functional on instance launch. 'ovs-vsctl show' output shows that the tap interface for the instance does not have the appropriate (internal) VLAN tag (no tag is set). Consequently, the instance is unable to obtain an IP address from DHCP, etc. Setting the tag manually with 'ovs-vsctl set port tapXXX tag=1' is a workaround (but not a very good one).

Exploring this, I find that the neutron OVS agent scans the OVS ports and examines the 'external-ids' to see which ones are of interest. When it sees a new port that is of interest, it sets the VLAN tag as required. In my case, the VIF port that's added when an instance is launched has empty 'external-ids', and so the agent ignores it. The port is getting added to the OVS integration bridge by the Xen scripts, but the 'external-ids' are not getting set (Xen knows nothing about this part).

Looking further: when nova.conf has 'firewall_driver=nova.virt.firewall.NoopFirewallDriver', the LibvirtBaseVIFDriver (nova/virt/libvirt/vif.py) uses the function plug_ovs_bridge(), which is a no-op. When firewall_driver=nova.virt.libvirt.firewall.IptablesFirewallDriver, a different function, plug_ovs_hybrid(), is used. When OVS is older than version 0.9.11, a function called plug_ovs_ethernet() is used. Both plug_ovs_hybrid() and plug_ovs_ethernet() call linux_net.create_ovs_vif_port(), and that's where the 'external-ids' get set. I tried modifying plug_ovs_bridge() to call linux_net.create_ovs_vif_port(), but that causes the Xen hotplug scripts to fail (ovs-vsctl: cannot create a port named tap3ccfe10f-c4 because a port named tap3ccfe10f-c4 already exists on bridge br-int).

When the Noop firewall_driver is used in conjunction with newer OVS, something needs to set the 'external-ids' on the VIF port so that the neutron agent will see it and set the VLAN tag.
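For reference, the external-ids the agent looks for can be stamped onto an already-plugged port without re-adding it. A sketch of that idea only - the key names mirror what create_ovs_vif_port() writes, but the exact set nova uses should be checked against the relevant release, and the example values are made up:

    import subprocess
    import uuid


    def set_vif_external_ids(dev, iface_id, mac, instance_uuid):
        """Set the external-ids the neutron OVS agent scans for on an existing
        port, instead of add-port (which fails when the Xen hotplug scripts
        have already created the port on br-int)."""
        subprocess.check_call([
            "ovs-vsctl", "set", "Interface", dev,
            "external-ids:iface-id=%s" % iface_id,
            "external-ids:iface-status=active",
            "external-ids:attached-mac=%s" % mac,
            "external-ids:vm-uuid=%s" % instance_uuid,
        ])


    # Example values; in nova these would come from the VIF and instance objects.
    set_vif_external_ids("tap3ccfe10f-c4",
                         iface_id=str(uuid.uuid4()),
                         mac="fa:16:3e:00:00:01",
                         instance_uuid=str(uuid.uuid4()))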
[Yahoo-eng-team] [Bug 1273496] [NEW] libvirt iSCSI driver sets is_block_dev=False
Public bug reported:

Trying to use iSCSI with libvirt/Xen, attaching volumes to instances was failing. I tracked this down to the libvirt XML looking like:

  <disk type='block' device='disk'>
    <driver name='file' type='raw' cache='none'/>
    <source dev='/dev/disk/by-path/ip-192.168.8.11:3260-iscsi-iqn.1986-03.com.sun:02:ecd142ab-b1c7-6bcf-8f91-f55b6c766bcc-lun-0'/>
    <target bus='xen' dev='xvdb'/>
    <serial>e8c640c6-641b-4940-88f2-79555cdd5551</serial>
  </disk>

The driver name should be phy, not file. More digging led to the iSCSI volume driver in nova/virt/libvirt/volume.py, which does:

  class LibvirtISCSIVolumeDriver(LibvirtBaseVolumeDriver):
      """Driver to attach Network volumes to libvirt."""
      def __init__(self, connection):
          super(LibvirtISCSIVolumeDriver, self).__init__(connection,
                                                         is_block_dev=False)

Surely is_block_dev should be True for iSCSI? Changing this makes the problem go away - now pick_disk_driver_name() in nova/virt/libvirt/utils.py does the right thing and my volume attaches successfully. Am I missing something here... ?

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1273496
Title: libvirt iSCSI driver sets is_block_dev=False
Status in OpenStack Compute (Nova): New
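A sketch of why the flag matters, based on the reporter's description of pick_disk_driver_name(). This is a simplified, hypothetical stand-in; the real helper in nova/virt/libvirt/utils.py also considers the hypervisor type and available disk backends:

    def pick_disk_driver_name(is_block_dev=False):
        # Simplified stand-in for the helper named in the report. Under Xen,
        # block devices need the 'phy' driver; 'file' only suits file-backed
        # disks, which is why an iSCSI LUN advertised with is_block_dev=False
        # yields the broken <driver name='file' .../> XML shown above.
        return "phy" if is_block_dev else "file"


    assert pick_disk_driver_name(is_block_dev=True) == "phy"    # what iSCSI needs
    assert pick_disk_driver_name(is_block_dev=False) == "file"  # what the bug produces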