[Yahoo-eng-team] [Bug 1830295] Re: devstack py3 get_link_devices() KeyError: 'index'

2019-05-28 Thread iain MacDonnell
Reinstalling oslo.privsep seems to have "fixed" it on the 16.04 box too.
Still don't understand why - must have been some bad cache or something.
Will put this down to "gremlins" unless it resurfaces :/

** Changed in: neutron
   Status: New => Invalid

** Changed in: oslo.privsep
   Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1830295

Title:
  devstack py3 get_link_devices() KeyError: 'index'

Status in neutron:
  Invalid
Status in oslo.privsep:
  Invalid

Bug description:
  devstack master with py3. openvswitch agent has suddenly stopped
  working, with no change in config or environment (other than
  rebuilding devstack). Stack trace below. For some reason (yet
  undetermined), privileged.get_link_devices() now seems to be returning
  byte arrays instead of strings as the dict keys:

  >>> from neutron.privileged.agent.linux import ip_lib as privileged
  >>> privileged.get_link_devices(None)[0].keys() 
  dict_keys([b'index', b'family', b'__align', b'header', b'flags', b'ifi_type', b'event', b'change', b'attrs'])
  >>> 

  
  From agent startup:

  neutron-openvswitch-agent[42936]: ERROR neutron Traceback (most recent call 
last):
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/bin/neutron-openvswitch-agent", line 10, in <module>
  neutron-openvswitch-agent[42936]: ERROR neutron sys.exit(main())
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 
20, in main
  neutron-openvswitch-agent[42936]: ERROR neutron agent_main.main()
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", 
line 47, in main
  neutron-openvswitch-agent[42936]: ERROR neutron mod.main()
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py",
 line 35, in main
  neutron-openvswitch-agent[42936]: ERROR neutron 
'neutron.plugins.ml2.drivers.openvswitch.agent.'
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/base/app_manager.py", line 375, 
in run_apps
  neutron-openvswitch-agent[42936]: ERROR neutron hub.joinall(services)  
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 102, in joinall
  neutron-openvswitch-agent[42936]: ERROR neutron t.wait()
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 180, in 
wait
  neutron-openvswitch-agent[42936]: ERROR neutron return 
self._exit_event.wait()
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/eventlet/event.py", line 132, in wait
  neutron-openvswitch-agent[42936]: ERROR neutron current.throw(*self._exc)
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 219, in 
main
  neutron-openvswitch-agent[42936]: ERROR neutron result = function(*args, 
**kwargs)
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 64, in _launch
  neutron-openvswitch-agent[42936]: ERROR neutron raise e
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 59, in _launch
  neutron-openvswitch-agent[42936]: ERROR neutron return func(*args, 
**kwargs)
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py",
 line 43, in agent_main_wrapper
  neutron-openvswitch-agent[42936]: ERROR neutron LOG.exception("Agent main 
thread died of an exception")
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in 
__exit__
  neutron-openvswitch-agent[42936]: ERROR neutron self.force_reraise() 
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
  neutron-openvswitch-agent[42936]: ERROR neutron six.reraise(self.type_, 
self.value, self.tb)
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
  neutron-openvswitch-agent[42936]: ERROR neutron raise value
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py",
 line 40, in agent_main_wrapper
  neutron-openvswitch-agent[42936]: ERROR neutron 
ovs_agent.main(bridge_classes)
  

[Yahoo-eng-team] [Bug 1830295] Re: devstack py3 get_link_devices() KeyError: 'index'

2019-05-24 Thread iain MacDonnell
Yeah... downgrading oslo.privsep from 1.33.0 to 1.32.1 makes the problem
go away.

** Also affects: oslo.privsep
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1830295

Title:
  devstack py3 get_link_devices() KeyError: 'index'

Status in neutron:
  New
Status in oslo.privsep:
  New

Bug description:
  devstack master with py3. openvswitch agent has suddenly stopped
  working, with no change in config or environment (other than
  rebuilding devstack). Stack trace below. For some reason (yet
  undetermined), privileged.get_link_devices() now seems to be returning
  byte arrays instead of strings as the dict keys:

  >>> from neutron.privileged.agent.linux import ip_lib as privileged
  >>> privileged.get_link_devices(None)[0].keys() 
  dict_keys([b'index', b'family', b'__align', b'header', b'flags', b'ifi_type', b'event', b'change', b'attrs'])
  >>> 

  
  From agent startup:

  neutron-openvswitch-agent[42936]: ERROR neutron Traceback (most recent call 
last):
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/bin/neutron-openvswitch-agent", line 10, in <module>
  neutron-openvswitch-agent[42936]: ERROR neutron sys.exit(main())
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 
20, in main
  neutron-openvswitch-agent[42936]: ERROR neutron agent_main.main()
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", 
line 47, in main
  neutron-openvswitch-agent[42936]: ERROR neutron mod.main()
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py",
 line 35, in main
  neutron-openvswitch-agent[42936]: ERROR neutron 
'neutron.plugins.ml2.drivers.openvswitch.agent.'
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/base/app_manager.py", line 375, 
in run_apps
  neutron-openvswitch-agent[42936]: ERROR neutron hub.joinall(services)  
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 102, in joinall
  neutron-openvswitch-agent[42936]: ERROR neutron t.wait()
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 180, in 
wait
  neutron-openvswitch-agent[42936]: ERROR neutron return 
self._exit_event.wait()
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/eventlet/event.py", line 132, in wait
  neutron-openvswitch-agent[42936]: ERROR neutron current.throw(*self._exc)
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 219, in 
main
  neutron-openvswitch-agent[42936]: ERROR neutron result = function(*args, 
**kwargs)
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 64, in _launch
  neutron-openvswitch-agent[42936]: ERROR neutron raise e
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 59, in _launch
  neutron-openvswitch-agent[42936]: ERROR neutron return func(*args, 
**kwargs)
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py",
 line 43, in agent_main_wrapper
  neutron-openvswitch-agent[42936]: ERROR neutron LOG.exception("Agent main 
thread died of an exception")
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in 
__exit__
  neutron-openvswitch-agent[42936]: ERROR neutron self.force_reraise() 
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
  neutron-openvswitch-agent[42936]: ERROR neutron six.reraise(self.type_, 
self.value, self.tb)
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
  neutron-openvswitch-agent[42936]: ERROR neutron raise value
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py",
 line 40, in agent_main_wrapper
  neutron-openvswitch-agent[42936]: ERROR neutron 
ovs_agent.main(bridge_classes)
  neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2393, in main
  

[Yahoo-eng-team] [Bug 1830295] [NEW] devstack py3 get_link_devices() KeyError: 'index'

2019-05-23 Thread iain MacDonnell
Public bug reported:

devstack master with py3. openvswitch agent has suddenly stopped
working, with no change in config or environment (other than rebuilding
devstack). Stack trace below. For some reason (yet undetermined),
privileged.get_link_devices() now seems to be returning byte arrays
instead of strings as the dict keys:

>>> from neutron.privileged.agent.linux import ip_lib as privileged
>>> privileged.get_link_devices(None)[0].keys() 
dict_keys([b'index', b'family', b'__align', b'header', b'flags', b'ifi_type', b'event', b'change', b'attrs'])
>>> 
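
A minimal illustration (my own sketch, not neutron or oslo.privsep code) of why
bytes keys break callers that index these dicts by 'index', plus one possible
decode-based normalization:

    # Hypothetical example: callers such as neutron's ip_lib look up str keys,
    # so bytes keys surface as KeyError: 'index'.
    device = {b'index': 2, b'attrs': [(b'IFLA_IFNAME', b'eth0')]}

    try:
        device['index']
    except KeyError:
        pass  # this is the KeyError: 'index' the agent dies with

    # One possible normalization: decode bytes keys back to str.
    def decode_keys(d):
        return {k.decode() if isinstance(k, bytes) else k: v
                for k, v in d.items()}

    assert decode_keys(device)['index'] == 2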


From agent startup:

neutron-openvswitch-agent[42936]: ERROR neutron Traceback (most recent call 
last):
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/bin/neutron-openvswitch-agent", line 10, in <module>
neutron-openvswitch-agent[42936]: ERROR neutron sys.exit(main())
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 
20, in main
neutron-openvswitch-agent[42936]: ERROR neutron agent_main.main()
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", 
line 47, in main
neutron-openvswitch-agent[42936]: ERROR neutron mod.main()
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py",
 line 35, in main
neutron-openvswitch-agent[42936]: ERROR neutron 
'neutron.plugins.ml2.drivers.openvswitch.agent.'
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/base/app_manager.py", line 375, 
in run_apps
neutron-openvswitch-agent[42936]: ERROR neutron hub.joinall(services)  
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 102, in joinall
neutron-openvswitch-agent[42936]: ERROR neutron t.wait()
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 180, in 
wait
neutron-openvswitch-agent[42936]: ERROR neutron return 
self._exit_event.wait()
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/eventlet/event.py", line 132, in wait
neutron-openvswitch-agent[42936]: ERROR neutron current.throw(*self._exc)
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/eventlet/greenthread.py", line 219, in 
main
neutron-openvswitch-agent[42936]: ERROR neutron result = function(*args, 
**kwargs)
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 64, in _launch
neutron-openvswitch-agent[42936]: ERROR neutron raise e
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/os_ken/lib/hub.py", line 59, in _launch
neutron-openvswitch-agent[42936]: ERROR neutron return func(*args, **kwargs)
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py",
 line 43, in agent_main_wrapper
neutron-openvswitch-agent[42936]: ERROR neutron LOG.exception("Agent main 
thread died of an exception")
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 220, in 
__exit__
neutron-openvswitch-agent[42936]: ERROR neutron self.force_reraise() 
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
neutron-openvswitch-agent[42936]: ERROR neutron six.reraise(self.type_, 
self.value, self.tb)
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
neutron-openvswitch-agent[42936]: ERROR neutron raise value
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py",
 line 40, in agent_main_wrapper
neutron-openvswitch-agent[42936]: ERROR neutron 
ovs_agent.main(bridge_classes)
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2393, in main
neutron-openvswitch-agent[42936]: ERROR neutron 
validate_tunnel_config(cfg.CONF.AGENT.tunnel_types, cfg.CONF.OVS.local_ip)
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2362, in validate_tunnel_config
neutron-openvswitch-agent[42936]: ERROR neutron validate_local_ip(local_ip)
neutron-openvswitch-agent[42936]: ERROR neutron   File 
"/opt/stack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2350, in 

[Yahoo-eng-team] [Bug 1825584] [NEW] eventlet monkey-patching breaks AMQP heartbeat on uWSGI

2019-04-19 Thread iain MacDonnell
Public bug reported:

Stein nova-api running under uWSGI presents an AMQP issue. The first API
call that requires RPC creates an AMQP connection and successfully
completes. Normally regular heartbeats would be sent from this point on,
to maintain the connection. This is not happening. After a few minutes,
the AMQP server (rabbitmq, in my case) notices that there have been no
heartbeats, and drops the connection. A later nova API call that
requires RPC tries to use the old connection, and throws a "connection
reset by peer" exception and the API call fails. A mailing-list response
suggests that this is affecting mod_wsgi also:

http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005310.html

I've discovered that this problem seems to be caused by eventlet monkey-
patching, which was introduced in:

https://github.com/openstack/nova/commit/23ba1c690652832c655d57476630f02c268c87ae

It was later rearranged in:

https://github.com/openstack/nova/commit/3c5e2b0e9fac985294a949852bb8c83d4ed77e04

but this problem remains.

If I comment out the import of nova.monkey_patch in
nova/api/openstack/__init__.py the problem goes away.

Seems that eventlet monkey-patching and uWSGI are not getting along for
some reason...
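
A minimal, hypothetical illustration (not nova code) of the suspected
mechanism: after eventlet.monkey_patch(), threading.Thread produces a
greenthread, which only runs when the main flow yields to the eventlet hub. A
worker that never makes a patched blocking call never yields, so a background
heartbeat loop silently starves:

    import eventlet
    eventlet.monkey_patch()

    import threading
    import time

    def heartbeat():
        while True:
            print("heartbeat")
            time.sleep(1)  # patched sleep: yields back to the eventlet hub

    t = threading.Thread(target=heartbeat)
    t.daemon = True
    t.start()

    # Simulate a worker that never yields to the hub (no patched blocking
    # calls): the heartbeat greenthread never gets scheduled, nothing prints.
    deadline = time.monotonic() + 5
    while time.monotonic() < deadline:
        sum(range(10000))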

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1825584

Title:
  eventlet monkey-patching breaks AMQP heartbeat on uWSGI

Status in OpenStack Compute (nova):
  New

Bug description:
  Stein nova-api running under uWSGI presents an AMQP issue. The first
  API call that requires RPC creates an AMQP connection and successfully
  completes. Normally regular heartbeats would be sent from this point
  on, to maintain the connection. This is not happening. After a few
  minutes, the AMQP server (rabbitmq, in my case) notices that there
  have been no heartbeats, and drops the connection. A later nova API
  call that requires RPC tries to use the old connection, and throws a
  "connection reset by peer" exception and the API call fails. A
  mailing-list response suggests that this is affecting mod_wsgi also:

  http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005310.html

  I've discovered that this problem seems to be caused by eventlet
  monkey-patching, which was introduced in:

  
https://github.com/openstack/nova/commit/23ba1c690652832c655d57476630f02c268c87ae

  It was later rearranged in:

  https://github.com/openstack/nova/commit/3c5e2b0e9fac985294a949852bb8c83d4ed77e04

  but this problem remains.

  If I comment out the import of nova.monkey_patch in
  nova/api/openstack/__init__.py the problem goes away.

  Seems that eventlet monkey-patching and uWSGI are not getting along
  for some reason...

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1825584/+subscriptions



[Yahoo-eng-team] [Bug 1824435] [NEW] fill_virtual_interface_list migration fails on second attempt

2019-04-11 Thread iain MacDonnell
Public bug reported:

On attempting to run online_data_migrations on Stein for the second time
(and beyond), fill_virtual_interface_list fails as below. I find two
rows in the security_groups table which have name='default' and
project_id NULL.
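
For reference, the offending rows can be inspected with something like this
(run against the nova DB; the column list is assumed from nova's usual
soft-delete schema, so adjust as needed):

    mysql> SELECT id, name, project_id, user_id, deleted
        -> FROM security_groups WHERE name = 'default' AND project_id IS NULL;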

2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage 
[req-6cb533e2-58b5-41db-a455-29dae8efef31 - - - - -] Error attempting to run 
: TypeError: 'NoneType' 
object has no attribute '__getitem__'
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage Traceback (most recent call 
last):
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 686, in 
_run_migration
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage found, done = 
migration_meth(ctxt, count)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 
1012, in wrapper
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return fn(*args, 
**kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/objects/virtual_interface.py", line 279, 
in fill_virtual_interface_list
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage 
_set_or_delete_marker_for_migrate_instances(cctxt, marker)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in 
wrapped
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(context, 
*args, **kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/objects/virtual_interface.py", line 305, 
in _set_or_delete_marker_for_migrate_instances
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage instance.create()
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in 
wrapper
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return fn(self, *args, 
**kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 600, in create
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage db_inst = 
db.instance_create(self._context, updates)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/db/api.py", line 748, in instance_create
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return 
IMPL.instance_create(context, values)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 170, in 
wrapper
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(*args, 
**kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/oslo_db/api.py", line 154, in wrapper
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage ectxt.value = 
e.inner_exc
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage self.force_reraise()
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage six.reraise(self.type_, 
self.value, self.tb)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/oslo_db/api.py", line 142, in wrapper
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(*args, 
**kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in 
wrapped
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(context, 
*args, **kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 1728, in 
instance_create
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage 
security_group_ensure_default(context)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 4039, in 
security_group_ensure_default
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return 
_security_group_ensure_default(context)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in 
wrapped
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage return f(context, 
*args, **kwargs)
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage   File 
"/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 4050, in 
_security_group_ensure_default
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage default_group = 
_security_group_get_by_names(context, ['default'])[0]
2019-04-11 03:51:27.632 22147 ERROR nova.cmd.manage 

[Yahoo-eng-team] [Bug 1816831] [NEW] DOC: typo in add_initial_allocation_ratio releasenote

2019-02-20 Thread iain MacDonnell
Public bug reported:


https://github.com/openstack/nova/blob/master/releasenotes/notes/add_initial_allocation_ratio-2d2666d62426a4bf.yaml

- initial_cpu_allocation_ratio with default value 16.0
- initial_ram_allocation_ratio with default value 1.5
- initial_ram_allocation_ratio with default value 1.0

The third one should be "initial_disk_allocation_ratio".

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: doc releasenote

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1816831

Title:
  DOC: typo in add_initial_allocation_ratio releasenote

Status in OpenStack Compute (nova):
  New

Bug description:
  
  
https://github.com/openstack/nova/blob/master/releasenotes/notes/add_initial_allocation_ratio-2d2666d62426a4bf.yaml

  - initial_cpu_allocation_ratio with default value 16.0
  - initial_ram_allocation_ratio with default value 1.5
  - initial_ram_allocation_ratio with default value 1.0

  The third one should be "initial_disk_allocation_ratio".

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1816831/+subscriptions



[Yahoo-eng-team] [Bug 1804075] [NEW] [doc] multistore incorrect options group for default_backend

2018-11-19 Thread iain MacDonnell
Public bug reported:

https://docs.openstack.org/glance/latest/admin/multistores.html
describes the default_backend option in the [glance_store] options
group, but the example incorrectly shows the [DEFAULT] options group.
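
For clarity, a minimal sketch of the intended placement (the backend name
"fast" and the filesystem store are illustrative, not taken from the doc):

    [DEFAULT]
    enabled_backends = fast:file

    [glance_store]
    # default_backend belongs here, not in [DEFAULT]
    default_backend = fast

    [fast]
    filesystem_store_datadir = /var/lib/glance/images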

** Affects: glance
 Importance: Undecided
 Assignee: iain MacDonnell (imacdonn)
 Status: In Progress

** Changed in: glance
 Assignee: (unassigned) => iain MacDonnell (imacdonn)

** Changed in: glance
   Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1804075

Title:
  [doc] multistore incorrect options group for default_backend

Status in Glance:
  In Progress

Bug description:
  https://docs.openstack.org/glance/latest/admin/multistores.html
  describes the default_backend option in the [glance_store] options
  group, but the example incorrectly shows the [DEFAULT] options group.

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1804075/+subscriptions



[Yahoo-eng-team] [Bug 1794364] Re: 'nova-manage db online_data_migrations' count fail

2018-10-16 Thread iain MacDonnell
** Also affects: cinder
   Importance: Undecided
   Status: New

** Changed in: cinder
 Assignee: (unassigned) => iain MacDonnell (imacdonn)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1794364

Title:
  'nova-manage db online_data_migrations' count fail

Status in Cinder:
  New
Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Fix Committed
Status in OpenStack Compute (nova) pike series:
  Fix Committed
Status in OpenStack Compute (nova) queens series:
  Fix Committed
Status in OpenStack Compute (nova) rocky series:
  Fix Committed

Bug description:
  
  'nova-manage db online_data_migrations' attempts to display summary counts
  of migrations "Needed" and "Completed" in a pretty table at the end, but
  fails to accumulate the totals between successive invocations of
  _run_migration(), and ends up reporting zeroes.

  # nova-manage db online_data_migrations 
  /usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:332: 
NotSupportedWarning: Configuration option(s) ['use_tpool'] not supported
exception.NotSupportedWarning
  Running batches of 50 until complete
  /usr/lib/python2.7/site-packages/pymysql/cursors.py:166: Warning: (3090, 
u"Changing sql mode 'NO_AUTO_CREATE_USER' is deprecated. It will be removed in 
a future release.")
result = self._query(query)
  2 rows matched query migrate_instances_add_request_spec, 0 migrated
  13 rows matched query migrate_quota_limits_to_api_db, 13 migrated
  37 rows matched query populate_uuids, 37 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  50 rows matched query populate_uuids, 50 migrated
  21 rows matched query populate_uuids, 21 migrated
  +-+--+---+
  |  Migration  | Total Needed | Completed |
  +-+--+---+
  | delete_build_requests_with_no_instance_uuid |  0   | 0 |
  |migrate_aggregate_reset_autoincrement|  0   | 0 |
  |  migrate_aggregates |  0   | 0 |
  |  migrate_instance_groups_to_api_db  |  0   | 0 |
  |  migrate_instances_add_request_spec |  0   | 0 |
  |  migrate_keypairs_to_api_db |  0   | 0 |
  |   migrate_quota_classes_to_api_db   |  0   | 0 |
  | 

[Yahoo-eng-team] [Bug 1796192] [NEW] online_data_migrations exceptions quietly masked

2018-10-04 Thread iain MacDonnell
Public bug reported:

When online_data_migrations raise exceptions, nova/cinder-manage catches
the exception, prints a fairly useless "something didn't work" message,
and moves on. Two issues:

1) The user(/admin) has no way to see what actually failed (exception is not
   logged)
2) The command returns exit status 0, as if all possible migrations have been
   completed successfully - this can cause failures to get missed, especially
   if automated
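
A hypothetical sketch (not the actual nova/cinder-manage code) of the handling
these two points ask for - log the real traceback and exit non-zero when any
migration fails:

    import logging
    import sys

    LOG = logging.getLogger(__name__)

    def run_migrations(migrations, ctxt, count):
        failed = False
        for name, meth in migrations.items():
            try:
                found, done = meth(ctxt, count)
            except Exception:
                # 1) make the actual failure visible in the log
                LOG.exception("Error running online data migration %s", name)
                # 2) remember it so the command can exit non-zero
                failed = True
                continue
            LOG.info("%s: %d rows matched, %d migrated", name, found, done)
        return 2 if failed else 0

    if __name__ == "__main__":
        logging.basicConfig(level=logging.INFO)
        sys.exit(run_migrations({}, None, 50))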

** Affects: cinder
 Importance: Undecided
 Status: New

** Affects: nova
 Importance: Undecided
 Assignee: iain MacDonnell (imacdonn)
 Status: In Progress

** Also affects: cinder
   Importance: Undecided
   Status: New

** Description changed:

  When online_data_migrations raise exceptions, nova/cinder-manage catches
  the exception, prints a fairly useless "something didn't work" message,
  and moves on. Two issues:
  
  1) The user(/admin) has no way to see what actually failed (exception is not logged)
- 2) The command returns exit status 0, as if all possible migrations have been completed successfully
+ 2) The command returns exit status 0, as if all possible migrations have been completed successfully - this can cause failures to get missed, especially if automated

** Changed in: nova
 Assignee: (unassigned) => iain MacDonnell (imacdonn)

** Changed in: nova
   Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1796192

Title:
  online_data_migrations exceptions quietly masked

Status in Cinder:
  New
Status in OpenStack Compute (nova):
  In Progress

Bug description:
  When online_data_migrations raise exceptions, nova/cinder-manage
  catches the exception, prints a fairly useless "something didn't work"
  message, and moves on. Two issues:

  1) The user(/admin) has no way to see what actually failed (exception is not
     logged)
  2) The command returns exit status 0, as if all possible migrations have been
     completed successfully - this can cause failures to get missed, especially
     if automated

To manage notifications about this bug go to:
https://bugs.launchpad.net/cinder/+bug/1796192/+subscriptions



[Yahoo-eng-team] [Bug 1794364] [NEW] 'nova-manage db online_data_migrations' count fail

2018-09-25 Thread iain MacDonnell
Public bug reported:


'nova-manage db online_data_migrations' attempts to display summary counts of 
migrations "Needed" and "Completed" in a pretty table at the end, but fails to 
accumulate the totals between successive invocations of _run_migration(), and 
ends up reporting zeroes.

# nova-manage db online_data_migrations 
/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/enginefacade.py:332: 
NotSupportedWarning: Configuration option(s) ['use_tpool'] not supported
  exception.NotSupportedWarning
Running batches of 50 until complete
/usr/lib/python2.7/site-packages/pymysql/cursors.py:166: Warning: (3090, 
u"Changing sql mode 'NO_AUTO_CREATE_USER' is deprecated. It will be removed in 
a future release.")
  result = self._query(query)
2 rows matched query migrate_instances_add_request_spec, 0 migrated
13 rows matched query migrate_quota_limits_to_api_db, 13 migrated
37 rows matched query populate_uuids, 37 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
50 rows matched query populate_uuids, 50 migrated
21 rows matched query populate_uuids, 21 migrated
+-+--+---+
|  Migration  | Total Needed | Completed |
+-+--+---+
| delete_build_requests_with_no_instance_uuid |  0   | 0 |
|migrate_aggregate_reset_autoincrement|  0   | 0 |
|  migrate_aggregates |  0   | 0 |
|  migrate_instance_groups_to_api_db  |  0   | 0 |
|  migrate_instances_add_request_spec |  0   | 0 |
|  migrate_keypairs_to_api_db |  0   | 0 |
|   migrate_quota_classes_to_api_db   |  0   | 0 |
|migrate_quota_limits_to_api_db   |  0   | 0 |
|  migration_migrate_to_uuid  |  0   | 0 |
|populate_uuids   |  0   | 0 |
| service_uuids_online_data_migration |  0   | 0 |
+-+--+---+
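
A hypothetical sketch (not the actual nova-manage code) of carrying the
per-migration counts across batches so the summary table is not all zeroes:

    totals = {}  # migration name -> [total needed, completed]

    def record_batch(name, found, done):
        needed, completed = totals.setdefault(name, [0, 0])
        totals[name] = [needed + found, completed + done]

    # Each batch reported by _run_migration() gets accumulated, e.g.:
    record_batch('migrate_quota_limits_to_api_db', 13, 13)
    record_batch('populate_uuids', 50, 50)
    record_batch('populate_uuids', 21, 21)
    assert totals['populate_uuids'] == [71, 71]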

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1794364

Title:
  'nova-manage db online_data_migrations' count fail

Status in OpenStack Compute (nova):
  New

Bug description:
  
  'nova-manage db online_data_migrations' attempts to display summary counts of 
migrations "Needed" and "Completed" in a pretty table at the end, but fails to 

[Yahoo-eng-team] [Bug 1782448] Re: memcache client super() fail

2018-07-18 Thread iain MacDonnell
Need to apply the same fix as in
https://review.openstack.org/#/c/175291/
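
As I recall it, that review works around python-memcached's Client inheriting
from threading.local by restoring the plain object methods on a subclass so
instances can be reused across threads. A rough sketch of the idea (treat as
illustrative, not the exact keystonemiddleware patch):

    import memcache

    class _MemcacheClient(memcache.Client):
        """Thread-global memcache client.

        memcache.Client inherits from threading.local; restoring the object
        methods it overrides avoids the super(Client, self) failure seen in
        the traceback below and lets the pooled client be shared across
        threads.
        """
        __delattr__ = object.__delattr__
        __getattribute__ = object.__getattribute__
        __new__ = object.__new__
        __setattr__ = object.__setattr__

        def __del__(self):
            pass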

** Project changed: keystone => keystonemiddleware

** Changed in: keystonemiddleware
 Assignee: (unassigned) => iain MacDonnell (imacdonn)

** Description changed:

+ Applies to Pike, but not later releases...
+ 
  When configured to use the memcache connection pool, clients (e.g.
  neutron-server) fail with:
  
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors Traceback 
(most recent call last):
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/oslo_middleware/catch_errors.py", line 40, in 
__call__
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors response 
= req.get_response(self.application)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/webob/request.py", line 1316, in send
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors 
application, catch_exc_info=False)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/webob/request.py", line 1280, in 
call_application
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors app_iter 
= application(self.environ, start_response)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/webob/dec.py", line 131, in __call__
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors resp = 
self.call_func(req, *args, **self.kwargs)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/webob/dec.py", line 196, in call_func
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors return 
self.func(req, *args, **kwargs)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 331, in __call__
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors response 
= self.process_request(req)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 622, in process_request
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors resp = 
super(AuthProtocol, self).process_request(request)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 404, in process_request
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors 
allow_expired=allow_expired)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 434, in _do_fetch_token
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors data = 
self.fetch_token(token, **kwargs)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 736, in fetch_token
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors cached = 
self._cache_get_hashes(token_hashes)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 719, in _cache_get_hashes
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors cached = 
self._token_cache.get(token)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_cache.py", 
line 214, in get
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors with 
self._cache_pool.reserve() as cache:
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors return 
self.gen.next()
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_cache.py", 
line 98, in reserve
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors with 
self._pool.get() as client:
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/eventlet/queue.py", line 295, in get
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors return 
self._get()
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_memcache_pool.py",
 line 149, in _get
  2018-07-18

[Yahoo-eng-team] [Bug 1782448] [NEW] memcache client super() fail

2018-07-18 Thread iain MacDonnell
4810 ERROR oslo_middleware.catch_errors conn = 
self._create_connection()
2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_memcache_pool.py",
 line 143, in _create_connection
2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors 
socket_timeout=self._socket_timeout)
2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/memcache.py", line 223, in __init__
2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors 
super(Client, self).__init__()
2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors TypeError: 
super(type, obj): obj must be an instance or subtype of type

** Affects: keystonemiddleware
 Importance: Undecided
 Assignee: iain MacDonnell (imacdonn)
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1782448

Title:
  memcache client super() fail

Status in keystonemiddleware:
  New

Bug description:
  Applies to Pike, but not later releases...

  When configured to use the memcache connection pool, clients (e.g.
  neutron-server) fail with:

  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors Traceback 
(most recent call last):
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/oslo_middleware/catch_errors.py", line 40, in 
__call__
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors response 
= req.get_response(self.application)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/webob/request.py", line 1316, in send
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors 
application, catch_exc_info=False)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/webob/request.py", line 1280, in 
call_application
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors app_iter 
= application(self.environ, start_response)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/webob/dec.py", line 131, in __call__
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors resp = 
self.call_func(req, *args, **self.kwargs)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/webob/dec.py", line 196, in call_func
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors return 
self.func(req, *args, **kwargs)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 331, in __call__
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors response 
= self.process_request(req)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 622, in process_request
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors resp = 
super(AuthProtocol, self).process_request(request)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 404, in process_request
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors 
allow_expired=allow_expired)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 434, in _do_fetch_token
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors data = 
self.fetch_token(token, **kwargs)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 736, in fetch_token
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors cached = 
self._cache_get_hashes(token_hashes)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py", 
line 719, in _cache_get_hashes
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors cached = 
self._token_cache.get(token)
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors   File 
"/usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/_cache.py", 
line 214, in get
  2018-07-18 21:00:32.568 14810 ERROR oslo_middleware.catch_errors with 
self._cache_pool.reserve() as cache:
  2018-07-18 21:00:32.568 14810 ERROR oslo_midd

[Yahoo-eng-team] [Bug 1757207] Re: compute resource providers not equal to compute nodes in deployment

2018-07-06 Thread iain MacDonnell
Shouldn't the upgrade check simply discount deleted compute_nodes?

+---+
| Check: Resource Providers |
| Result: Warning   |
| Details: There are 34 compute resource providers and 40 compute nodes |
|   in the deployment. Ideally the number of compute resource   |
|   providers should equal the number of enabled compute nodes  |
|   otherwise the cloud may be underutilized. See   |
|   https://docs.openstack.org/nova/latest/user/placement.html  |
|   for more details.   |
+---+


mysql> select count(*) from compute_nodes;
+--+
| count(*) |
+--+
|   40 |
+--+
1 row in set (0.00 sec)


mysql> select count(*) from compute_nodes where deleted=0;
+--+
| count(*) |
+--+
|   34 |
+--+
1 row in set (0.00 sec)
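
A hypothetical sketch (not the actual nova-status code) of the comparison being
suggested here - count only non-deleted compute nodes:

    def resource_provider_check(num_resource_providers, compute_nodes):
        enabled = [n for n in compute_nodes if n['deleted'] == 0]
        return "Success" if num_resource_providers >= len(enabled) else "Warning"

    # Numbers from this deployment: 34 providers, 40 compute_nodes rows, 6 deleted
    nodes = [{'deleted': 0}] * 34 + [{'deleted': 1}] * 6
    assert resource_provider_check(34, nodes) == "Success"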


** Changed in: nova
   Status: Expired => Incomplete

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1757207

Title:
  compute resource providers not equal to compute nodes in deployment

Status in OpenStack Compute (nova):
  Incomplete

Bug description:
  Description
  ===
  When I execute the command `nova-status upgrade check`,
  output:

  nova-status upgrade check
  +--+
  | Upgrade Check Results |
  +--+
  | Check: Cells v2 |
  | Result: Success |
  | Details: None |
  +--+
  | Check: Placement API |
  | Result: Success |
  | Details: None |
  +--+
  | Check: Resource Providers |
  | Result: Warning |
  | Details: There are 4 compute resource providers and 15 compute nodes |
  | in the deployment. Ideally the number of compute resource |
  | providers should equal the number of enabled compute nodes |
  | otherwise the cloud may be underutilized. See |
  | http://docs.openstack.org/developer/nova/placement.html |
  | for more details. |
  +--+

  Steps to reproduce
  ==
  How to replicate this?
  Remove the hosts from the openstack controller:
  nova hypervisor-list
  nova service-delete {id}

  Then run:
  su -s /bin/sh -c "nova-manage cell_v2 discover_hosts --verbose" nova
  The deleted compute node will be added again as a new node.
  run:
  nova-status upgrade check

  Expected result
  ===
  No warning when you run:
  nova-status upgrade check

  Actual result
  =
  You can find the warning.
  This causes issue with placement of new VM's.
  The compute host which was deleted and added again will not be considered 
during VM scheduling and placement.

  Environment
  ===
  OpenStack Pike release
  Neutron Networking which is default.

  Logs and Configs
  
  Config as the Openstack documentation.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1757207/+subscriptions



[Yahoo-eng-team] [Bug 1756465] [NEW] Need glance v2 way to register image by location

2018-03-16 Thread iain MacDonnell
Public bug reported:

The release notes for Queens state "With the introduction of the web-
download import method, we consider the Image Service v2 API to have
reached feature parity with the DEPRECATED v1 API in all important
respects.", but v2 does NOT provide any way to register an image by
location.

Before v1 gets removed, there needs to be something similar to web-
download, but referencing the remote URL, rather than copying from it -
web-location?. This may be considered insecure on the internet, but for
a private cloud on a protected intranet, it's acceptable, and more
efficient when multiple glance instances need to make use of images that
are already published on an internal HTTP server.

See also bug 1750892, which could enable a workaround, but it's
cumbersome, and requires the location to be exposed, which could present
other security issues (even on an intranet).

** Affects: glance
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1756465

Title:
  Need glance v2 way to register image by location

Status in Glance:
  New

Bug description:
  The release notes for Queens state "With the introduction of the web-
  download import method, we consider the Image Service v2 API to have
  reached feature parity with the DEPRECATED v1 API in all important
  respects.", but v2 does NOT provide any way to register an image by
  location.

  Before v1 gets removed, there needs to be something similar to web-
  download, but referencing the remote URL, rather than copying from it
  - web-location?. This may be considered insecure on the internet, but
  for a private cloud on a protected intranet, it's acceptable, and more
  efficient when multiple glance instances need to make use of images
  that are already published on an internal HTTP server.

  See also bug 1750892, which could enable a workaround, but it's
  cumbersome, and requires the location to be exposed, which could
  present other security issues (even on an intranet).

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1756465/+subscriptions



[Yahoo-eng-team] [Bug 1750892] [NEW] Image remains in queued status after location set via PATCH

2018-02-21 Thread iain MacDonnell
Public bug reported:

Pike release, with show_image_direct_url and show_multiple_locations
enabled.

Attempting to create an image using the HTTP backend with the glance v2
API. I create a new/blank image (goes into "queued" status), then set
the location with:

curl -g -i -X PATCH -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' \
  -H 'User-Agent: imacdonn-getting-dangerous' -H 'X-Auth-Token: xxx' \
  -H 'Content-Type: application/openstack-images-v2.1-json-patch' \
  -d '[{"op": "replace", "path": "/locations", "value": [{"url": "http://my_http_server/cirros.img", "metadata": {}}]}]' \
  http://my_glance_api_endpoint:9292/v2/images/e5581f14-2d05-4ae7-8d78-9da42731a37e

This results in the direct_url getting set correctly, and the size of
the image is correctly determined, but the image remains in "queued"
status. It should become "active".
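
A follow-up check along these lines (same placeholder token and endpoint as
above) is what shows the status stuck at "queued" rather than "active":

    curl -s -H 'X-Auth-Token: xxx' \
      http://my_glance_api_endpoint:9292/v2/images/e5581f14-2d05-4ae7-8d78-9da42731a37e \
      | python -m json.tool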

** Affects: glance
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/1750892

Title:
  Image remains in queued status after location set via PATCH

Status in Glance:
  New

Bug description:
  Pike release, with show_image_direct_url and show_multiple_locations
  enabled.

  Attempting to create an image using the HTTP backend with the glance
  v2 API. I create a new/blank image (goes into "queued" status), then
  set the location with:

  curl -g -i -X PATCH -H 'Accept-Encoding: gzip, deflate' -H 'Accept: */*' \
    -H 'User-Agent: imacdonn-getting-dangerous' -H 'X-Auth-Token: xxx' \
    -H 'Content-Type: application/openstack-images-v2.1-json-patch' \
    -d '[{"op": "replace", "path": "/locations", "value": [{"url": "http://my_http_server/cirros.img", "metadata": {}}]}]' \
    http://my_glance_api_endpoint:9292/v2/images/e5581f14-2d05-4ae7-8d78-9da42731a37e

  This results in the direct_url getting set correctly, and the size of
  the image is correctly determined, but the image remains in "queued"
  status. It should become "active".

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/1750892/+subscriptions



[Yahoo-eng-team] [Bug 1749838] [NEW] Rescheduled instance with pre-existing port fails with PortInUse exception

2018-02-15 Thread iain MacDonnell
*** This bug is a duplicate of bug 1597596 ***
https://bugs.launchpad.net/bugs/1597596

Public bug reported:

Attempting to create an instance that uses an existing neutron port,
when the instance creation fails on the first compute node, and gets
rescheduled to another compute node, the rescheduled attempt fails with
a PortInUse exception. In case it matters, I'm using neutron ML2 with
linuxbridge and the port is on a VLAN provider network.

Steps to reproduce (starting with an AZ/aggregate with two functional
compute nodes up and running):

1. Create a neutron port, and make a note of the ID (os port create --network XXX myport)
2. Inject a failure on the first node - e.g. by renaming the qemu binary
3. Create an instance, using the port created earlier (openstack server create --nic port-id=XXX --image cirros --flavor m1.tiny myvm)

The instance will fail on the first node, and get rescheduled on the
second, where it will fail with:

2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager Traceback (most recent 
call last):
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1415, in 
_allocate_network_async
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager 
bind_host_id=bind_host_id)
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 855, in 
allocate_for_instance
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager context, instance, 
neutron, requested_networks)
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 573, in 
_validate_requested_port_ids
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager raise 
exception.PortInUse(port_id=request.port_id)
2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager PortInUse: Port 
9fd24371-e906-4af2-898a-eaef223abca9 is still in use.


I've reproduced this on both Ocata and Pike. It does not seem to happen if the 
port is created by nova (i.e. openstack server create --nic net-id=XXX ...)

This looks a bit like https://bugs.launchpad.net/nova/+bug/1308405 , but
that's supposed to have been fixed long ago.
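
For context, the check that fires here is essentially of the shape below. This
is an illustrative paraphrase of nova's _validate_requested_port_ids, not the
exact code: a requested port whose device_id is already set is treated as "in
use", and on a reschedule the port may still carry the device_id written
during the failed attempt on the first compute node.

# Rough paraphrase (not the exact nova code) of the check behind the
# PortInUse traceback above.
from nova import exception


def validate_requested_port_ids(neutron, requested_networks):
    for request in requested_networks:
        if not request.port_id:
            continue
        port = neutron.show_port(request.port_id)['port']
        if port.get('device_id'):
            # On a reschedule, device_id may still point at the instance
            # from the failed first attempt, so the port looks "in use".
            raise exception.PortInUse(port_id=request.port_id)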

** Affects: nova
 Importance: Undecided
 Status: New


** Tags: compute neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1749838

Title:
  Rescheduled instance with pre-existing port fails with PortInUse
  exception

Status in OpenStack Compute (nova):
  New

Bug description:
  Attempting to create an instance that uses an existing neutron port,
  when the instance creation fails on the first compute node, and gets
  rescheduled to another compute node, the rescheduled attempt fails
  with a PortInUse exception. In case it matters, I'm using neutron ML2
  with linuxbridge and the port is on a VLAN provider network.

  Steps to reproduce (starting with an AZ/aggregate with two functional
  compute nodes up and running):

  1. Create a neutron port, and make a note of the ID (os port create --network XXX myport)
  2. Inject a failure on the first node - e.g. by renaming the qemu binary
  3. Create an instance, using the port created earlier (openstack server create --nic port-id=XXX --image cirros --flavor m1.tiny myvm)

  The instance will fail on the first node, and get rescheduled on the
  second, where it will fail with:

  2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager Traceback (most 
recent call last):
  2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1415, in 
_allocate_network_async
  2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager 
bind_host_id=bind_host_id)
  2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 855, in 
allocate_for_instance
  2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager context, 
instance, neutron, requested_networks)
  2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager   File 
"/usr/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 573, in 
_validate_requested_port_ids
  2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager raise 
exception.PortInUse(port_id=request.port_id)
  2018-02-15 22:52:39.347 43784 ERROR nova.compute.manager PortInUse: Port 
9fd24371-e906-4af2-898a-eaef223abca9 is still in use.

  
  I've reproduced this on both Ocata and Pike. It does not seem to happen if 
the port is created by nova (i.e. openstack server create --nic net-id=XXX ...)

  This looks a bit like https://bugs.launchpad.net/nova/+bug/1308405 ,
  but that's supposed to have been fixed long ago.

To manage notifications about this bug go to:

[Yahoo-eng-team] [Bug 1683972] Re: Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.

2017-07-26 Thread iain MacDonnell
Resolved by Change-Id: I146a74f9f79c68a89677b9b26a324e06a35886f2

** No longer affects: nova

** Changed in: os-brick
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1683972

Title:
  Overlapping iSCSI volume detach/attach can leave behind broken SCSI
  devices and multipath maps.

Status in os-brick:
  Fix Released

Bug description:
  This is fairly easy to reproduce by simultaneously launching and
  terminating several boot-from-volume instances on the same compute
  node, with a cinder back-end that takes some time to complete
  connection-termination (e.g. ZFSSA). The initial symptom is failed
  multipath maps, and kernel errors writing to SCSI devices. Later
  symptoms include failure to launch volume-backed instances due to
  multipath command errors.

  The issue is caused by a race-condition between the unplumbing of a
  volume being detached/disconnected, and the plumbing of another volume
  being attached to a different instance.

  For example, when an instance is terminated,
  compute.manager._shutdown_instance() calls driver.destroy(), then it
  calls volume_api.terminate_connection() for the volume(s).

  driver.destroy() is responsible for cleaning up devices on the compute
  node - in my case, the libvirt driver calls disconnect_volume() in
  os_brick.initiator.connectors.iscsi, which removes the multipath map
  and SCSI device(s) associated with each volume.

  volume_api.terminate_connection() then instructs cinder to stop
  presenting the volume to the connector (which translates to
  disassociating a LUN from the compute node's initiator on the back-end
  storage-device (iSCSI target)).

  The problem occurs when another thread is attaching a volume to
  another instance on the same compute node at the same time. That calls
  connect_volume() in os_brick.initiator.connectors.iscsi, which does an
  iSCSI rescan. If the cinder back-end has not yet removed the
  LUN/initiator association, the rescan picks it back up, and recreates
  the SCSI device and the multipath map on the compute node. Shortly
  thereafter, that SCSI device becomes unresponsive, but it (and the
  multipath map) never go away.

  To make matters worse, the cinder back-end may use the same LUN number
  for another volume in the future, but that LUN number (plus portal
  address) is still associated with the broken SCSI device and multipath
  map on the compute node, so the wrong multipath map may be picked up
  by a future volume attachment attempt.

  There is locking around the connect_volume() and disconnect_volume()
  functions in os_brick, but this is insufficient, because it doesn't
  extend over the cinder connection termination.

  I've been able to hack around this with a rudimentary lock on the
  parts of compute.manager that deal with volume detachment and
  connection termination, and the connect_volume() function in
  virt.libvirt.volume.iscsi. That has gotten me by on Icehouse for the
  last two years. I was surprised to find that the problem is still
  present in Ocata. The same workaround seems to be effective. I'm
  fairly sure that the way I've implemented it is not completely
  correct, though, so it should be implemented properly by someone more
  intimately familiar with all of the code-paths.
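
  For illustration only, the kind of coarse serialization the reporter
  describes (one lock shared between the attach path and the whole
  detach-plus-terminate sequence) could be sketched with oslo.concurrency as
  below; the function names are hypothetical and this is not the actual
  workaround code.

  # Illustrative sketch only - function names are hypothetical, not
  # nova/os-brick code. The idea: connect_volume() and the whole
  # disconnect_volume() + terminate_connection() sequence share one external
  # lock, so a rescan cannot race with an in-flight termination on this host.
  from oslo_concurrency import lockutils

  @lockutils.synchronized('iscsi-connect-disconnect', external=True)
  def attach_volume_on_host(connector, connection_info):
      # os-brick connect_volume() (which triggers the iSCSI rescan) runs here.
      pass

  @lockutils.synchronized('iscsi-connect-disconnect', external=True)
  def detach_volume_on_host(context, volume_api, volume_id, connector):
      # os-brick disconnect_volume() *and* cinder terminate_connection() both
      # run inside the same lock, so the LUN is unmapped before any new
      # rescan can pick it back up.
      volume_api.terminate_connection(context, volume_id, connector)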

To manage notifications about this bug go to:
https://bugs.launchpad.net/os-brick/+bug/1683972/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1683972] Re: Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.

2017-04-21 Thread iain MacDonnell
** Also affects: os-brick
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1683972

Title:
  Overlapping iSCSI volume detach/attach can leave behind broken SCSI
  devices and multipath maps.

Status in OpenStack Compute (nova):
  New
Status in os-brick:
  New

Bug description:
  This is fairly easy to reproduce by simultaneously launching and
  terminating several boot-from-volume instances on the same compute
  node, with a cinder back-end that takes some time to complete
  connection-termination (e.g. ZFSSA). The initial symptom is failed
  multipath maps, and kernel errors writing to SCSI devices. Later
  symptoms include failure to launch volume-backed instances due to
  multipath command errors.

  The issue is caused by a race-condition between the unplumbing of a
  volume being detached/disconnected, and the plumbing of another volume
  being attached to a different instance.

  For example, when an instance is terminated,
  compute.manager._shutdown_instance() calls driver.destroy(), then it
  calls volume_api.terminate_connection() for the volume(s).

  driver.destroy() is responsible for cleaning up devices on the compute
  node - in my case, the libvirt driver calls disconnect_volume() in
  os_brick.initiator.connectors.iscsi, which removes the multipath map
  and SCSI device(s) associated with each volume.

  volume_api.terminate_connection() then instructs cinder to stop
  presenting the volume to the connector (which translates to
  disassociating a LUN from the compute node's initiator on the back-end
  storage-device (iSCSI target)).

  The problem occurs when another thread is attaching a volume to
  another instance on the same compute node at the same time. That calls
  connect_volume() in os_brick.initiator.connectors.iscsi, which does an
  iSCSI rescan. If the cinder back-end has not yet removed the
  LUN/initiator association, the rescan picks it back up, and recreates
  the SCSI device and the multipath map on the compute node. Shortly
  thereafter, that SCSI device becomes unresponsive, but it (and the
  multipath map) never go away.

  To make matters worse, the cinder back-end may use the same LUN number
  for another volume in the future, but that LUN number (plus portal
  address) is still associated with the broken SCSI device and multipath
  map on the compute node, so the wrong multipath map may be picked up
  by a future volume attachment attempt.

  There is locking around the connect_volume() and disconnect_volume()
  functions in os_brick, but this is insufficient, because it doesn't
  extend over the cinder connection termination.

  I've been able to hack around this with a rudimentary lock on the
  parts of compute.manager that deal with volume detachment and
  connection termination, and the connect_volume() function in
  virt.libvirt.volume.iscsi. That has gotten me by on Icehouse for the
  last two years. I was surprised to find that the problem is still
  present in Ocata. The same workaround seems to be effective. I'm
  fairly sure that the way I've implemented it is not completely
  correct, though, so it should be implemented properly by someone more
  intimately familiar with all of the code-paths.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1683972/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1684326] [NEW] MTU not set on nova instance's vif_type=bridge tap interface

2017-04-19 Thread iain MacDonnell
Public bug reported:

Using linuxbridge with VLAN networks whose MTU is not 1500, the MTU of the
nova instance's VIF tap interface needs to be set to that of the network it
is being plugged into; otherwise the first instance on a compute node gets a
tap interface (and bridge) with MTU 1500 while the VM tries to use MTU 9000,
and frames get dropped.

Sequence on first instance launch goes like:

 * os_vif creates bridge (with initial MTU 1500)
 * libvirt creates the domain, which creates the tap interface and adds it to 
the bridge. The tap interface inherits the bridge's MTU of 1500
 * The L2 agent notices that a new tap interface showed up, and ensures that 
the VLAN interface gets added to the bridge - the VLAN interface has MTU 9000 
(inherited from the physical interface), but the bridge MTU remains at 1500 - 
the lowest amongst its member ports (i.e. the tap interface)

If that instance is then destroyed, the tap interface goes away, and the
bridge updates its MTU to the lowest amongst its members, which is now
the VLAN interface - i.e. 9000. A second instance launch then picks up
the bridge's MTU of 9000 and works fine.

This was previously solved in the l2 agent under
https://bugs.launchpad.net/networking-cisco/+bug/1443607, but the
solution was reverted in
https://git.openstack.org/cgit/openstack/neutron/commit/?id=d352661c56d5f03713e615b7e0c2c9c8688e0132

Re-implementation should probably get the MTU from the neutron network,
rather than the VLAN interface.
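
As an illustration of the suggested direction (set the tap device's MTU from
the neutron network's MTU at plug time), a minimal sketch using pyroute2 is
below; the function and parameter names are hypothetical, and this is not the
actual os-vif or L2-agent code.

# Hypothetical sketch of "set the tap MTU from the neutron network's MTU".
from pyroute2 import IPRoute


def set_tap_mtu(tap_name, network_mtu):
    ipr = IPRoute()
    try:
        idx = ipr.link_lookup(ifname=tap_name)
        if idx:
            # Equivalent to: ip link set dev <tap_name> mtu <network_mtu>
            ipr.link('set', index=idx[0], mtu=network_mtu)
    finally:
        ipr.close()

# e.g. set_tap_mtu('tapXXXXXXXX-XX', 9000) when plugging into a 9000-MTU network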

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1684326

Title:
  MTU not set on nova instance's vif_type=bridge tap interface

Status in neutron:
  New

Bug description:
  Using linuxbridge with VLAN networks whose MTU is not 1500, the MTU of the
  nova instance's VIF tap interface needs to be set to that of the network it
  is being plugged into; otherwise the first instance on a compute node gets
  a tap interface (and bridge) with MTU 1500 while the VM tries to use MTU
  9000, and frames get dropped.

  Sequence on first instance launch goes like:

   * os_vif creates bridge (with initial MTU 1500)
   * libvirt creates the domain, which creates the tap interface and adds it to 
the bridge. The tap interface inherits the bridge's MTU of 1500
   * The L2 agent notices that a new tap interface showed up, and ensures that 
the VLAN interface gets added to the bridge - the VLAN interface has MTU 9000 
(inherited from the physical interface), but the bridge MTU remains at 1500 - 
the lowest amongst its member ports (i.e. the tap interface)

  If that instance is then destroyed, the tap interface goes away, and
  the bridge updates its MTU to the lowest amongst its members, which is
  now the VLAN interface - i.e. 9000. A second instance launch then
  picks up the bridge's MTU of 9000 and works fine.

  This was previously solved in the l2 agent under
  https://bugs.launchpad.net/networking-cisco/+bug/1443607, but the
  solution was reverted in
  
https://git.openstack.org/cgit/openstack/neutron/commit/?id=d352661c56d5f03713e615b7e0c2c9c8688e0132

  Re-implementation should probably get the MTU from the neutron
  network, rather than the VLAN interface.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1684326/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1683972] [NEW] Overlapping iSCSI volume detach/attach can leave behind broken SCSI devices and multipath maps.

2017-04-18 Thread iain MacDonnell
Public bug reported:

This is fairly easy to reproduce by simultaneously launching and
terminating several boot-from-volume instances on the same compute node,
with a cinder back-end that takes some time to complete connection-
termination (e.g. ZFSSA). The initial symptom is failed multipath maps,
and kernel errors writing to SCSI devices. Later symptoms include
failure to launch volume-backed instances due to multipath command
errors.

The issue is caused by a race-condition between the unplumbing of a
volume being detached/disconnected, and the plumbing of another volume
being attached to a different instance.

For example, when an instance is terminated,
compute.manager._shutdown_instance() calls driver.destroy(), then it
calls volume_api.terminate_connection() for the volume(s).

driver.destroy() is responsible for cleaning up devices on the compute
node - in my case, the libvirt driver calls disconnect_volume() in
os_brick.initiator.connectors.iscsi, which removes the multipath map and
SCSI device(s) associated with each volume.

volume_api.terminate_connection() then instructs cinder to stop
presenting the volume to the connector (which translates to
disassociating a LUN from the compute node's initiator on the back-end
storage-device (iSCSI target)).

The problem occurs when another thread is attaching a volume to another
instance on the same compute node at the same time. That calls
connect_volume() in os_brick.initiator.connectors.iscsi, which does an
iSCSI rescan. If the cinder back-end has not yet removed the
LUN/initiator association, the rescan picks it back up, and recreates
the SCSI device and the multipath map on the compute node. Shortly
thereafter, that SCSI device becomes unresponsive, but it (and the
multipath map) never go away.

To make matters worse, the cinder back-end may use the same LUN number
for another volume in the future, but that LUN number (plus portal
address) is still associated with the broken SCSI device and multipath
map on the compute node, so the wrong multipath map may be picked up by
a future volume attachment attempt.

There is locking around the connect_volume() and disconnect_volume()
functions in os_brick, but this is insufficient, because it doesn't
extend over the cinder connection termination.

I've been able to hack around this with a rudimentary lock on the parts
of compute.manager that deal with volume detachment and connection
termination, and the connect_volume() function in
virt.libvirt.volume.iscsi. That has gotten me by on Icehouse for the
last two years. I was surprised to find that the problem is still
present in Ocata. The same workaround seems to be effective. I'm fairly
sure that the way I've implemented it is not completely correct, though,
so it should be implemented properly by someone more intimately familiar
with all of the code-paths.

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1683972

Title:
  Overlapping iSCSI volume detach/attach can leave behind broken SCSI
  devices and multipath maps.

Status in OpenStack Compute (nova):
  New

Bug description:
  This is fairly easy to reproduce by simultaneously launching and
  terminating several boot-from-volume instances on the same compute
  node, with a cinder back-end that takes some time to complete
  connection-termination (e.g. ZFSSA). The initial symptom is failed
  multipath maps, and kernel errors writing to SCSI devices. Later
  symptoms include failure to launch volume-backed instances due to
  multipath command errors.

  The issue is caused by a race-condition between the unplumbing of a
  volume being detached/disconnected, and the plumbing of another volume
  being attached to a different instance.

  For example, when an instance is terminated,
  compute.manager._shutdown_instance() calls driver.destroy(), then it
  calls volume_api.terminate_connection() for the volume(s).

  driver.destroy() is responsible for cleaning up devices on the compute
  node - in my case, the libvirt driver calls disconnect_volume() in
  os_brick.initiator.connectors.iscsi, which removes the multipath map
  and SCSI device(s) associated with each volume.

  volume_api.terminate_connection() then instructs cinder to stop
  presenting the volume to the connector (which translates to
  disassociating a LUN from the compute node's initiator on the back-end
  storage-device (iSCSI target)).

  The problem occurs when another thread is attaching a volume to
  another instance on the same compute node at the same time. That calls
  connect_volume() in os_brick.initiator.connectors.iscsi, which does an
  iSCSI rescan. If the cinder back-end has not yet removed the
  LUN/initiator association, the rescan picks it back up, and recreates
  the SCSI device and the multipath map on the compute node. Shortly
  

[Yahoo-eng-team] [Bug 1633249] [NEW] Boot volume creation leaves secondary volume attached to broken server

2016-10-13 Thread iain MacDonnell
Public bug reported:

Attempt to boot a server with a block device mapping that includes a
boot volume created from an image, plus an existing data volume. If the
boot-volume creation fails, the data volume is left in state "in-use",
attached to the server which is now in "error" state". The user can't
detach the volume because of the server's error state. They can delete
the server, which then leaves the volume apparently attached to a server
that no longer exists. The only way out of this is to ask an
administrator to reset the state of the data volume (this option is not
available to regular users by default policy).
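
For reference, the admin-only cleanup mentioned above (resetting the stuck
data volume) can be sketched roughly as below with python-cinderclient; the
credentials and auth URL are placeholders, and reset-state bypasses normal
state checks, so it should be used with care.

# Rough sketch of the admin-only workaround, using python-cinderclient.
# Credentials/URLs are placeholders; the volume ID is the stuck data volume.
from cinderclient import client as cinder_client

cinder = cinder_client.Client('3', 'admin', 'password', 'admin',
                              'http://my_keystone:5000/v3')
vol = cinder.volumes.get('2e733722-8b19-4bff-bd8d-bb770554582a')
cinder.volumes.reset_state(vol, 'available', attach_status='detached')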

The easiest way to reproduce this is to attempt to create the boot
volume from qcow2 image where the volume size is less than the image
(virtual) size.

 ~$ cinder list
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| ID                                   | Status    | Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+
| 2e733722-8b19-4bff-bd8d-bb770554582a | available | data | 1    | -           | false    |             |
+--------------------------------------+-----------+------+------+-------------+----------+-------------+

~$ nova boot --flavor m1.large --availability-zone=imot04-1 \
    --block-device 'id=9e122d18-d7a4-406d-b8f2-446cfddaa7c7,source=image,dest=volume,device=vda,size=5,bootindex=0' \
    --block-device 'id=2e733722-8b19-4bff-bd8d-bb770554582a,source=volume,dest=volume,device=vdb,size=1,bootindex=1' \
    ol4
+---------------------------------------+--------------------------------------------------+
| Property                              | Value                                            |
+---------------------------------------+--------------------------------------------------+
| OS-DCF:diskConfig                     | MANUAL                                           |
| OS-EXT-AZ:availability_zone           | imot04-1                                         |
| OS-EXT-SRV-ATTR:host                  | -                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname   | -                                                |
| OS-EXT-SRV-ATTR:instance_name         |                                                  |
| OS-EXT-STS:power_state                | 0                                                |
| OS-EXT-STS:task_state                 | scheduling                                       |
| OS-EXT-STS:vm_state                   | building                                         |
| OS-SRV-USG:launched_at                | -                                                |
| OS-SRV-USG:terminated_at              | -                                                |
| accessIPv4                            |                                                  |
| accessIPv6                            |                                                  |
| adminPass                             | DNTr8MG3kVmC                                     |
| config_drive                          |                                                  |
| created                               | 2016-10-13T21:54:08Z                             |
| flavor                                | m1.large (4)                                     |
| hostId                                |                                                  |
| id                                    | 9541b63c-e003-4bcc-bcb8-5c0461522387             |
| image                                 | Attempt to boot from volume - no image supplied  |
| key_name                              | -                                                |
| metadata                              | {}                                               |
| name                                  | ol4                                              |
| os-extended-volumes:volumes_attached  | [{"id": "2e733722-8b19-4bff-bd8d-bb770554582a"}] |
| progress                              | 0                                                |
| security_groups                       | default                                          |
| status                                | BUILD                                            |
| tenant_id                             | 66234fea2ccc42398a1ae5300c594d49                 |
| updated                               | 2016-10-13T21:54:08Z                             |
| user_id                               | b2ae6b7bdac142ddb708a3550f61d998                 |
+---------------------------------------+--------------------------------------------------+

~$ cinder list
+--------------------------------------+----------+------+------+-------------+----------+-------------+
| ID                                   | Status   | Name | Size | Volume Type | Bootable | Attached to |
[Yahoo-eng-team] [Bug 1293693] Re: libvirt OVS VLAN tag not set

2014-03-18 Thread Iain MacDonnell
Per the problem description, it is the nova VIF driver that sets the
external-ids on the OVS port. Neutron later picks up that information to
set the VLAN tag. It is nova that's not doing its part.

** Also affects: nova
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1293693

Title:
  libvirt OVS VLAN tag not set

Status in OpenStack Neutron (virtual network service):
  New
Status in OpenStack Compute (Nova):
  New

Bug description:
  Trying to use icehouse, libvirt-Xen, OpenVswitch 1.11.0, with VLAN
  tagging.

  Problem is that networking is non-functional on instance launch. 'ovs-
  vsctl show' output shows that the tap interface for the instance does
  not have the appropriate (internal) VLAN tag (no tag is set).
  Consequently, the instance is unable to obtain an IP address from
  DHCP, etc. Setting the tag manually with 'ovs-vsctl set port tapXXX
  tag=1' is a workaround (but not a very good one).

  Exploring this, I find that the neutron OVS agent scans the OVS ports
  and examines the 'external-ids' to see which ones are of interest.
  When it sees a new port that is of interest, it sets the VLAN tag as
  required. In my case, the VIF port that's added when an instance is
  launched has empty 'external-ids', and so the agent ignores it. The
  port is getting added to the OVS integration bridge by the Xen
  scripts, but the 'external-ids' are not getting set (Xen knows nothing
  about this part).

  Looking further; when nova.conf has
  'firewall_driver=nova.virt.firewall.NoopFirewallDriver', the
  LibvirtBaseVIFDriver (nova/virt/libvirt/vif.py) uses function
  plug_ovs_bridge(), which is a no-op.  When
  firewall_driver=nova.virt.libvirt.firewall.IptablesFirewallDriver, a
  different function, plug_ovs_hybrid(), is used. When OVS is older than
  version 0.9.11, a function called plug_ovs_ethernet() is used. Both
  plug_ovs_hybrid() and plug_ovs_ethernet() call
  linux_net.create_ovs_vif_port(), and that's where the 'external-ids'
  get set.

  I tried modifying plug_ovs_bridge() to call
  linux_net.create_ovs_vif_port(), but that causes the Xen hotplug
  scripts to fail (ovs-vsctl: cannot create a port named tap3ccfe10f-c4
  because a port named tap3ccfe10f-c4 already exists on bridge br-int)

  When the Noop firewall_driver is used in conjunction with newer OVS,
  something needs to set the 'external-ids' on the VIF port so that the
  neutron agent will see it and set the VLAN tag.
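
  For context, the external-ids that linux_net.create_ovs_vif_port() sets
  (and that the neutron OVS agent matches on) are roughly the following; this
  is an illustrative paraphrase using subprocess, not the exact nova code,
  which runs the command via its own rootwrap/execute helpers.

  # Illustrative paraphrase of nova's linux_net.create_ovs_vif_port():
  # add the port to the integration bridge and stamp it with the
  # external-ids the neutron OVS agent looks for.
  import subprocess

  def create_ovs_vif_port(bridge, dev, iface_id, mac, instance_id):
      subprocess.check_call([
          'ovs-vsctl', '--', '--if-exists', 'del-port', dev,
          '--', 'add-port', bridge, dev,
          '--', 'set', 'Interface', dev,
          'external-ids:iface-id=%s' % iface_id,
          'external-ids:iface-status=active',
          'external-ids:attached-mac=%s' % mac,
          'external-ids:vm-uuid=%s' % instance_id,
      ])

  In the Xen case described above, the add-port step is what collides with
  the port the Xen hotplug scripts have already created; only the "set
  Interface ... external-ids" part is actually missing.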

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1293693/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1273496] [NEW] libvirt iSCSI driver sets is_block_dev=False

2014-01-27 Thread Iain MacDonnell
Public bug reported:

Trying to use iSCSI with libvirt/Xen, attaching volumes to instances was
failing. I tracked this down to the libvirt XML looking like:

<disk type='block' device='disk'>
  <driver name='file' type='raw' cache='none'/>
  <source dev='/dev/disk/by-path/ip-192.168.8.11:3260-iscsi-iqn.1986-03.com.sun:02:ecd142ab-b1c7-6bcf-8f91-f55b6c766bcc-lun-0'/>
  <target bus='xen' dev='xvdb'/>
  <serial>e8c640c6-641b-4940-88f2-79555cdd5551</serial>
</disk>


The driver name should be phy, not file.


More digging led to the iSCSI volume driver in nova/virt/libvirt/volume.py, 
which does:

class LibvirtISCSIVolumeDriver(LibvirtBaseVolumeDriver):
    """Driver to attach Network volumes to libvirt."""
    def __init__(self, connection):
        super(LibvirtISCSIVolumeDriver,
              self).__init__(connection, is_block_dev=False)


Surely is_block_dev should be True for iSCSI?? Changing this makes the 
problem go away - now pick_disk_driver_name() in nova/virt/libvirt/utils.py 
does the right thing and my volume attaches successfully.

Am I missing something here... ?
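
For clarity, the suggested one-line change would look like this (a sketch of
the proposed fix in nova/virt/libvirt/volume.py, not a merged patch):

# Sketch of the change suggested above: pass is_block_dev=True so that
# pick_disk_driver_name() emits driver name "phy" for iSCSI block devices
# under Xen, instead of "file".
class LibvirtISCSIVolumeDriver(LibvirtBaseVolumeDriver):
    """Driver to attach Network volumes to libvirt."""
    def __init__(self, connection):
        super(LibvirtISCSIVolumeDriver,
              self).__init__(connection, is_block_dev=True)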

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1273496

Title:
  libvirt iSCSI driver sets is_block_dev=False

Status in OpenStack Compute (Nova):
  New

Bug description:
  Trying to use iSCSI with libvirt/Xen, attaching volumes to instances
  was failing. I tracked this down to the libvirt XML looking like:

  <disk type='block' device='disk'>
    <driver name='file' type='raw' cache='none'/>
    <source dev='/dev/disk/by-path/ip-192.168.8.11:3260-iscsi-iqn.1986-03.com.sun:02:ecd142ab-b1c7-6bcf-8f91-f55b6c766bcc-lun-0'/>
    <target bus='xen' dev='xvdb'/>
    <serial>e8c640c6-641b-4940-88f2-79555cdd5551</serial>
  </disk>

  
  The driver name should be phy, not file.

  
  More digging led to the iSCSI volume driver in nova/virt/libvirt/volume.py, 
which does:

  class LibvirtISCSIVolumeDriver(LibvirtBaseVolumeDriver):
      """Driver to attach Network volumes to libvirt."""
      def __init__(self, connection):
          super(LibvirtISCSIVolumeDriver,
                self).__init__(connection, is_block_dev=False)

  
  Surely is_block_dev should be True for iSCSI?? Changing this makes the 
problem go away - now pick_disk_driver_name() in nova/virt/libvirt/utils.py 
does the right thing and my volume attaches successfully.

  Am I missing something here... ?

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1273496/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp