[Yahoo-eng-team] [Bug 1841514] [NEW] disk_io_limits settings are not reflected when resize using vmware driver

2019-08-26 Thread Akira KAMIO
Public bug reported:

Description
===========
We found that disk_io_limits settings are not reflected when resizing an
instance using the VMware driver.

Steps to reproduce
==================
* I ran a resize using the CLI or Horizon
* The VM reached VERIFY_RESIZE status without any problem
* I then ran resize-confirm
* It looked like it worked

* But when I checked vCenter, the IOPS limit had not changed

Expected result
===============
* IOPS settings are configured for the resized VM

Actual result
=============
* IOPS settings are not configured for the resized VM

Environment
===========
1. Exact version of OpenStack you are running.
* Community OpenStack Mitaka

2. Which hypervisor did you use?
* VMware

3. Which networking type did you use?
* Neutron ML2 Driver For VMWare vCenter DVS

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841514

Title:
  disk_io_limits settings are not reflected when resize using vmware
  driver

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===========
  We found that disk_io_limits settings are not reflected when resizing an
  instance using the VMware driver.

  Steps to reproduce
  ==================
  * I ran a resize using the CLI or Horizon
  * The VM reached VERIFY_RESIZE status without any problem
  * I then ran resize-confirm
  * It looked like it worked

  * But when I checked vCenter, the IOPS limit had not changed

  Expected result
  ===============
  * IOPS settings are configured for the resized VM

  Actual result
  =============
  * IOPS settings are not configured for the resized VM

  Environment
  ===========
  1. Exact version of OpenStack you are running.
  * Community OpenStack Mitaka

  2. Which hypervisor did you use?
  * VMware

  3. Which networking type did you use?
  * Neutron ML2 Driver For VMWare vCenter DVS

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1841514/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841509] [NEW] soft delete instance will be reclaimed if power on failed when do restore

2019-08-26 Thread zhangyujun
Public bug reported:

I found that an instance disappeared after restoring it. After checking the
nova code and logs, I think there is a logic bug here.

1. Restore an instance, with power-on failing

nova-api `restore` sets `instance.task_state = task_states.RESTORING` and
`instance.deleted_at = None`
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/api.py#L2344

nova-compute `restore_instance` will call `self._power_on` if the virt driver
does not implement the `restore` method
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L3009

the instance task_state will be set back to None by `reverts_task_state` if any
exception is raised while calling `self._power_on`
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L178

finally the instance state will be
{vm_state=vm_states.SOFT_DELETED, task_state=None, deleted_at=None}

2. Reclaim the instance

The nova-compute periodic task `_reclaim_queued_deletes` runs every 60s:
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L8209

It selects instances with the filter `{'vm_state': vm_states.SOFT_DELETED,
'task_state': None, 'host': self.host}`, so the instance from step 1 will be
selected
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L8216

and it will pass the `_deleted_old_enough` check even though its
`deleted_at=None`
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L8430

and it will then be deleted shortly afterwards
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L8229

I don't think the instance should be reclaimed in this situation.
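
To make the failure mode concrete, here is a simplified stand-in for the
reclaim decision (illustrative only, not nova's actual implementation): the
state left behind by the failed restore matches the periodic task's filter, so
any fix would probably have to treat `deleted_at=None` as "not queued for
deletion":

    # Simplified illustration; 'soft-delete' mirrors vm_states.SOFT_DELETED and
    # reclaim_interval mirrors CONF.reclaim_instance_interval.
    import datetime

    def should_reclaim(instance, reclaim_interval, now=None):
        now = now or datetime.datetime.utcnow()
        if instance['vm_state'] != 'soft-delete' or instance['task_state'] is not None:
            return False
        if instance['deleted_at'] is None:
            # The instance from step 1 lands here; without this guard it would
            # be treated as "old enough" and reclaimed.
            return False
        age = (now - instance['deleted_at']).total_seconds()
        return age > reclaim_interval

    broken = {'vm_state': 'soft-delete', 'task_state': None, 'deleted_at': None}
    print(should_reclaim(broken, reclaim_interval=60))  # False with the guard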

** Affects: nova
 Importance: Undecided
 Assignee: zhangyujun (zhangyujun)
 Status: New

** Changed in: nova
 Assignee: (unassigned) => zhangyujun (zhangyujun)

** Description changed:

  I found an instance disappeared after do restore instance, check the
  nova code and log, I think its a logic bug here
  
  1. restore instance with power on failed
  
  nova-api `restore` set  `instance.task_state = task_states.RESTORING 
instance.deleted_at = None`
  
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/api.py#L2344
  
  nova-compute `restore_instance`  will call `self._power_on` if virt driver 
did not implement the `restore` method
  
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L3009
  
  instance state will be set to None if any exceptions raise when call 
`self._power_on` in `reverts_task_state`
  
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L178
  
  finally the instnace state will be set to
- {vm_state=vm_state.SOFT_DELETED, task_state=None, deleted=None}
+ {vm_state=vm_state.SOFT_DELETED, task_state=None, deleted_at=None}
  
  2. reclaim instance
  
  nova-compute periodic task `_reclaim_queued_deletes` running every 60s,
  
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L8209
  
  it will select instance with filte `{'vm_state': vm_states.SOFT_DELETED, 
'task_state': None,'host': self.host}`,  the instance of step 1 will be slected
  
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L8216
  
  and it will be in the return list of `_deleted_old_enough` with its 
`deleted_at=None`
  
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L8430
  
  and then be deleted soon
  
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L8229
  
  I don't think the instance should be reclaimed with the above situation

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841509

Title:
  soft delete instance will be reclaimed if power on failed when do
  restore

Status in OpenStack Compute (nova):
  New

Bug description:
  I found that an instance disappeared after restoring it. After checking the
  nova code and logs, I think there is a logic bug here.

  1. Restore an instance, with power-on failing

  nova-api `restore` set  `instance.task_state = task_states.RESTORING 
instance.deleted_at = None`
  
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/api.py#L2344

  nova-compute `restore_instance`  will call `self._power_on` if virt driver 
did not implement the `restore` method
  
https://github.com/openstack/nova/blob/4b8b4217fed897755f742afcb42f7994aea4c9a1/nova/compute/manager.py#L3009

  instance state will be set to None if any exceptions raise when call 
`self._power_on` in 

[Yahoo-eng-team] [Bug 1805569] Re: Report early when security group doesn't belong to current tenant

2019-08-26 Thread Launchpad Bug Tracker
[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]

** Changed in: nova
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1805569

Title:
  Report early when security group doesn't belong to current tenant

Status in OpenStack Compute (nova):
  Expired

Bug description:
  This error is seen on the compute node; actually it should be caught in the
  API layer. The GET API in neutron/security_group should be used to validate
  the security groups up front.

  
  Traceback seen in nova-compute (the per-line
  "...ager [instance: 026512be-8a6e-4e82-8f88-3a9260f350a0]" log prefixes are
  trimmed below):

    File "/opt/stack/nova/nova/network/model.py", line 583, in wait
      self[:] = self._gt.wait()
    File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 180, in wait
      return self._exit_event.wait()
    File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 132, in wait
      current.throw(*self._exc)
    File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 219, in main
      result = function(*args, **kwargs)
    File "/opt/stack/nova/nova/utils.py", line 799, in context_wrapper
      return func(*args, **kwargs)
    File "/opt/stack/nova/nova/compute/manager.py", line 1510, in _allocate_network_async
      six.reraise(*exc_info)
    File "/opt/stack/nova/nova/compute/manager.py", line 1493, in _allocate_network_async
      bind_host_id=bind_host_id)
    File "/opt/stack/nova/nova/network/neutronv2/api.py", line 1025, in allocate_for_instance
      instance, neutron, security_groups)
    File "/opt/stack/nova/nova/network/neutronv2/api.py", line 812, in _process_security_groups
      security_group_id=security_group)
  SecurityGroupNotFound: Security group 15082515-3535-4304-84c0-a00b7c7ae376 not found.
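
  A minimal sketch of the early check being asked for, assuming
  python-neutronclient is used at the API layer (the exact place to hook this
  into nova is not shown, and the exception handling is simplified):

      # Illustrative pre-flight validation: look each requested security group
      # up in Neutron before accepting the boot request, so a bad ID fails fast
      # in the API instead of later on the compute node.
      from neutronclient.common import exceptions as n_exc

      def validate_security_groups(neutron, security_group_ids):
          """`neutron` is a neutronclient.v2_0.client.Client instance."""
          for sg_id in security_group_ids:
              try:
                  neutron.show_security_group(sg_id)
              except n_exc.NotFound:
                  raise ValueError('Security group %s not found' % sg_id)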

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1805569/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1830349] Re: Router external gateway wrongly marked as DOWN

2019-08-26 Thread Launchpad Bug Tracker
[Expired for neutron because there has been no activity for 60 days.]

** Changed in: neutron
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1830349

Title:
  Router external gateway wrongly marked as DOWN

Status in neutron:
  Expired

Bug description:
  neutron version: 2:8.4.0-0ubuntu7.3~cloud0
  openstack version: cloud:trusty-mitaka

  In bootstack a customer had a non-HA router.
  After updating the router to HA mode,
  its external gateway is wrongly marked as DOWN,
  but we can see traffic going through the interface:

  openstack router show  7d7a37e0-33f3-474f-adbf-ab27033c6bc8
  
  +-------------------------+------------------------------------------------------------+
  | Field                   | Value                                                      |
  +-------------------------+------------------------------------------------------------+
  | admin_state_up          | UP                                                         |
  | availability_zone_hints |                                                            |
  | availability_zones      | nova                                                       |
  | created_at              | None                                                       |
  | description             |                                                            |
  | distributed             | False                                                      |
  | external_gateway_info   | {"enable_snat": true, "external_fixed_ips": [{"subnet_id": |
  |                         | "dbfee73f-7094-4596-a79c-e05c2ce7d738", "ip_address":      |
  |                         | "185.170.7.198"}], "network_id":                           |
  |                         | "43c6a5c6-d44c-43d9-a0e9-1c0311b41626"}                    |
  +-------------------------+------------------------------------------------------------+
  (remaining output truncated in the original message)

[Yahoo-eng-team] [Bug 1841486] [NEW] federation mapping debug has useless direct_maps information

2019-08-26 Thread John Dennis
Public bug reported:

If you use keystone-manage mapping_engine --engine-debug to test your
rules (or when debug logging is on at run time), the diagnostic
output fails to emit a piece of crucial information: the contents of the
direct maps array. What you'll get instead is this:

direct_maps: 

That's because the DirectMaps class does not have a __str__() method, and
Python falls back to __repr__() in the absence of __str__(). The default
__repr__() only prints the class name and its memory location, which is not
very useful.

If DirectMaps had a __str__() function like this:

    def __str__(self):
        return '%s' % self._matches


the debug output would include the actual direct map data like this:

direct_maps: [['j...@example.com'], ['Group1', 'Group3']]
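
A self-contained illustration of the difference, using a simplified stand-in
class rather than keystone's actual DirectMaps:

    # Without __str__, '%s' formatting falls back to the default object
    # __repr__, which only shows the class name and a memory address.
    class DirectMaps(object):
        def __init__(self):
            self._matches = []

        def add(self, values):
            self._matches.append(values)

        def __str__(self):
            return '%s' % self._matches

    d = DirectMaps()
    d.add(['j...@example.com'])
    d.add(['Group1', 'Group3'])
    print('direct_maps: %s' % d)
    # direct_maps: [['j...@example.com'], ['Group1', 'Group3']]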

** Affects: keystone
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1841486

Title:
  federation mapping debug has useless direct_maps information

Status in OpenStack Identity (keystone):
  New

Bug description:
  If you use keystone-manage mapping_engine --engine-debug to test your
  rules (or when debug logging is on at run time), the diagnostic
  output fails to emit a piece of crucial information: the contents of the
  direct maps array. What you'll get instead is this:

  direct_maps: 

  That's because the DirectMaps class does not have a __str__() method,
  and Python falls back to __repr__() in the absence of __str__(). The
  default __repr__() only prints the class name and its memory location,
  which is not very useful.

  If DirectMaps had a __str__() function like this:

      def __str__(self):
          return '%s' % self._matches

  
  the debug output would include the actual direct map data like this:

  direct_maps: [['j...@example.com'], ['Group1', 'Group3']]

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1841486/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841481] [NEW] Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache

2019-08-26 Thread Matt Riedemann
Public bug reported:

Seen with an ironic re-balance in this job:

https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check
/ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode/92c65ac/

On the subnode we see the RT detect that the node is moving hosts:

Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova-
compute[747]: INFO nova.compute.resource_tracker [None req-a894abee-
a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42
-b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to
ubuntu-bionic-rax-ord-0010443319

On that new host, the ProviderTree cache is getting updated with
refreshed associations for inventory:

Aug 26 18:41:38.881026 ubuntu-bionic-rax-ord-0010443319 nova-
compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee-
a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing inventories for
resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f {{(pid=747)
_refresh_associations
/opt/stack/nova/nova/scheduler/client/report.py:761}}

aggregates:

Aug 26 18:41:38.953685 ubuntu-bionic-rax-ord-0010443319 nova-
compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee-
a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing aggregate associations
for resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f, aggregates:
None {{(pid=747) _refresh_associations
/opt/stack/nova/nova/scheduler/client/report.py:770}}

and traits - but when we get traits the provider is gone:

Aug 26 18:41:38.995595 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]:
ERROR nova.compute.manager [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None
None] Error updating resources for node 61dbc9c7-828b-4c42-b19c-a3716037965f.:
ResourceProviderTraitRetrievalFailed: Failed to get traits for resource
provider with UUID 61dbc9c7-828b-4c42-b19c-a3716037965f

Traceback (most recent call last) [per-line "Aug 26 ... ERROR
nova.compute.manager" log prefixes trimmed]:
  File "/opt/stack/nova/nova/compute/manager.py", line 8250, in _update_available_resource_for_node
    startup=startup)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 715, in update_available_resource
    self._update_available_resource(context, resources, startup=startup)
  File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 328, in inner
    return f(*args, **kwargs)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 738, in _update_available_resource
    is_new_compute_node = self._init_compute_node(context, resources)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 561, in _init_compute_node
    if self._check_for_nodes_rebalance(context, resources, nodename):
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 516, in _check_for_nodes_rebalance
    self._update(context, cn)
  File "/opt/stack/nova/nova/compute/resource_tracker.py", line 1054, in _update
    self._update_to_placement(context, compute_node, startup)
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/retrying.py", line 206, in call
  [traceback truncated in the original message]

[Yahoo-eng-team] [Bug 1841476] [NEW] Spurious ComputeHostNotFound warnings in nova-compute logs during ironic node re-balance

2019-08-26 Thread Matt Riedemann
Public bug reported:

Seen here:

https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check
/ironic-tempest-ipa-wholedisk-direct-tinyipa-
multinode/92c65ac/compute1/logs/screen-n-cpu.txt.gz

We see a warning that a compute node could not be found by host and node
but then later is found just by nodename and is moving to the current
host:

Aug 26 18:41:38.800657 ubuntu-bionic-rax-ord-0010443319 nova-
compute[747]: WARNING nova.compute.resource_tracker [None req-a894abee-
a2f1-4423-8ede-2a1b9eef28a4 None None] No compute node record for
ubuntu-bionic-rax-ord-0010443319:61dbc9c7-828b-4c42-b19c-a3716037965f:
ComputeHostNotFound_Remote: Compute host ubuntu-bionic-rax-
ord-0010443319 could not be found.

Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova-
compute[747]: INFO nova.compute.resource_tracker [None req-a894abee-
a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42
-b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to
ubuntu-bionic-rax-ord-0010443319

The warning comes from this call:

https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L554

And the re-balance is found here:

https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L561

The warning is then a red herring. We could:

1. add something to the warning message saying this could be due to a
re-balance but that might be confusing for non-ironic computes

and/or

2. check if self.driver.rebalances_nodes and, if True, change the warning
to an info-level message (and potentially modify the message with the
re-balance wording from #1 above); a rough sketch of this option follows.
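
A rough sketch of option #2, assuming a helper around the existing message and
the `rebalances_nodes` driver capability flag mentioned above (names and
wording here are illustrative, not the actual resource tracker code):

    # Hypothetical helper: demote the "No compute node record" message to INFO
    # when the driver (e.g. ironic) is known to re-balance nodes between hosts.
    import logging

    LOG = logging.getLogger(__name__)

    def report_missing_compute_node(host, nodename, rebalances_nodes):
        msg = 'No compute node record for %s:%s' % (host, nodename)
        if rebalances_nodes:
            # Expected during an ironic node re-balance; not a problem by itself.
            LOG.info('%s (the node may be re-balancing to this host)', msg)
        else:
            LOG.warning(msg)

    report_missing_compute_node('ubuntu-bionic-rax-ord-0010443319',
                                '61dbc9c7-828b-4c42-b19c-a3716037965f',
                                rebalances_nodes=True)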

** Affects: nova
 Importance: Low
 Status: Triaged


** Tags: ironic resource-tracker serviceability

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841476

Title:
  Spurious ComputeHostNotFound warnings in nova-compute logs during
  ironic node re-balance

Status in OpenStack Compute (nova):
  Triaged

Bug description:
  Seen here:

  
https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check
  /ironic-tempest-ipa-wholedisk-direct-tinyipa-
  multinode/92c65ac/compute1/logs/screen-n-cpu.txt.gz

  We see a warning that a compute node could not be found by host and
  node but then later is found just by nodename and is moving to the
  current host:

  Aug 26 18:41:38.800657 ubuntu-bionic-rax-ord-0010443319 nova-
  compute[747]: WARNING nova.compute.resource_tracker [None req-
  a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] No compute node record
  for ubuntu-bionic-rax-ord-0010443319:61dbc9c7-828b-4c42-b19c-
  a3716037965f: ComputeHostNotFound_Remote: Compute host ubuntu-bionic-
  rax-ord-0010443319 could not be found.

  Aug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova-
  compute[747]: INFO nova.compute.resource_tracker [None req-a894abee-
  a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42
  -b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to
  ubuntu-bionic-rax-ord-0010443319

  The warning comes from this call:

  
https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L554

  And the re-balance is found here:

  
https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L561

  The warning is then a red herring. We could:

  1. add something to the warning message saying this could be due to a
  re-balance but that might be confusing for non-ironic computes

  and/or

  2. check if self.driver.rebalances_nodes and if True, change the
  warning to an info level message (and potentially modify the message
  with the re-balance wording in #1 above).

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1841476/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841400] Re: nonexistent hacking rules descriptions in HACKING.rst

2019-08-26 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/678462
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=97a5f0e216ad02bd9ba805436a1b553c3dacf6d2
Submitter: Zuul
Branch:master

commit 97a5f0e216ad02bd9ba805436a1b553c3dacf6d2
Author: Takashi NATSUME 
Date:   Mon Aug 26 13:19:08 2019 +0900

Remove descriptions of nonexistent hacking rules

N321, N328, N329, N330 hacking rules have been removed
since I9c334162fe1799e7b24563fdc11256b91bbafc9f.
However the descriptions are still in HACKING.rst.
So remove them.
The rule number N307 is missing in HACKING.rst.
So add it.

Change-Id: I868c421a0f5a3329ab36f786f8519accae623f1a
Closes-Bug: #1841400


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841400

Title:
  nonexistent hacking rules descriptions in HACKING.rst

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  N321, N328, N329, N330 hacking rules have been removed, but the descriptions 
are still in HACKING.rst.
  The rule number N307 is missing in HACKING.rst.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1841400/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1838793] Re: "KeepalivedManagerTestCase" tests failing during namespace deletion

2019-08-26 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/674820
Committed: 
https://git.openstack.org/cgit/openstack/neutron/commit/?id=be7bb4d0f584a05d3e2725f1179ffaed6e8f449d
Submitter: Zuul
Branch:master

commit be7bb4d0f584a05d3e2725f1179ffaed6e8f449d
Author: Rodolfo Alonso Hernandez 
Date:   Mon Aug 5 15:03:27 2019 +

Kill all processes running in a namespace before deletion

In "NamespaceFixture", before deleting the namespace, this patch
introduces a check to first kill all processes running on it.

Closes-Bug: #1838793

Change-Id: I27f3db33f2e7ab685523fd2d6922177d7c9cb71b


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1838793

Title:
  "KeepalivedManagerTestCase" tests failing during namespace deletion

Status in neutron:
  Fix Released

Bug description:
  During the execution of those two test cases 
(test_keepalived_spawns_conflicting_pid_base_process, 
  test_keepalived_spawns_conflicting_pid_vrrp_subprocess), sometimes the 
namespace fixture fails during the deletion.

  Logstash information:
  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22fixtures._fixtures.timeout.TimeoutException%5C%22%20AND%20%20project%3A%5C%22openstack%2Fneutron%5C%22

  Example: http://logs.openstack.org/50/670850/3/check/neutron-
  functional-python27/1d27dda/testr_results.html.gz

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1838793/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841253] Re: "FdbInterfaceTestCase" fails if VXLAN interface is created (no-namespace cases)

2019-08-26 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/678275
Committed: 
https://git.openstack.org/cgit/openstack/neutron/commit/?id=d3359a2bc6c8fd6dbb068bf7f373cbc2922f1173
Submitter: Zuul
Branch:master

commit d3359a2bc6c8fd6dbb068bf7f373cbc2922f1173
Author: Rodolfo Alonso Hernandez 
Date:   Fri Aug 23 17:31:51 2019 +

Force deletion of interfaces to create in "FdbInterfaceTestCase"

In the no-namespace test cases, sometimes the interfaces to be created
exist in the kernel namespace. To avoid this possible problem, we first
force the deletion of those interfaces.

Change-Id: I9eba21d872263665481303fbab1ee3ec9bdaa044
Closes-Bug: #1841253


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1841253

Title:
  "FdbInterfaceTestCase" fails if VXLAN interface is created (no-
  namespace cases)

Status in neutron:
  Fix Released

Bug description:
  Occasionally, in the no-namespace test cases, the interfaces to be
  used, created in the kernel namespace, already exist. Just in
  case, to avoid problems like in [1], we should first force the
  deletion of the interfaces we are going to create.

  ft1.3: 
neutron.tests.functional.agent.linux.test_bridge_lib.FdbInterfaceTestCase.test_add_delete(no_namespace)testtools.testresult.real._StringException:
 traceback-1: {{{
  Traceback (most recent call last):
File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/linux/test_bridge_lib.py",
 line 134, in _cleanup
  priv_ip_lib.delete_interface(self.device_vxlan, None)
File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/oslo_privsep/priv_context.py",
 line 242, in _wrap
  return self.channel.remote_call(name, args, kwargs)
File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/oslo_privsep/daemon.py",
 line 204, in remote_call
  raise exc_type(*result[2])
  neutron.privileged.agent.linux.ip_lib.NetworkInterfaceNotFound: Network 
interface vxlan_bec4e81a- not found in namespace None.
  }}}

  Traceback (most recent call last):
File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/linux/test_bridge_lib.py",
 line 122, in setUp
  ip_wrapper.add_vxlan(self.device_vxlan, 100, dev=self.device)
File 
"/home/zuul/src/opendev.org/openstack/neutron/neutron/agent/linux/ip_lib.py", 
line 296, in add_vxlan
  privileged.create_interface(name, self.namespace, "vxlan", **kwargs)
File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/oslo_privsep/priv_context.py",
 line 242, in _wrap
  return self.channel.remote_call(name, args, kwargs)
File 
"/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.6/site-packages/oslo_privsep/daemon.py",
 line 204, in remote_call
  raise exc_type(*result[2])
  neutron.privileged.agent.linux.ip_lib.InterfaceAlreadyExists: Interface 
vxlan_bec4e81a- already exists.

  
  [1] 
https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_34/674434/10/check/neutron-functional/474856f/testr_results.html.gz

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1841253/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841466] [NEW] ds-identify fails to detect NoCloud datastore with LABEL_FATBOOT instead of LABEL (change introduced recently in util-linux-2.33-rc1)

2019-08-26 Thread Hans Dampf
Public bug reported:

Original bug report with detailed description was created for Xen
Orchestra here: https://github.com/vatesfr/xen-orchestra/issues/4449

Brief description:

On systems with util-linux-2.33-rc1 or newer (e.g. Debian 10 Buster),
ds-identify fails to detect when a disk of a NoCloud datasource has a
label written to the boot sector of the disk. Before util-
linux-2.33-rc1, blkid showed "LABEL=cidata". With the change, blkid
shows "LABEL_FATBOOT=cidata" (a newly introduced, additional label).

Longer description:

I ran into this when using cloud-init together with Xen Orchestra v5.48
(Xen Orchestra is a management interface for Xen; in my case xcp-ng
v8.0.0). I created a VM template based on the recently released Debian
10.0 Buster, which uses util-linux 2.33.1. Upon boot, ds-identify fails
to detect the NoCloud datasource / virtual disk which Xen Orchestra
generated (the disk is created with this code from
https://github.com/natevw/fatfs). With an older Debian 8 (util-
linux-2.25.0) based template, ds-identify detects the NoCloud datasource
disk fine.

Likely explanation:

Xen Orchestra creates the NoCloud datasource as a partition-less disk with a
FAT16 filesystem which holds the NoCloud user-data and meta-data files. The
label "cidata" is written into the boot sector of the virtual disk. With
the same disk, older versions of blkid report "LABEL=cidata" whereas
newer versions detect "LABEL_FATBOOT=cidata". The ds-identify shell
script checks only for the presence of the field called "LABEL" and not
for "LABEL_FATBOOT".

Relevant commit message from the util-linux-2.33-rc1 changelog (commit
f0ca7e80d7a171701d0d04a3eae22d97f15d0683):

libblkid: vfat: Change parsing label in special cases

* Use only label from the root directory and do not fallback to the
label stored in boot sector. This is how MS-DOS 6.22, MS-DOS 7.10,
Windows 98, Windows XP and also Windows 10 behave. Moreover Windows XP
and Windows 10 do not touch label in boot sector anymore, so removing
FAT label on those Windowses leads to having old label still stored in
boot sector (which MS-DOS and Windows fully ignore).

* Label entry "NO NAME" in root directory is treated as label "NO NAME"
instead of empty label. In root directory it has no special meaning.
String "NO NAME" has a special meaning (empty label) only for label
stored in boot sector.

* Label from the boot sector is now stored into LABEL_FATBOOT field. So
if there are applications which depends or needs to read this label,
they have ability.

* After this change LABEL always correspondent to the label from the
root directory and LABEL_FATBOOT to the label stored in the boot sector.
If some of those labels is missing or is not present (e.g. "NO LABEL" in
boot sector) then particular field is not set.

Possible fix:

I did a trivial change of 2 lines to ds-identify to check for
LABEL_FATBOOT after the check for LABEL. For me this solves the problem,
as in: the cloud-init enabled VM boots up, ds-identify finds
"LABEL_FATBOOT=cidata" and cloud-init correctly executes. In cases where
both labels are written, the latter over-writes the former, which could
be a theoretical problem if the values differ, but I am not sure how
likely this case is.
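
For illustration only (ds-identify itself is a shell script, so this is not the
actual fix), the intent of the two-line change can be expressed as "accept
LABEL_FATBOOT wherever LABEL is accepted":

    # Parse `blkid -o export <device>` (KEY=value lines) and treat
    # LABEL_FATBOOT as equivalent to LABEL when looking for "cidata".
    import subprocess

    def has_cidata_label(device):
        out = subprocess.check_output(['blkid', '-o', 'export', device])
        fields = dict(line.split('=', 1)
                      for line in out.decode().splitlines() if '=' in line)
        label = fields.get('LABEL') or fields.get('LABEL_FATBOOT') or ''
        return label.lower() == 'cidata'

    # With the disk from this report, LABEL is unset but LABEL_FATBOOT=cidata,
    # so this returns True where a LABEL-only check returns False.
    print(has_cidata_label('/dev/xvdb'))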

Further debug information as requested by @rharper on IRC:

- cloud-init.tar.gz (Debian 10 / ds-identify fail)

- Debian version:

debian@cloudbuster:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:Debian GNU/Linux 10 (buster)
Release:10
Codename:   buster

- util-linux version:

debian@cloudbuster:~$ sudo blkid -V
blkid from util-linux 2.33.1  (libblkid 2.33.1, 09-Jan-2019)

- blkid output:

debian@cloudbuster:~$ sudo blkid /dev/xvdb
/dev/xvdb: SEC_TYPE="msdos" LABEL_FATBOOT="cidata" UUID="355A-4FC2" TYPE="vfat"

- udevadm output:
debian@cloudbuster:~$ udevadm info --query=all /sys/class/block/xvdb
P: /devices/vbd-832/block/xvdb
N: xvdb
L: 0
S: disk/by-uuid/355A-4FC2
E: DEVPATH=/devices/vbd-832/block/xvdb
E: DEVNAME=/dev/xvdb
E: DEVTYPE=disk
E: MAJOR=202
E: MINOR=16
E: SUBSYSTEM=block
E: USEC_INITIALIZED=4239917
E: ID_FS_UUID=355A-4FC2
E: ID_FS_UUID_ENC=355A-4FC2
E: ID_FS_VERSION=FAT16
E: ID_FS_TYPE=vfat
E: ID_FS_USAGE=filesystem
E: DEVLINKS=/dev/disk/by-uuid/355A-4FC2
E: TAGS=:systemd:

# Some experiments:

- This is interesting - dosfslabel incorrectly reports the label, while
blkid (above) clearly shows the field is empty / not set:

debian@cloudbuster:~$ sudo dosfslabel /dev/xvdb
cidata

- Here I am first setting the label with dosfslabel to see what happens
and then check blkid again:

debian@cloudbuster:~$ sudo dosfslabel /dev/xvdb cidata
fatlabel: warning - lowercase labels might not work properly with DOS or Windows

debian@cloudbuster:~$ sudo blkid /dev/xvdb
/dev/xvdb: SEC_TYPE="msdos" LABEL_FATBOOT="cidata" LABEL="cidata" 
UUID="355A-4FC2" TYPE="vfat"
# Now blkid reports both labels

** Affects: cloud-init
 Importance: Undecided
 Status: New

** Attachment added: 

[Yahoo-eng-team] [Bug 1560961] Re: [RFE] Allow instance-ingress bandwidth limiting

2019-08-26 Thread Corey Bryant
** No longer affects: cloud-archive

** No longer affects: cloud-archive/mitaka

** No longer affects: cloud-archive/ocata

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1560961

Title:
  [RFE] Allow instance-ingress bandwidth limiting

Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New
Status in neutron source package in Xenial:
  New

Bug description:
  The current implementation of bandwidth limiting rules only supports egress 
bandwidth
  limiting.

  Use cases
  =
  There are cases where ingress bandwidth limiting is more important than
  egress limiting, for example when the workload of the cloud is mostly a 
consumer of data (crawlers, datamining, etc), and administrators need to ensure 
other workloads won't be affected.

  Other examples are CSPs which need to plan & allocate the bandwidth
  provided to customers, or provide different levels of network service.

  API/Model impact
  ================
  The BandwidthLimiting rules will gain a direction field (egress/ingress),
  which by default will be egress to match the current behaviour and therefore
  be backward compatible.

  Combining egress/ingress would be achieved by including an egress
  bandwidth limit and an ingress bandwidth limit.
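
  A minimal sketch of the backward-compatible default described above, using an
  illustrative rule representation rather than neutron's actual DB model:

      # Illustrative only: an optional direction field defaulting to 'egress'
      # keeps existing egress-only rules meaning exactly what they do today.
      def make_bandwidth_limit_rule(max_kbps, max_burst_kbps=0,
                                    direction='egress'):
          if direction not in ('egress', 'ingress'):
              raise ValueError('direction must be egress or ingress')
          return {'max_kbps': max_kbps,
                  'max_burst_kbps': max_burst_kbps,
                  'direction': direction}

      egress_rule = make_bandwidth_limit_rule(10000)  # current behaviour
      ingress_rule = make_bandwidth_limit_rule(5000, direction='ingress')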

  Additional information
  ==
  The CLI and SDK modifications are addressed in 
https://bugs.launchpad.net/python-openstackclient/+bug/1614121

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1560961/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1560961] Re: [RFE] Allow instance-ingress bandwidth limiting

2019-08-26 Thread Corey Bryant
** Also affects: cloud-archive/mitaka
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/ocata
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1560961

Title:
  [RFE] Allow instance-ingress bandwidth limiting

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive mitaka series:
  New
Status in Ubuntu Cloud Archive ocata series:
  New
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New
Status in neutron source package in Xenial:
  New

Bug description:
  The current implementation of bandwidth limiting rules only supports egress 
bandwidth
  limiting.

  Use cases
  =
  There are cases where ingress bandwidth limiting is more important than
  egress limiting, for example when the workload of the cloud is mostly a 
consumer of data (crawlers, datamining, etc), and administrators need to ensure 
other workloads won't be affected.

  Other examples are CSPs which need to plan & allocate the bandwidth
  provided to customers, or provide different levels of network service.

  API/Model impact
  ===
  The BandwidthLimiting rules will gain a direction field (egress/ingress),
  which by default will be egress to match the current behaviour and therefore
  be backward compatible.

  Combining egress/ingress would be achieved by including an egress
  bandwidth limit and an ingress bandwidth limit.

  Additional information
  ==
  The CLI and SDK modifications are addressed in 
https://bugs.launchpad.net/python-openstackclient/+bug/1614121

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1560961/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1560961] Re: [RFE] Allow instance-ingress bandwidth limiting

2019-08-26 Thread Jorge Niedbalski
** Also affects: neutron (Ubuntu)
   Importance: Undecided
   Status: New

** Also affects: neutron (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1560961

Title:
  [RFE] Allow instance-ingress bandwidth limiting

Status in Ubuntu Cloud Archive:
  New
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  New
Status in neutron source package in Xenial:
  New

Bug description:
  The current implementation of bandwidth limiting rules only supports egress 
bandwidth
  limiting.

  Use cases
  =
  There are cases where ingress bandwidth limiting is more important than
  egress limiting, for example when the workload of the cloud is mostly a 
consumer of data (crawlers, datamining, etc), and administrators need to ensure 
other workloads won't be affected.

  Other examples are CSPs which need to plan & allocate the bandwidth
  provided to customers, or provide different levels of network service.

  API/Model impact
  ===
  The BandwidthLimiting rules will gain a direction field (egress/ingress),
  which by default will be egress to match the current behaviour and therefore
  be backward compatible.

  Combining egress/ingress would be achieved by including an egress
  bandwidth limit and an ingress bandwidth limit.

  Additional information
  ==
  The CLI and SDK modifications are addressed in 
https://bugs.launchpad.net/python-openstackclient/+bug/1614121

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1560961/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841454] [NEW] Exoscale datasource overwrites *all* cloud_config_modules

2019-08-26 Thread Chris Glass
Public bug reported:

While testing the Exoscale datasource for its inclusion in a SRU, it was
discovered that a cloud_config_module didn't work.

Passing user data such as:
https://gist.github.com/chrisglass/fb0cf860be8cf01f456dfff8e162e004
results in the "runcmd" stanza not being executed.

(feel free to get in touch should you like to play with an instance
displaying the problem on Eoan)

Hypothesis:

The merge of the datasource's extra_config field
(https://git.launchpad.net/cloud-init/tree/cloudinit/sources/DataSourceExoscale.py#n124)
is erroneous: instead of *overwriting* the cloud_config_modules entry from the
cloud.cfg file/user data, the cloud_config_modules should be *merged*.
An additional difficulty is that we insert a two-element list
(["set-passwords", "always"]) and it needs to be merged with a list containing
just "set-passwords".
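
A minimal sketch of the kind of merge the hypothesis calls for, assuming the
only special case is letting a [name, frequency] entry replace a plain module
name (illustrative, not the datasource's actual code):

    # Keep the existing cloud_config_modules order, but let an entry such as
    # ["set-passwords", "always"] replace a plain "set-passwords" entry.
    def merge_config_modules(existing, extra):
        def name_of(entry):
            return entry[0] if isinstance(entry, (list, tuple)) else entry

        merged = list(existing)
        for entry in extra:
            for i, current in enumerate(merged):
                if name_of(current) == name_of(entry):
                    merged[i] = entry
                    break
            else:
                merged.append(entry)
        return merged

    base = ['ssh', 'set-passwords', 'runcmd']
    print(merge_config_modules(base, [['set-passwords', 'always']]))
    # ['ssh', ['set-passwords', 'always'], 'runcmd']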

** Affects: cloud-init
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1841454

Title:
  Exoscale datasource overwrites *all* cloud_config_modules

Status in cloud-init:
  New

Bug description:
  While testing the Exoscale datasource for its inclusion in a SRU, it
  was discovered that a cloud_config_module didn't work.

  Passing user data such as:
  https://gist.github.com/chrisglass/fb0cf860be8cf01f456dfff8e162e004
  results in the "runcmd" stanza not being executed.

  (feel free to get in touch should you like to play with an instance
  displaying the problem on Eoan)

  Hypothesis:

  The merge of the datasource's extra_config field
  (https://git.launchpad.net/cloud-init/tree/cloudinit/sources/DataSourceExoscale.py#n124)
  is erroneous: instead of *overwriting* the cloud_config_modules entry from the
  cloud.cfg file/user data, the cloud_config_modules should be *merged*.
  An additional difficulty is that we insert a two-element list
  (["set-passwords", "always"]) and it needs to be merged with a list containing
  just "set-passwords".

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1841454/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1833902] Re: Revert resize tests are failing in jobs with iptables_hybrid fw driver

2019-08-26 Thread Matt Riedemann
** No longer affects: neutron

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1833902

Title:
  Revert resize tests are failing in jobs with iptables_hybrid fw driver

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Tests:

  
tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_resize_server_revert_deleted_flavor
  
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert
  
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert_with_volume_attached

  are failing 100% of the time since the last ~2 days.
  It happens only in jobs with the iptables_hybrid firewall driver, but I don't
  know whether this is really the source of the issue or just a red herring.

  Logstash query:

  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_resize_server_revert_deleted_flavor%5C%22%20AND%20message%3A%5C%22FAILED%5C%22

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1833902/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1840978] Re: nova-manage commands with unexpected errors returning 1 conflict with expected cases of 1 for flow control

2019-08-26 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/677832
Committed: 
https://git.openstack.org/cgit/openstack/nova/commit/?id=df2845308dd32e1abd0b75a70f6997b1e4698745
Submitter: Zuul
Branch:master

commit df2845308dd32e1abd0b75a70f6997b1e4698745
Author: Matt Riedemann 
Date:   Wed Aug 21 17:03:11 2019 -0400

Change nova-manage unexpected error return code to 255

If any nova-manage command fails in an unexpected way and
it bubbles back up to main() the return code will be 1.
There are some commands like archive_deleted_rows,
map_instances and heal_allocations which return 1 for flow
control with automation systems. As a result, those tools
could be calling the command repeatedly getting rc=1 thinking
there is more work to do when really something is failing.

This change makes the unexpected error code 255, updates the
relevant nova-manage command docs that already mention return
codes in some kind of list/table format, and adds an upgrade
release note just to cover our bases in case someone was for
some weird reason relying on 1 specifically for failures rather
than anything greater than 0.

Change-Id: I2937c9ef00f1d1699427f9904cb86fe2f03d9205
Closes-Bug: #1840978


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1840978

Title:
  nova-manage commands with unexpected errors returning 1 conflict with
  expected cases of 1 for flow control

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  The archive_deleted_rows command returns 1 meaning some records were
  archived and the code documents that if automating and not using
  --until-complete, you should keep going while you get rc=1 until you
  get rc=0:

  
https://github.com/openstack/nova/blob/0bf81cfe73340ba5cfd9cf44a38905014ba780f0/nova/cmd/manage.py#L505

  The problem is if some unexpected error happens, let's say there is a
  TypeError in the code or something, the command will also return 1:

  
https://github.com/openstack/nova/blob/0bf81cfe73340ba5cfd9cf44a38905014ba780f0/nova/cmd/manage.py#L2625

  That unexpected error should probably be a 255 which generally means a
  command failed in some unexpected way. There might be other nova-
  manage commands that return 1 for flow control as well.

  Note that changing the "unexpected error" code from 1 to 255 is an
  upgrade impacting change worth a release note.
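
  A minimal sketch of the flow-control pattern those return codes are meant to
  support, assuming the post-fix convention (0 = nothing left, 1 = more work to
  do, anything else = unexpected failure):

      # Illustrative automation loop around archive_deleted_rows.
      import subprocess

      def archive_until_done(max_rows=1000):
          while True:
              rc = subprocess.call(['nova-manage', 'db', 'archive_deleted_rows',
                                    '--max_rows', str(max_rows)])
              if rc == 0:
                  return  # nothing left to archive
              if rc != 1:
                  # 255 (or any other code) means the command itself failed.
                  raise RuntimeError('archive_deleted_rows failed, rc=%d' % rc)
              # rc == 1: some rows were archived, call again.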

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1840978/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1834875] Re: cloud-init growpart race with udev

2019-08-26 Thread Scott Moser
** Also affects: cloud-utils
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1834875

Title:
  cloud-init growpart race with udev

Status in cloud-init:
  Incomplete
Status in cloud-utils:
  New
Status in systemd package in Ubuntu:
  New

Bug description:
  On Azure, it happens regularly (20-30% of the time) that cloud-init's
  growpart module fails to extend the partition to its full size.

  Such as in this example:

  

  2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', 
'--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, 
capture=True)
  2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', 
'/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
  2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: 
init-network/config-growpart: FAIL: running config-growpart with frequency 
always
  2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart () failed
  2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart () failed
  Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in 
_run_modules
  freq=freq)
File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
  return self._runners.run(name, functor, args, freq, clear_on_fail)
File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
  results = functor(*args)
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 
351, in handle
  func=resize_devices, args=(resizer, devices))
File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in 
log_time
  ret = func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 
298, in resize_devices
  (old, new) = resizer.resize(disk, ptnum, blockdev)
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 
159, in resize
  return (before, get_size(partdev))
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 
198, in get_size
  fd = os.open(filename, os.O_RDONLY)
  FileNotFoundError: [Errno 2] No such file or directory: 
'/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'

  

  @rcj suggested this is a race with udev. This seems to only happen on
  Cosmic and later.
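
  For illustration of the suspected race (this is not the eventual fix), a
  retry around the failing open() would look roughly like this, assuming the
  /dev/disk/by-partuuid symlink reappears once udev has reprocessed the
  resized partition:

      # get_size() in cc_growpart opens the partition device and seeks to the
      # end; this variant waits for udev to settle and retries if the symlink
      # is momentarily missing.
      import os
      import subprocess
      import time

      def get_size_with_retry(filename, attempts=5, delay=0.5):
          for _ in range(attempts):
              try:
                  fd = os.open(filename, os.O_RDONLY)
              except FileNotFoundError:
                  subprocess.call(['udevadm', 'settle'])
                  time.sleep(delay)
                  continue
              try:
                  return os.lseek(fd, 0, os.SEEK_END)
              finally:
                  os.close(fd)
          raise FileNotFoundError(filename)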

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1778207] Re: fwaas v2 add port into firewall group failed

2019-08-26 Thread Dr. Jens Harbott
*** This bug is a duplicate of bug 1762454 ***
https://bugs.launchpad.net/bugs/1762454

** This bug has been marked a duplicate of bug 1762454
   FWaaS: Invalid port error on associating ports (distributed router) to 
firewall group

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1778207

Title:
  fwaas v2 add port into firewall group failed

Status in neutron:
  Confirmed

Bug description:
  Hey, stackers. There are some errors when I added router ports with
  DVR/HA mode into a fwaasv2 firewall group.

  The error msg was that:

  Error: Failed to update firewallgroup 3c8dbcab-
  0cfb-4189-bd60-dc4b40a346a4: Port 002c3fff-5b00-42b5-83ab-6413afc083c4
  of firewall group is invalid. Neutron server returns request_ids:
  ['req-da8b946c-aa69-456f-b1d3-d956eff49110']

  My router HA interface:

  Device Owner
  network:router_ha_interface
  Device ID
  a804ad96-42c4-437b-a945-9ecc4cdef34c

  And I traced the related source code for how the port is validated for a
  firewall group:
  https://github.com/openstack/neutron-fwaas/blob/9346ced4b0f90e1c7acf855ac9db76ed960510e6/neutron_fwaas/services/firewall/fwaas_plugin_v2.py#L147

  I found that there is no condition to determine whether the
  router is in DVR/HA mode or not. Therefore, maybe we have to update
  this code snippet
  https://github.com/openstack/neutron-fwaas/blob/9346ced4b0f90e1c7acf855ac9db76ed960510e6/neutron_fwaas/services/firewall/fwaas_plugin_v2.py#L147
  to support routers in DVR/HA mode.
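
  A minimal sketch of the kind of condition that seems to be missing; the HA
  device owner string is taken from the port shown above, while the DVR one
  ('network:router_interface_distributed') is an assumption here:

      # Illustrative check only -- not the actual fwaas_plugin_v2 code.
      ROUTER_INTERFACE_OWNERS = (
          'network:router_interface',
          'network:router_interface_distributed',  # DVR (assumed constant)
          'network:router_ha_interface',           # HA, as in this report
      )

      def is_router_port(port):
          return port['device_owner'] in ROUTER_INTERFACE_OWNERS

      ha_port = {'device_owner': 'network:router_ha_interface'}
      print(is_router_port(ha_port))  # True with the extended owner list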

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1778207/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841411] [NEW] Instances recovered after failed migrations enter error state

2019-08-26 Thread Lucian Petrut
Public bug reported:

Most users expect that if a live migration fails but the instance is
fully recovered, it shouldn't enter 'error' state. Setting the migration
status to 'error' should be enough. This simplifies debugging, making it
clear that the instance doesn't have to be manually recovered.

This patch changed this behavior, indirectly affecting the Hyper-V
driver, which propagates migration errors:
Idfdce9e7dd8106af01db0358ada15737cb846395

When using the Hyper-V driver, instances enter error state even after
successful recoveries. We may copy the Libvirt driver behavior and avoid
propagating exceptions in this case.

** Affects: compute-hyperv
 Importance: Undecided
 Status: New

** Affects: nova
 Importance: Undecided
 Assignee: Lucian Petrut (petrutlucian94)
 Status: In Progress


** Tags: hyper-v

** Also affects: nova
   Importance: Undecided
   Status: New

** Tags added: hyper-v

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841411

Title:
  Instances recovered after failed migrations enter error state

Status in compute-hyperv:
  New
Status in OpenStack Compute (nova):
  In Progress

Bug description:
  Most users expect that if a live migration fails but the instance is
  fully recovered, it shouldn't enter 'error' state. Setting the
  migration status to 'error' should be enough. This simplifies
  debugging, making it clear that the instance doesn't have to be
  manually recovered.

  This patch changed this behavior, indirectly affecting the Hyper-V
  driver, which propagates migration errors:
  Idfdce9e7dd8106af01db0358ada15737cb846395

  When using the Hyper-V driver, instances enter error state even after
  successful recoveries. We may copy the Libvirt driver behavior and
  avoid propagating exceptions in this case.

To manage notifications about this bug go to:
https://bugs.launchpad.net/compute-hyperv/+bug/1841411/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1841400] [NEW] nonexistent hacking rules descriptions in HACKING.rst

2019-08-26 Thread Takashi NATSUME
Public bug reported:

N321, N328, N329, N330 hacking rules have been removed, but the descriptions 
are still in HACKING.rst.
The rule number N307 is missing in HACKING.rst.

** Affects: nova
 Importance: Undecided
 Assignee: Takashi NATSUME (natsume-takashi)
 Status: In Progress


** Tags: doc

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1841400

Title:
  nonexistent hacking rules descriptions in HACKING.rst

Status in OpenStack Compute (nova):
  In Progress

Bug description:
  N321, N328, N329, N330 hacking rules have been removed, but the descriptions 
are still in HACKING.rst.
  The rule number N307 is missing in HACKING.rst.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1841400/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp