[Yahoo-eng-team] [Bug 1905276] Re: Overriding hypervisor name for resource provider always requires a complete list of interfaces/bridges

2021-04-30 Thread Takashi Kajinami
Closing this because our expectation here is that neutron and libvirt
should detect the same hostname. I've reported another bug to fix the
current incompatibility:

 https://bugs.launchpad.net/neutron/+bug/1926693

** Changed in: neutron
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1905276

Title:
  Overriding hypervisor name for resource provider always requires a
  complete list of interfaces/bridges

Status in neutron:
  Invalid

Bug description:
  In some deployments, hostnames can differ between hosts and resource
  provider records. For example, in deployments managed by TripleO we use
  the short host name (without the domain name) for the host, while we use
  the FQDN for resource provider records (this comes from the FQDN set in
  the host option in nova.conf).

  This causes an issue with the way neutron currently looks up the
  root resource provider, because the placement API requires the exact
  hostname and doesn't automatically translate between the short name
  and the FQDN.

  To work around the issue we currently need to set the
  resource_provider_hypervisors option[1], but it is very redundant to
  list every device or bridge in this option just to override their
  hostnames with the same value.

  [1] https://review.opendev.org/#/c/696600/
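
The redundancy described above can be illustrated with a hedged sketch of an
ovs-agent configuration (bridge names, bandwidth values, and the FQDN are
invented; the hypervisors option takes comma-separated <device>:<hostname>
pairs per the patch in [1]):

```ini
[ovs]
bridge_mappings = physnet0:br-physnet0,physnet1:br-physnet1
resource_provider_bandwidths = br-physnet0:10000000:10000000,br-physnet1:10000000:10000000

# Each bridge must be repeated here even though every entry overrides the
# hostname with exactly the same FQDN; a single default would avoid this.
resource_provider_hypervisors = br-physnet0:compute-0.example.com,br-physnet1:compute-0.example.com
```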

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1905276/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1921381] Re: iSCSI: Flushing issues when multipath config has changed

2021-04-30 Thread Lee Yarwood
** Changed in: nova/wallaby
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1921381

Title:
  iSCSI: Flushing issues when multipath config has changed

Status in OpenStack Compute (nova):
  Fix Committed
Status in OpenStack Compute (nova) wallaby series:
  Fix Released
Status in OpenStack Compute (nova) xena series:
  Fix Released
Status in os-brick:
  Fix Committed
Status in os-brick queens series:
  Triaged
Status in os-brick rocky series:
  Triaged
Status in os-brick stein series:
  Triaged
Status in os-brick train series:
  Fix Committed
Status in os-brick ussuri series:
  In Progress
Status in os-brick victoria series:
  Fix Committed
Status in os-brick wallaby series:
  Fix Released
Status in os-brick xena series:
  Fix Committed

Bug description:
  The os-brick disconnect_volume code assumes that the use_multipath
  parameter used to instantiate the connector has the same value as in
  the connector that was used for the original connect_volume call.

  Unfortunately this is not necessarily true, because Nova can attach a
  volume, then its multipath configuration can be enabled or disabled,
  and then a detach can be issued.

  This leads to a series of serious issues such as:

  - Not flushing the single path on disconnect_volume (possible data loss)
  and leaving it as a leftover device on the host when Nova calls
  terminate-connection on Cinder.
  - Not flushing the multipath device (possible data loss) and leaving it
  as a leftover device, similarly to the other case.
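
As a toy model (this is NOT os-brick code, just a sketch of the failure mode
described above), the cleanup decision at detach time follows the *current*
use_multipath setting rather than the device that was actually attached:

```python
# Toy model of the os-brick mismatch: disconnect_volume trusts the
# connector's current use_multipath flag instead of the real device type.

class ToyConnector:
    def __init__(self, use_multipath):
        self.use_multipath = use_multipath

    def connect_volume(self):
        # The device actually created depends on the setting at attach time.
        if self.use_multipath:
            return {'path': '/dev/dm-0', 'multipath': True}
        return {'path': '/dev/sdb', 'multipath': False}

    def disconnect_volume(self, device):
        # Bug modelled: ignores device['multipath'] entirely.
        if self.use_multipath:
            return 'flushed multipath device'
        return 'flushed single path'

# Attach with multipath disabled ...
dev = ToyConnector(use_multipath=False).connect_volume()
# ... the operator then enables multipath, and detach builds a new connector:
result = ToyConnector(use_multipath=True).disconnect_volume(dev)
# 'result' reports a multipath flush, so the single path /dev/sdb that was
# really attached is never flushed (possible data loss).
```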

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1921381/+subscriptions



[Yahoo-eng-team] [Bug 1449084] Re: Boot from volume does not boot from volume

2021-04-30 Thread Lee Yarwood
I've got no idea if this was ever valid, but it definitely isn't now.


$ openstack volume create --image cirros-0.5.1-x86_64-disk --size 1 test
[..]
$ openstack volume set --bootable test
$ openstack image delete cirros-0.5.1-x86_64-disk
$ openstack server create --volume test --flavor 1 --network private test
[..]
$ sudo virsh domblklist d9d4dcc7-4f9e-4e1d-a2ce-c52f201ac00a
 Target   Source

 vda  /dev/sdb

$ sudo virsh console d9d4dcc7-4f9e-4e1d-a2ce-c52f201ac00a
[..]
=== cirros: current=0.5.1 uptime=4.52 ===
[garbled CirrOS ASCII-art banner]
   http://cirros-cloud.net



** Changed in: nova
   Status: Confirmed => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1449084

Title:
  Boot from volume does not boot from volume

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Booting from volume does not actually boot from the volume; it boots
  from a Glance image. Perform the following steps to test this:

  Using the GUI steps:
  1. In the "Volumes" tab, select "Create Volume". For "Volume Source", select 
an image (I use CirrOS). Click "Create Volume". 
  2. On your host machine, open a terminal and overwrite the volume:
  $ sudo dd if=/dev/zero of=/dev/stack-volumes-lvmdriver-1/volume-[ID OF VOLUME] bs=10M
  3. In the "Instances" tab, select "Launch Instance". For "Instance Boot 
Source", select "Boot from volume". Be sure to select a flavor with enough 
storage to support the volume (if using CirrOS, pick m1.tiny). For "Volume", 
select the volume you created in step 1. Click "Launch".

  Using the CLI:
  1. Create the volume:
  cinder create --image-id $(glance image-list | grep 
cirros-0.3.1-x86_64-uec[^-] | cut -d '|' -f 2 | xargs echo) --name 
sample-volume 1
  2. Overwrite the volume:
  $ sudo dd if=/dev/zero of=/dev/stack-volumes-lvmdriver-1/volume-[ID OF VOLUME] bs=10M
  3. Boot the volume:
  nova boot --flavor m1.tiny --boot-volume sample-volume instance

  Expected result: The instance should not boot in either of these cases; the 
volumes are empty.
  Actual result: The instance boots successfully in both of these cases. 

  Additional test to show that the instance is actually being booted
  from the Glance image:

  Using the CLI:
  1. Create the volume:
  cinder create --image-id $(glance image-list | grep 
cirros-0.3.1-x86_64-uec[^-] | cut -d '|' -f 2 | xargs echo) --name 
sample-volume 1
  2. Delete the Glance image:
  glance image-list | grep cirros-0.3.1-x86_64-uec | cut -d '|' -f 2 | xargs 
glance image-delete
  3. Attempt to boot the volume:
  nova boot --flavor m1.tiny --boot-volume sample-volume instance

  Expected result: This should succeed; we are attempting to boot from the 
volume.
  Actual result: This fails.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1449084/+subscriptions



[Yahoo-eng-team] [Bug 1524898] Re: Volume based live migration aborted unexpectedly

2021-04-30 Thread Lee Yarwood
** Changed in: nova
   Status: In Progress => Invalid

** Changed in: nova
 Assignee: melanie witt (melwitt) => (unassigned)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1524898

Title:
  Volume based live migration aborted unexpectedly

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Volume based live migration is failing during tempest testing in the
  check and experimental pipelines

  
http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:\%22Live%20Migration%20failure:%20operation%20failed:%20migration%20job:%20unexpectedly%20failed\%22%20AND%20tags:\%22screen-n-cpu.txt\%22

  shows 42 failures since 12/8

  2015-12-18 15:26:26.411 ERROR nova.virt.libvirt.driver 
[req-9a4dda34-987c-42a6-a20b-22729cd202e7 
tempest-LiveBlockMigrationTestJSON-1501972407 
tempest-LiveBlockMigrationTestJSON-1128863260] [instance: 
b50e1dc8-9b4c-4edb-be0e-8777cae14f01] Live Migration failure: operation failed: 
migration job: unexpectedly failed
  2015-12-18 15:26:26.412 DEBUG nova.virt.libvirt.driver 
[req-9a4dda34-987c-42a6-a20b-22729cd202e7 
tempest-LiveBlockMigrationTestJSON-1501972407 
tempest-LiveBlockMigrationTestJSON-1128863260] [instance: 
b50e1dc8-9b4c-4edb-be0e-8777cae14f01] Migration operation thread notification 
thread_finished /opt/stack/new/nova/nova/virt/libvirt/driver.py:6094
  Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 
457, in fire_timers
  timer()
File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 
58, in __call__
  cb(*args, **kw)
File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 168, 
in _do_send
  waiter.switch(result)
File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 
214, in main
  result = function(*args, **kwargs)
File "/opt/stack/new/nova/nova/utils.py", line 1161, in context_wrapper
  return func(*args, **kwargs)
File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 5696, in 
_live_migration_operation
  instance=instance)
File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 
204, in __exit__
  six.reraise(self.type_, self.value, self.tb)
File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 5651, in 
_live_migration_operation
  CONF.libvirt.live_migration_bandwidth)
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, 
in doit
  result = proxy_call(self._autowrap, f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, 
in proxy_call
  rv = execute(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, 
in execute
  six.reraise(c, e, tb)
File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, 
in tworker
  rv = meth(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1511, in 
migrateToURI
  if ret == -1: raise libvirtError ('virDomainMigrateToURI() failed', 
dom=self)
  libvirtError: operation failed: migration job: unexpectedly failed

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1524898/+subscriptions



[Yahoo-eng-team] [Bug 1454252] Re: Support volume retype (volume migration) of attached volumes when VM is inactive

2021-04-30 Thread Lee Yarwood
*** This bug is a duplicate of bug 1673090 ***
https://bugs.launchpad.net/bugs/1673090

https://review.opendev.org/q/Iff17f7cee7a56037b35d1a361a0b3279d0a885d6
fixed this a while ago by limiting the use of swap_volume to cases
where we have an actual domain on the compute.
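
The guard that fix introduces can be sketched as a toy (this is NOT nova
code; the class and names are invented for illustration): the
blockRebase-based swap path is only taken when a libvirt domain for the
instance actually exists on the compute host.

```python
class ToyHost:
    """Hypothetical stand-in for nova's libvirt host wrapper."""
    def __init__(self, running_instances):
        self._running = set(running_instances)

    def get_domain(self, instance):
        if instance not in self._running:
            raise LookupError('no domain for %s' % instance)
        return instance

def can_swap_volume(host, instance):
    # Guard described above: blockRebase needs a live domain, so a
    # stopped instance must not silently take the swap_volume path.
    try:
        host.get_domain(instance)
    except LookupError:
        return False
    return True

host = ToyHost(running_instances={'vm-running'})
ok_running = can_swap_volume(host, 'vm-running')
ok_stopped = can_swap_volume(host, 'vm-stopped')
```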

** This bug has been marked a duplicate of bug 1673090
   Swap disk on stopped instance fails silently

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1454252

Title:
  Support volume retype (volume migration) of attached volumes when VM
  is inactive

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Steps to reproduce:

  1. Boot a VM from the volume
  2. Shutoff the VM
  3. Try to migrate the volume between different storages of the same type  
(cinder retype with --migration-policy on-demand )
  4. The process fails in Nova with the libvirt error in blockRebase, because 
libvirt can't find a VM instance.

  Expected result:

  The volume should move to another storage.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1454252/+subscriptions



[Yahoo-eng-team] [Bug 1926780] [NEW] Multicast traffic scenario test is failing sometimes on OVN job

2021-04-30 Thread Slawek Kaplonski
Public bug reported:

Logstash query:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22RuntimeError%3A%20Unregistered%20server%20received%20unexpected%20packet(s).%5C%22

It seems to happen mostly on the wallaby and victoria jobs. It's not
very frequent, but it happens from time to time.

Example of the failure:
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b66/712474/7/check
/neutron-tempest-plugin-scenario-ovn/b661cd4/testr_results.html

Traceback (most recent call last):
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/common/utils.py",
 line 80, in wait_until_true
eventlet.sleep(sleep)
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/eventlet/greenthread.py",
 line 36, in sleep
hub.switch()
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/eventlet/hubs/hub.py",
 line 313, in switch
return self.greenlet.switch()
eventlet.timeout.Timeout: 60 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/test_multicast.py",
 line 274, in test_multicast_between_vms_on_same_network
self._check_multicast_conectivity(sender=sender, receivers=receivers,
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/test_multicast.py",
 line 381, in _check_multicast_conectivity
utils.wait_until_true(
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/common/utils.py",
 line 84, in wait_until_true
raise exception
RuntimeError: Unregistered server received unexpected packet(s).

** Affects: neutron
 Importance: High
 Status: Confirmed


** Tags: gate-failure ovn

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1926780

Title:
  Multicast traffic scenario test is failing sometimes on OVN job

Status in neutron:
  Confirmed

Bug description:
  Logstash query:
  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22RuntimeError%3A%20Unregistered%20server%20received%20unexpected%20packet(s).%5C%22

  It seems to happen mostly on the wallaby and victoria jobs. It's not
  very frequent, but it happens from time to time.

  Example of the failure:
  
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b66/712474/7/check
  /neutron-tempest-plugin-scenario-ovn/b661cd4/testr_results.html

  Traceback (most recent call last):
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/common/utils.py",
 line 80, in wait_until_true
  eventlet.sleep(sleep)
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/eventlet/greenthread.py",
 line 36, in sleep
  hub.switch()
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/eventlet/hubs/hub.py",
 line 313, in switch
  return self.greenlet.switch()
  eventlet.timeout.Timeout: 60 seconds

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/test_multicast.py",
 line 274, in test_multicast_between_vms_on_same_network
  self._check_multicast_conectivity(sender=sender, receivers=receivers,
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/scenario/test_multicast.py",
 line 381, in _check_multicast_conectivity
  utils.wait_until_true(
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/common/utils.py",
 line 84, in wait_until_true
  raise exception
  RuntimeError: Unregistered server received unexpected packet(s).

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1926780/+subscriptions



[Yahoo-eng-team] [Bug 1926787] [NEW] [DB] Neutron quota request implementation can end in a lock status

2021-04-30 Thread Rodolfo Alonso
Public bug reported:

Neutron's quota request implementation can end in a DB lock status. The
quota is assigned per resource (port, network, security group, etc.) and
per project. When a request is made, a DB lock is taken for this
(resource, project) tuple. This lock is held in the DB engine, so the
tuple is locked across all workers of all API servers.

That implies there is a bottleneck when a high number of requests
arrives at the API at the same time. If the number of requests exceeds
the number of worker processes, the number of locked DB transactions
will grow indefinitely. This can be seen in the DB by executing:

  $ mysql -e "show processlist;" | egrep "reservations|quotausages"

The query used by Neutron to lock this (resource, project) tuple is:

UPDATE quotausages SET dirty=1 WHERE quotausages.project_id =  \
  AND quotausages.resource = 

An improved quota system should be implemented that allows parallel
resource requests and avoids this DB lock status.

NOTE: please check [2][3]. "Neutron does not enforce quotas in such a
way that a quota violation like this could never occur". That means even
with this restrictive DB locking method, resource overcommit is
possible.

[1]https://github.com/openstack/neutron/blob/b4812af4ee3cd651b0b03d5f90e71e8201ccfed7/neutron/objects/quota.py#L150
[2]https://bugzilla.redhat.com/show_bug.cgi?id=1884455#c2
[3]https://bugs.launchpad.net/neutron/+bug/1862050/comments/5

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1955661
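
The serialization described above can be modelled as a toy (this is NOT
Neutron code; a thread lock stands in for the DB row lock taken by the
UPDATE statement): every reservation for the same (resource, project)
tuple queues on a single lock.

```python
# Toy model: concurrent quota reservations for one (resource, project)
# tuple run strictly one at a time, mimicking the row lock described above.
import threading
import time

locks = {}  # (resource, project) -> lock, standing in for the DB row lock

def reserve(resource, project, results):
    lock = locks.setdefault((resource, project), threading.Lock())
    with lock:                # UPDATE quotausages SET dirty=1 WHERE ...
        time.sleep(0.01)      # stand-in for the usage recount + commit
        results.append((resource, project))

results = []
threads = [threading.Thread(target=reserve, args=('port', 'proj-a', results))
           for _ in range(10)]
start = time.monotonic()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
# 10 requests for the same tuple serialized, so elapsed is at least
# ~10 * 0.01 s; requests for unrelated tuples would not block each other.
```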

** Affects: neutron
 Importance: Wishlist
 Status: New

** Changed in: neutron
   Importance: Undecided => Wishlist

** Description changed:

  Neutron quota request implementation can end in a DB lock status. The
  quota is assigned per resource (port, network, security group, etc.) and
  per project. When a request is done, a DB lock is set for this
  (resource, project) tuple. This lock in the DB engine to lock this tuple
  in all workers of all API servers.
  
  That implies there is a a bottleneck when a high number of requests
  arrive to the API at the same time. If the number of requests exceeds
  the number of resources processes, the DB locked transactions will
  increase indefinitely. This can be seen in the DB executing:
  
-   $ mysql -e "show processlist;" | egrep "reservations|quotausages"
- 
+   $ mysql -e "show processlist;" | egrep "reservations|quotausages"
  
  The query used by Neutron to lock this (resource, project) tuple is:
  
- UPDATE quotausages SET dirty=1 WHERE quotausages.project_id = 
 \
-   AND quotausages.resource = 
+ UPDATE quotausages SET dirty=1 WHERE quotausages.project_id = 
 \
+   AND quotausages.resource = 
  
- 
- An improved quota system should be implemented that allow parallel resource 
request and avoids this DB lock status.
+ An improved quota system should be implemented that allow parallel
+ resource request and avoids this DB lock status.
  
  NOTE: please check [2][3]. "Neutron does not enforce quotas in such a
  way that a quota violation like this could never occur". That means even
  with this restrictive DB locking method, resource overcommit is
  possible.
  
- 
  
[1]https://github.com/openstack/neutron/blob/b4812af4ee3cd651b0b03d5f90e71e8201ccfed7/neutron/objects/quota.py#L150
  [2]https://bugzilla.redhat.com/show_bug.cgi?id=1884455#c2
  [3]https://bugs.launchpad.net/neutron/+bug/1862050/comments/5
+ 
+ Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1955661

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1926787

Title:
  [DB] Neutron quota request implementation can end in a lock status

Status in neutron:
  New

Bug description:
  Neutron's quota request implementation can end in a DB lock status.
  The quota is assigned per resource (port, network, security group,
  etc.) and per project. When a request is made, a DB lock is taken for
  this (resource, project) tuple. This lock is held in the DB engine, so
  the tuple is locked across all workers of all API servers.

  That implies there is a bottleneck when a high number of requests
  arrives at the API at the same time. If the number of requests exceeds
  the number of worker processes, the number of locked DB transactions
  will grow indefinitely. This can be seen in the DB by executing:

    $ mysql -e "show processlist;" | egrep "reservations|quotausages"

  The query used by Neutron to lock this (resource, project) tuple is:

  UPDATE quotausages SET dirty=1 WHERE quotausages.project_id = 
 \
    AND quotausages.resource = 

  An improved quota system should be implemented that allows parallel
  resource requests and avoids this DB lock status.

  NOTE: please check [2][3]. "Neutron does not enforce quotas in such a
  way that a quota violation like this could never occur". That means
  even with this restrictive DB locking method, resource overcommit is
  possible.

  
[1]https://github.com/openstack/neutron/blob/b

[Yahoo-eng-team] [Bug 1925388] Re: Incorrect reference to 802.1ad in network_data.json schema

2021-04-30 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/nova/+/788790
Committed: 
https://opendev.org/openstack/nova/commit/0b05b838a47f4c05ecf8443ec48a6d6b2670f579
Submitter: "Zuul (22348)"
Branch:master

commit 0b05b838a47f4c05ecf8443ec48a6d6b2670f579
Author: Balazs Gibizer 
Date:   Thu Apr 29 17:42:59 2021 +0200

Fix bond_mode enum 802.1ad -> 802.3ad

This seems to me to be a clerical error made a long time ago in the spec [1].
The 802.1ad value does not seem to be a valid bonding mode, but 802.3ad
does.

This patch fixes the schema in the nova doc. No tests are changed, as nova
does not generate this part of the network metadata.

[1] 
https://specs.openstack.org/openstack/nova-specs/specs/kilo/approved/metadata-service-network-info.html#rest-api-impact

Change-Id: I0055d13b055e34372a8186008ba75be68aa2edf9
Closes-Bug: #1925388


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1925388

Title:
  Incorrect reference to 802.1ad in network_data.json schema

Status in Ironic:
  New
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  This affects multiple projects, including nova and ironic, and can
  (currently, at least) be seen in multiple references to "802.1ad"
  across multiple projects:

  
https://codesearch.opendev.org/?q=802%5C.1ad&i=nope&files=&excludeFiles=&repos=

  I'm not sure how, or if it's even appropriate for these projects, to
  file the bug in multiple places, so I'm filing it where I perceive the
  error originates, in nova's definition of the network_data.json
  schema.

  802.1ad is a nested VLAN specification, not a bonding mode
  specification. When referencing VLANs in the above codesearch results,
  it is used correctly. However, when seen in the context of bonding
  interfaces, "802.1ad" is not a valid bond mode. It should instead, be
  "802.3ad" (s/1/3/), indicating the LACP bonding mode.

  This can be confirmed a number of ways, including searching for the
  correct string across projects. This can also be seen by comparing the
  enum of valid bond modes in the schema to the actual output of the
  "bonding" kernel module info.

  schema enum in what I believe is the corresponding version tag to
  wallaby:

  
https://opendev.org/openstack/nova/src/tag/23.0.0/doc/api_schemas/network_data.json#L177

  output of `modinfo bonding | grep mode:`

parm:   mode:Mode of operation; 0 for balance-rr, 1 for
  active-backup, 2 for balance-xor, 3 for broadcast, 4 for 802.3ad, 5
  for balance-tlb, 6 for balance-alb (charp)

  Note that the list of modes in the schema enum is almost exactly the
  same as the list of modes mentioned in the bonding kernel driver info,
  with the exception of the 802.1ad vs. 802.3ad difference. The list
  given by the kernel driver is correct.

  In terms of expected vs. actual results, my expected result was that
  using a "bond_mode" of "802.3ad" when setting up a bond link in my
  network_data.json would not trigger a validation error when validating
  my JSON against the schema, since "802.3ad" is otherwise a valid bond
  mode. However, due to this error, it does trigger a validation error:

Message: Value "802.3ad" is not defined in enum.
Schema path: 
http://openstack.org/nova/network_data.json#/definitions/bond/properties/bond_mode/enum

  I have not yet attempted to use this network_data.json configuration,
  so I don't yet know if specifying "802.1ad" results in the correct
  bonding mode being used anyway. This bug appears to have existed since
  the introduction of the schema: https://specs.openstack.org/openstack
  /nova-specs/specs/liberty/implemented/metadata-service-network-
  info.html
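
For illustration, a minimal network_data.json fragment using the correct
bond mode (link ids and MAC addresses here are invented; field names follow
the schema linked above):

```json
{
  "links": [
    {"id": "eth0", "type": "phy", "ethernet_mac_address": "52:54:00:00:00:01"},
    {"id": "eth1", "type": "phy", "ethernet_mac_address": "52:54:00:00:00:02"},
    {
      "id": "bond0",
      "type": "bond",
      "bond_links": ["eth0", "eth1"],
      "bond_mode": "802.3ad",
      "ethernet_mac_address": "52:54:00:00:00:01"
    }
  ]
}
```

Under the buggy schema enum this fragment fails validation, even though
802.3ad is the valid LACP mode; only the incorrect "802.1ad" string passes.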

To manage notifications about this bug go to:
https://bugs.launchpad.net/ironic/+bug/1925388/+subscriptions



[Yahoo-eng-team] [Bug 1815989] Re: OVS drops RARP packets by QEMU upon live-migration causes up to 40s ping pause in Rocky

2021-04-30 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/nova/+/602432
Committed: 
https://opendev.org/openstack/nova/commit/a62dd42c0dbb6b2ab128e558e127d76962738446
Submitter: "Zuul (22348)"
Branch:master

commit a62dd42c0dbb6b2ab128e558e127d76962738446
Author: Stephen Finucane 
Date:   Fri Apr 30 12:51:35 2021 +0100

libvirt: Delegate OVS plug to os-vif

os-vif 1.15.0 added the ability to create an OVS port during plugging
by specifying the 'create_port' attribute in the 'port_profile' field.
By delegating port creation to os-vif, we can rely on its 'isolate_vif'
config option [1] that will temporarily configure the VLAN to 4095
(0xfff), which is reserved for implementation use [2] and is used by
neutron as a dead VLAN [3]. By doing this, we ensure VIFs are plugged
securely, preventing guests from accessing other tenants' networks
before the neutron OVS agent can wire up the port.

This change requires a little dance as part of the live migration flow.
Since we can't be certain the destination host has a version of os-vif
that supports this feature, we need to use a sentinel to indicate when
it does. Typically we would do so with a field in
'LibvirtLiveMigrateData', such as the 'src_supports_numa_live_migration'
and 'dst_supports_numa_live_migration' fields used to indicate support
for NUMA-aware live migration. However, doing this prevents us
backporting this important fix since o.vo changes are not backportable.
Instead, we (somewhat evilly) rely on the free-form nature of the
'VIFMigrateData.profile_json' string field, which stores JSON blobs and
is included in 'LibvirtLiveMigrateData' via the 'vifs' attribute, to
transport this sentinel. This is a hack but is necessary to work around
the lack of a free-form "capabilities" style dict that would allow us to
do backportable fixes to live migration features.

Note that this change has the knock-on effect of modifying the XML
generated for OVS ports: when hybrid plug is false, the interface will
now be of type 'ethernet' rather than 'bridge' as before. This explains
the larger than expected test damage but should not affect users.

[1] 
https://opendev.org/openstack/os-vif/src/tag/2.4.0/vif_plug_ovs/ovs.py#L90-L93
[2] https://en.wikipedia.org/wiki/IEEE_802.1Q#Frame_format
[3] https://answers.launchpad.net/neutron/+question/231806

Change-Id: I11fb5d3ada7f27b39c183157ea73c8b72b4e672e
Depends-On: Id12486b3127ab4ac8ad9ef2b3641da1b79a25a50
Closes-Bug: #1734320
Closes-Bug: #1815989
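
The profile_json sentinel trick the commit message describes can be sketched
as follows (a hedged illustration, not nova's actual code; the key name is an
assumption):

```python
# Hedged sketch of the backportable-sentinel idea above: a capability flag
# is smuggled through the free-form JSON 'profile' blob on the migrate-data
# VIF object, because adding a new versioned-object field would not be
# backportable. The 'os_vif_delegation' key name is illustrative.
import json

SENTINEL = 'os_vif_delegation'  # assumed key name

def advertise_support(profile_json):
    """Destination host marks that it can delegate OVS plug to os-vif."""
    profile = json.loads(profile_json) if profile_json else {}
    profile[SENTINEL] = True
    return json.dumps(profile)

def supports_delegation(profile_json):
    """Source host checks the sentinel before changing plug behaviour."""
    profile = json.loads(profile_json) if profile_json else {}
    return profile.get(SENTINEL, False)

blob = advertise_support('{"some_key": "some_value"}')
# A host that never set the sentinel (old os-vif) simply yields False,
# and plugging falls back to the old behaviour.
```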


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1815989

Title:
  OVS drops RARP packets by QEMU upon live-migration causes up to 40s
  ping pause in Rocky

Status in neutron:
  In Progress
Status in OpenStack Compute (nova):
  Fix Released
Status in os-vif:
  Invalid

Bug description:
  This issue is well known, and there were previous attempts to fix it,
  like this one

  https://bugs.launchpad.net/neutron/+bug/1414559

  
  This issue still exists in Rocky and gets worse. In Rocky, nova compute, nova 
libvirt and neutron ovs agent all run inside containers.

  So far the only simple fix I have is to increase the number of RARP
  packets QEMU sends after live-migration from 5 to 10. For completeness,
  the nova change (not merged) proposed in the above-mentioned bug does
  not work.

  I am creating this ticket hoping to get up-to-date (for Rocky and
  onwards) expert advice on how to fix this in nova-neutron.

  
  For the record, below are the timestamps in my test between the neutron
  ovs agent "activating" the VM port and the RARP packets seen by tcpdump
  on the compute node. 10 RARP packets are sent by the (recompiled) QEMU,
  7 are seen by tcpdump, and the 2nd-to-last packet barely made it through.

  openvswitch-agent.log:

  2019-02-14 19:00:13.568 73453 INFO
  neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent
  [req-26129036-b514-4fa0-a39f-a6b21de17bb9 - - - - -] Port
  57d0c265-d971-404d-922d-963c8263e6eb updated. Details: {'profile': {},
  'network_qos_policy_id': None, 'qos_policy_id': None,
  'allowed_address_pairs': [], 'admin_state_up': True, 'network_id':
  '1bf4b8e0-9299-485b-80b0-52e18e7b9b42', 'segmentation_id': 648,
  'fixed_ips': [

  {'subnet_id': 'b7c09e83-f16f-4d4e-a31a-e33a922c0bac', 'ip_address': 
'10.0.1.4'}
  ], 'device_owner': u'compute:nova', 'physical_network': u'physnet0', 
'mac_address': 'fa:16:3e:de:af:47', 'device': 
u'57d0c265-d971-404d-922d-963c8263e6eb', 'port_security_enabled': True, 
'port_id': '57d0c265-d971-404d-922d-963c8263e6eb', 'network_type': u'vlan', 
'security_groups': [u'5f2175d7-c2c1-49fd-9d05-3a8de3846b9c']}
  2019-02-14 19:00:13.568 73453 INFO 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent 
[req-26129036-b514-4fa0