[Yahoo-eng-team] [Bug 1680918] Re: Nova upgrade fails if PCI devices of type-PF or type-PCI are present in the database

2017-04-20 Thread OpenStack Infra
Reviewed:  https://review.openstack.org/456397
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093
Submitter: Jenkins
Branch:    master

commit 7f3f0ef1fbb51f6f17d2c13840e0f98d17fa9093
Author: Steven Webster 
Date:   Wed Apr 5 09:05:07 2017 -0400

Fix mitaka online migration for PCI devices

Currently, a validation error is thrown on a nova upgrade if we find
any PCI device records which have not populated the parent_addr
column. However, the only PCI records for which a parent_addr makes
sense are those with a device type of 'type-VF' (i.e. an SR-IOV
virtual function). PCI records with a device type of 'type-PF' or
'type-PCI' will not have a parent_addr. If any of those records are
present on upgrade, the validation will fail.

This change checks that the device type of the PCI record is
'type-VF' when making sure the parent_addr has been correctly
populated.

Closes-Bug: #1680918
Change-Id: Ia7e773674a4976fc03deee3f08a6ddb45568ec11


** Changed in: nova
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1680918

Title:
  Nova upgrade fails if PCI devices of type-PF or type-PCI are present
  in the database

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) newton series:
  New
Status in OpenStack Compute (nova) ocata series:
  New


[Yahoo-eng-team] [Bug 1680918] Re: Nova upgrade fails if PCI devices of type-PF or type-PCI are present in the database

2017-04-12 Thread Matt Riedemann
** Also affects: nova/ocata
   Importance: Undecided
   Status: New

** Also affects: nova/newton
   Importance: Undecided
   Status: New

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova/newton
   Importance: Undecided => High

** Changed in: nova/ocata
   Importance: Undecided => High


Title:
  Nova upgrade fails if PCI devices of type-PF or type-PCI are present
  in the database

Status in OpenStack Compute (nova):
  In Progress
Status in OpenStack Compute (nova) newton series:
  New
Status in OpenStack Compute (nova) ocata series:
  New

Bug description:
  Description
  ===========
  If a Nova DB is upgraded (migrated) while containing PCI devices with
  device type 'type-PF' or 'type-PCI', a validation error similar to
  this will be thrown:

  "ValidationError: There are still 2 unmigrated records in the
  pci_devices table. Migration cannot continue until all records have
  been migrated."

  The error is generated by the 330_enforce_mitaka_online_migrations.py
  upgrade script.

  The PCI device migration validation will fail if any PCI device
  entries without a populated parent_addr are found.  However, the
  parent_addr really only applies to PCI device entries of 'type-VF'
  (i.e. SR-IOV virtual functions).

  This is an example of what the pci_devices table looks like with
  SR-IOV-enabled PCI devices if the appropriate entries are whitelisted
  in nova.conf:

  MariaDB [nova]> select * from pci_devices;
  +---------------------+---------------------+------------+---------+----+-----------------+----------+------------+-----------+----------+--------------+-----------------+-----------+------------+---------------+------------+-----------+-------------+
  | created_at          | updated_at          | deleted_at | deleted | id | compute_node_id | address  | product_id | vendor_id | dev_type | dev_id       | label           | status    | extra_info | instance_uuid | request_id | numa_node | parent_addr |
  +---------------------+---------------------+------------+---------+----+-----------------+----------+------------+-----------+----------+--------------+-----------------+-----------+------------+---------------+------------+-----------+-------------+
  | 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  1 |               1 | :05:10.1 | 10ed       | 8086      | type-VF  | pci__05_10_1 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | :05:00.1    |
  | 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  2 |               1 | :05:10.3 | 10ed       | 8086      | type-VF  | pci__05_10_3 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | :05:00.1    |
  | 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  3 |               1 | :05:10.5 | 10ed       | 8086      | type-VF  | pci__05_10_5 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | :05:00.1    |
  | 2017-04-06 21:01:21 | 2017-04-06 21:53:13 | NULL       |       0 |  4 |               1 | :05:10.7 | 10ed       | 8086      | type-VF  | pci__05_10_7 | label_8086_10ed | available | {}         | NULL          | NULL       |         0 | :05:00.1    |
  | 2017-04-06 21:53:13 | NULL                | NULL       |       0 |  5 |               1 | :05:00.0 | 10fb       | 8086      | type-PF  | pci__05_00_0 | label_8086_10fb | available | {}         | NULL          | NULL       |         0 | NULL        |
  | 2017-04-06 21:53:13 | NULL                | NULL       |       0 |  6 |               1 | :05:00.1 | 10fb       | 8086      | type-PF  | pci__05_00_1 | label_8086_10fb | available | {}         | NULL          | NULL       |         0 | NULL        |
  +---------------------+---------------------+------------+---------+----+-----------------+----------+------------+-----------+----------+--------------+-----------------+-----------+------------+---------------+------------+-----------+-------------+
  6 rows in set (0.00 sec)

  
  I think the upgrade script should be checking the PciDevice dev_type
  field for 'type-VF' when validating the parent_addr.
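
  The suggested check can be sketched roughly as follows. This is not
  the code of the 330 migration script; it is a minimal illustration
  that uses an in-memory SQLite table as a stand-in for pci_devices,
  trimmed to the columns that matter here. The helper name
  count_unmigrated and the "deleted = 0" filter are assumptions made
  for the example. The rows mirror the six records in the listing
  above.

```python
import sqlite3

# Hypothetical, trimmed-down stand-in for nova's pci_devices table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pci_devices ("
    "id INTEGER PRIMARY KEY, deleted INTEGER, address TEXT, "
    "dev_type TEXT, parent_addr TEXT)"
)

# Addresses are copied as they appear in the bug report's listing.
rows = [
    (1, 0, ":05:10.1", "type-VF", ":05:00.1"),
    (2, 0, ":05:10.3", "type-VF", ":05:00.1"),
    (3, 0, ":05:10.5", "type-VF", ":05:00.1"),
    (4, 0, ":05:10.7", "type-VF", ":05:00.1"),
    (5, 0, ":05:00.0", "type-PF", None),
    (6, 0, ":05:00.1", "type-PF", None),
]
conn.executemany("INSERT INTO pci_devices VALUES (?, ?, ?, ?, ?)", rows)


def count_unmigrated(conn, only_vfs):
    """Count live records whose parent_addr is still NULL.

    With only_vfs=False every device type is counted, which is the
    behaviour the report describes. With only_vfs=True the count is
    restricted to 'type-VF' records, as the report suggests.
    """
    query = (
        "SELECT COUNT(*) FROM pci_devices "
        "WHERE deleted = 0 AND parent_addr IS NULL"
    )
    if only_vfs:
        query += " AND dev_type = 'type-VF'"
    return conn.execute(query).fetchone()[0]


old_count = count_unmigrated(conn, only_vfs=False)
new_count = count_unmigrated(conn, only_vfs=True)
print("without dev_type filter:", old_count)
print("with dev_type filter:   ", new_count)
```

  Without the dev_type filter the two 'type-PF' rows are counted as
  unmigrated, which matches the "2 unmigrated records" in the error
  message; with the filter, no records are flagged.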

  
  Steps to reproduce
  ==================

  1. Install a Mitaka control node and edit the nova.conf file to
  include one or more PCI devices in the pci_passthrough_whitelist, e.g.:

  pci_passthrough_whitelist = {"vendor_id": "8086", "product_id":"10fb"}

  2. Install a second control node running Newton or newer and edit its
  nova.conf to point to the SQL database of the Mitaka node, e.g.:

  [database]
  connection = mysql+pymysql://root:supersecret@/nova?charset=utf8
   
  [api_database]
  connection =