[Yahoo-eng-team] [Bug 1839920] [NEW] Macvtap CI fails on Train
Public bug reported:

MacVtap CI [1] started to fail after merging commit [2]. We think it is related to this libvirt change:
https://github.com/libvirt/libvirt/commit/b91a33638476cf57d910b6056a8fc11921edd029#diff-28bc83a0c3470bba712dfa6824a79c9d

That change switches from setting the admin MAC to setting the effective MAC. The problem is that the sriov-nic agent relies on the admin MAC to send RPC to the neutron server. If the MAC and the PCI slot don't match, the agent ignores the device and the VM is stuck in spawning until it times out.

[1] https://wiki.openstack.org/wiki/ThirdPartySystems/Mellanox_CI
[2] https://review.opendev.org/#/c/31/

** Affects: nova
   Importance: Undecided
   Assignee: Lenny (lennyb)
   Status: In Progress

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1839920

Title:
  Macvtap CI fails on Train

Status in OpenStack Compute (nova):
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1839920/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
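The matching problem described in this report can be sketched as follows. This is a hypothetical illustration, not the neutron sriov-nic agent code: the function name, the data shapes, and the sample MAC and PCI values are all assumptions, and it further assumes that a VF whose admin MAC was never set reads back as all zeros.

```python
# Hypothetical illustration only; not the actual sriov-nic agent code.
from typing import Iterable, List, Set, Tuple

Device = Tuple[str, str]  # (mac_address, pci_slot)


def devices_to_report(host_devices: Iterable[Device],
                      bound_ports: Set[Device]) -> List[Device]:
    """Return host devices whose (mac, pci_slot) pair matches a bound port."""
    report = []
    for mac, pci_slot in host_devices:
        key = (mac.lower(), pci_slot)
        if key in bound_ports:
            report.append(key)
        # A device whose pair does not match is skipped, so nothing is
        # reported for it and the VM keeps waiting for its port.
    return report


# Made-up port binding known to the server side.
bound = {("fa:16:3e:aa:bb:cc", "0000:03:00.2")}

# Admin MAC set on the VF: the pair matches and the device is reported.
ok = devices_to_report([("FA:16:3E:AA:BB:CC", "0000:03:00.2")], bound)

# Assumed case: only the effective MAC was set, admin MAC reads as zeros.
stuck = devices_to_report([("00:00:00:00:00:00", "0000:03:00.2")], bound)
```

In the second call nothing is reported, which is consistent with the symptom in the report: the VM waits in spawning until the timeout.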
[Yahoo-eng-team] [Bug 1708920] [NEW] Cold migration fails
Public bug reported:

Nova cold migration intermittently fails due to a broken connection to the SQL cell database.

Setup:
devstack master (pike)
all-in-one physical server
compute physical server
SR-IOV over Mellanox ConnectX-4 NICs

Scenario:
Running the tempest cold migration test a few times; it fails on the 3rd run.
# testr run tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_cold_migration

Issue:
One of the computes loses its SQL connection to novacell [1].
Cold migration fails because migration to the same node is not allowed [2].

Logs:
AllinOne http://52.169.200.208/tmp/cold_migration_bug_20170806/controller/
Compute http://52.169.200.208/tmp/cold_migration_bug_20170806/compute

[1] novacell error:
http://52.169.200.208/tmp/cold_migration_bug_20170806/controller/logs/n-cond-cell1.log
http://paste.openstack.org/show/617598/

[2] Compute error:
http://52.169.200.208/tmp/cold_migration_bug_20170806/compute/logs/n-cpu.log
http://paste.openstack.org/show/617599/

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1708920
[Yahoo-eng-team] [Bug 1618382] [NEW] test_update_instance_port_admin_state Fails sumetime with DB Update error
Public bug reported:

Sometimes in Mellanox CI we see that test_update_instance_port_admin_state [1] fails with error [2]:
"StaleDataError: UPDATE statement on table 'standardattributes' expected to update 1 row(s); 0 were matched." [3]

[1] http://13.69.151.247/Test_Neutron_SRIOV_cloudx25/233_cloudx-25//testr_results.html.gz
[2] http://paste.openstack.org/show/564796/
[3] http://13.69.151.247/Test_Neutron_SRIOV_cloudx25/233_cloudx-25/logs/q-svc.log.gz

** Affects: neutron
   Importance: Undecided
   Status: New

** Description changed:

-
- Sometimes tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_update_instance_port_admin_state [1]
- fails with error [2] "StaleDataError: UPDATE statement on table 'standardattributes' expected to update 1 row(s); 0 were matched." [3]
-
+ Sometimes in Mellanox CI we see that
+ test_update_instance_port_admin_state[1] fails with error [2]
+ "StaleDataError: UPDATE statement on table 'standardattributes' expected
+ to update 1 row(s); 0 were matched." [3]

https://bugs.launchpad.net/bugs/1618382
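The report does not establish why the UPDATE matched zero rows. As background only, here is a minimal sketch of how a "expected 1 row, matched 0" condition can arise when an update is guarded by a revision counter and another writer got there first. The table name `attrs`, its columns, and the helper `update_state` are hypothetical and are not neutron's schema or code; the sketch uses the standard-library `sqlite3` module rather than SQLAlchemy.

```python
import sqlite3

# In-memory database with a made-up table carrying a revision counter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attrs (id INTEGER PRIMARY KEY, revision INTEGER, state TEXT)"
)
conn.execute("INSERT INTO attrs VALUES (1, 0, 'up')")


def update_state(conn, row_id, expected_revision, new_state):
    """Update the row only if its revision is still the one we read earlier.

    Returns True when exactly one row was matched, False when the row was
    changed (or removed) by someone else in the meantime.
    """
    cur = conn.execute(
        "UPDATE attrs SET state = ?, revision = revision + 1 "
        "WHERE id = ? AND revision = ?",
        (new_state, row_id, expected_revision),
    )
    return cur.rowcount == 1


# Two writers both read revision 0; only the first update matches a row.
first = update_state(conn, 1, 0, "down")
second = update_state(conn, 1, 0, "up")
```

Here `second` is the stale write: it matches zero rows because the revision has already moved on. A caller would typically re-read the row and retry.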
[Yahoo-eng-team] [Bug 1560860] [NEW] mellanox infiniband SR-IOV(ib_hostdev vif) detach port fails
Public bug reported:

Detaching SRIOV port direct causes exception.

# neutron port-create --binding:vnic_type=direct private
# nova boot --flavor m1.small --image cirros-mellanox-x86_64-disk-ib --nic port-id=a247d89e-dae5-4d65-b414-e7bf3a26bfd1 vm1
# nova suspend vm1

Logs:
https://review.openstack.org/#/c/286668
http://144.76.193.39/ci-artifacts/286668/3/Neutron-Networking-MLNX-ML2/

Traceback message:

2016-03-03 04:45:42.775 1801 ERROR nova.compute.manager [instance: cdf2e34d-bc2e-4edb-aff7-516b97487730] Traceback (most recent call last):
2016-03-03 04:45:42.775 1801 ERROR nova.compute.manager [instance: cdf2e34d-bc2e-4edb-aff7-516b97487730]   File "/opt/stack/nova/nova/compute/manager.py", line 6515, in _error_out_instance_on_exception
2016-03-03 04:45:42.775 1801 ERROR nova.compute.manager [instance: cdf2e34d-bc2e-4edb-aff7-516b97487730]     yield
2016-03-03 04:45:42.775 1801 ERROR nova.compute.manager [instance: cdf2e34d-bc2e-4edb-aff7-516b97487730]   File "/opt/stack/nova/nova/compute/manager.py", line 4172, in suspend_instance
2016-03-03 04:45:42.775 1801 ERROR nova.compute.manager [instance: cdf2e34d-bc2e-4edb-aff7-516b97487730]     self.driver.suspend(context, instance)
2016-03-03 04:45:42.775 1801 ERROR nova.compute.manager [instance: cdf2e34d-bc2e-4edb-aff7-516b97487730]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2638, in suspend
2016-03-03 04:45:42.775 1801 ERROR nova.compute.manager [instance: cdf2e34d-bc2e-4edb-aff7-516b97487730]     self._detach_sriov_ports(context, instance, guest)
2016-03-03 04:45:42.775 1801 ERROR nova.compute.manager [instance: cdf2e34d-bc2e-4edb-aff7-516b97487730]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 3425, in _detach_sriov_ports
2016-03-03 04:45:42.775 1801 ERROR nova.compute.manager [instance: cdf2e34d-bc2e-4edb-aff7-516b97487730]     if vif['vnic_type'] in network_model.VNIC_TYPES_SRIOV
2016-03-03 04:45:42.775 1801 ERROR nova.compute.manager [instance: cdf2e34d-bc2e-4edb-aff7-516b97487730] AttributeError: 'LibvirtConfigGuestHostdevPCI' object has no attribute 'source_dev'
2016-03-03 04:45:42.775 1801 ERROR nova.compute.manager [instance: cdf2e34d-bc2e-4edb-aff7-516b97487730]

** Affects: nova
   Importance: Undecided
   Assignee: Moshe Levi (moshele)
   Status: New

** Tags: pci

https://bugs.launchpad.net/bugs/1560860
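The AttributeError in the traceback is the kind of failure that appears when code iterates over device objects of mixed types and reads an attribute that only some of them define. The sketch below reproduces that pattern with stand-in classes. `FakeInterface`, `FakeHostdevPCI`, and `source_devs` are hypothetical names invented for illustration; they are not nova classes, and the guard shown is not presented as the fix that was applied in nova.

```python
class FakeInterface:
    """Stand-in for a device config that carries a source_dev attribute."""

    def __init__(self, source_dev):
        self.source_dev = source_dev


class FakeHostdevPCI:
    """Stand-in for a PCI hostdev config that has no source_dev attribute."""

    def __init__(self, address):
        self.address = address


def source_devs(devices):
    """Collect source_dev only from devices that actually define it."""
    return [dev.source_dev for dev in devices if hasattr(dev, "source_dev")]


devices = [FakeInterface("eth2"), FakeHostdevPCI("0000:03:00.2")]

# Unguarded access fails as soon as the hostdev-style object is reached.
try:
    [dev.source_dev for dev in devices]
    raised = False
except AttributeError:
    raised = True

# Guarded access skips objects without the attribute.
safe = source_devs(devices)
```

Whether skipping such devices is correct depends on what the caller needs; for a hostdev-style device the identifying field would be something other than `source_dev`.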
[Yahoo-eng-team] [Bug 1535367] [NEW] Failure on SR-IOV . Missing 'parent_addr
Public bug reported:

Mellanox CI fails on SR-IOV hardware.

1. Running nova from master, commit ffa07781ab47baf096854cd6c22a3e433eab3f0d
2. Full logs: http://144.76.193.39/ci-artifacts/269109/1/Nova-ML2-Sriov/
3. Reproduce:
   ./stack.sh
   neutron port-create --binding:vnic_type=direct private
   nova boot --flavor m1.small --image mellanox_eth --nic port-id= vm1
4. Port binding fails; nova fails to find an appropriate host.

** Affects: nova
   Importance: Critical
   Status: Confirmed

** Tags: sr-iov

https://bugs.launchpad.net/bugs/1535367