Public bug reported:

Description
===========
While testing live migration between ovs and ovs-dpdk on CentOS 7.5 I
encountered a qemu internal error that resulted in the qemu monitor closing.
When the migration failed during virDomainMigrateToURI3, nova rolled back
the migration, leaving the vm running on the source node. The vm, however,
no longer had network connectivity, because the vm port on the source node
had no vif type: the host-id in the port binding was no longer set. This is
likely due to the source node binding not being reactivated when the
migration fails.
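
A workaround that should restore the binding (untested here) is to re-point
the port's binding:host_id at the source host so neutron attempts to re-bind
the vif there. A rough sketch, assuming openstacksdk and placeholder cloud,
port and host names (not the real values from this environment):

  import openstack

  # 'devstack' is a placeholder clouds.yaml entry; admin credentials are
  # needed because binding:host_id is an admin-only port attribute.
  conn = openstack.connect(cloud='devstack')

  port_id = 'PORT-UUID'        # port of the affected vm (placeholder)
  source_host = 'SOURCE-HOST'  # hostname of the source compute node

  # re-assert the source host binding so neutron re-binds the vif there
  conn.network.update_port(port_id, binding_host_id=source_host)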

If the vm is hard-rebooted in this state it enters the error state, as
the vif bindings are broken.

Note: the qemu error itself is out of scope for this bug. This bug focuses
on the fact that nova does not correctly roll back the migration, so the vm
is not left running correctly, with working networking, on the source node
when a qemu error is thrown.


Steps to reproduce
==================
A chronological list of steps which reproduce the issue:
1. Deploy a two-node devstack with ovs on the controller and ovs-dpdk on the
   second node.
2. Configure nova-compute with the libvirt/kvm virt type on both nodes.
3. Allocate hugepages on both nodes and boot a vm on either node.
4. Attempt to live migrate the vm to the other node (a minimal API sketch
   follows this list).
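
A minimal sketch of the final step, assuming openstacksdk, a clouds.yaml
entry named 'devstack' (placeholder) and the vm name 'dpdk' from this report:

  import openstack

  conn = openstack.connect(cloud='devstack')
  server = conn.compute.find_server('dpdk')

  # volume-backed vm, so no block migration; let the scheduler pick the
  # destination (the only other compute node in this two-node setup)
  conn.compute.live_migrate_server(server, host=None, block_migration=False)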

Expected result
===============
If qemu has a bug that results in a migration failure, the vif bindings
should be restored to their pre-migration state, and if networking cannot be
restored the vm should go directly to the error state.

Actual result
=============
The vm remained running on the source node with state ACTIVE.
The vm interface no longer had a binding:host_id set in its neutron port
binding details, and the binding:vif_type was none.
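
The broken state can be confirmed by dumping the binding attributes of the
vm's ports, e.g. with openstacksdk (cloud and vm names as in the sketches
above):

  import openstack

  conn = openstack.connect(cloud='devstack')
  server = conn.compute.find_server('dpdk')

  for port in conn.network.ports(device_id=server.id):
      # after the failed rollback binding:host_id came back empty and
      # binding:vif_type was none
      print(port.id, port.binding_host_id, port.binding_vif_type)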

On hard reboot the vm entered the error state with the following message:
Error: Failed to perform requested operation on instance "dpdk", the instance 
has an error status: Please try again later [Error: vif_type parameter must be 
present for this vif_driver implementation].
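
For completeness, the equivalent hard reboot through the API, assuming
openstacksdk and the same placeholder names as above:

  import openstack

  conn = openstack.connect(cloud='devstack')
  server = conn.compute.find_server('dpdk')

  # fails in the compute driver with "vif_type parameter must be present
  # for this vif_driver implementation" and leaves the vm in the error state
  conn.compute.reboot_server(server, reboot_type='HARD')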

Environment
===========
1. openstack rocky RC1 
   nova sha: afe4512bf66c89a061b1a7ccd3e7ac8e3b1b284d
   neutron sha: 1dda2bca862b1268c0f5ae39b7508f1b1cab6f15

2. Which hypervisor did you use?
   libvirt + kvm
   libvirtd (libvirt) 3.9.0
   QEMU emulator version 2.10.0(qemu-kvm-ev-2.10.0-21.el7_5.4.1)
   CentOS 7.5
    
3. Which storage type did you use?
   Boot from volume, using the default lvm-backed volume backend deployed by
   devstack.

4. Which networking type did you use?
   Neutron ml2/ovs with ovs and ovs-dpdk agents.
   This is not really relevant to the bug.

Logs & Configs
==============

n-cpu log

Aug 16 12:19:00 devstack5 nova-compute[14766]: 2018-08-16T11:19:00.057596Z 
qemu-kvm: load of migration failed: Invalid argument: libvirtError: internal 
error: qemu unexpectedly closed the monitor: 2018-08-16T11:19:00.055479Z 
qemu-kvm: get_pci_config_device: Bad config data: i=0x10 read: 61 device: 1 
cmask: ff wmask: c0 w1cmask:0
Aug 16 12:19:00 devstack5 nova-compute[14766]: DEBUG nova.virt.libvirt.driver 
[-] [instance: 05a564c2-2e25-4b07-911f-2ded60fb6756] Migration operation thread 
notification {{(pid=14766) thread_finished 
/opt/stack/nova/nova/virt/libvirt/driver.py:7478}}
Aug 16 12:19:00 devstack5 nova-compute[14766]: Traceback (most recent call 
last):
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 457, in 
fire_timers
Aug 16 12:19:00 devstack5 nova-compute[14766]: timer()
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/usr/lib/python2.7/site-packages/eventlet/hubs/timer.py", line 58, in __call__
Aug 16 12:19:00 devstack5 nova-compute[14766]: cb(*args, **kw)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/usr/lib/python2.7/site-packages/eventlet/event.py", line 168, in _do_send
Aug 16 12:19:00 devstack5 nova-compute[14766]: waiter.switch(result)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
Aug 16 12:19:00 devstack5 nova-compute[14766]: result = function(*args, 
**kwargs)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/opt/stack/nova/nova/utils.py", line 810, in context_wrapper
Aug 16 12:19:00 devstack5 nova-compute[14766]: return func(*args, **kwargs)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 7134, in 
_live_migration_operation
Aug 16 12:19:00 devstack5 nova-compute[14766]: LOG.error("Live Migration 
failure: %s", e, instance=instance)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
Aug 16 12:19:00 devstack5 nova-compute[14766]: self.force_reraise()
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in 
force_reraise
Aug 16 12:19:00 devstack5 nova-compute[14766]: six.reraise(self.type_, 
self.value, self.tb)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/opt/stack/nova/nova/virt/libvirt/driver.py", line 7127, in 
_live_migration_operation
Aug 16 12:19:00 devstack5 nova-compute[14766]: 
bandwidth=CONF.libvirt.live_migration_bandwidth)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/opt/stack/nova/nova/virt/libvirt/guest.py", line 673, in migrate
Aug 16 12:19:00 devstack5 nova-compute[14766]: destination, params=params, 
flags=flags)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
Aug 16 12:19:00 devstack5 nova-compute[14766]: result = 
proxy_call(self._autowrap, f, *args, **kwargs)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
Aug 16 12:19:00 devstack5 nova-compute[14766]: rv = execute(f, *args, **kwargs)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
Aug 16 12:19:00 devstack5 nova-compute[14766]: six.reraise(c, e, tb)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
Aug 16 12:19:00 devstack5 nova-compute[14766]: rv = meth(*args, **kwargs)
Aug 16 12:19:00 devstack5 nova-compute[14766]: File 
"/usr/lib64/python2.7/site-packages/libvirt.py", line 1745, in migrateToURI3
Aug 16 12:19:00 devstack5 nova-compute[14766]: if ret == -1: raise libvirtError 
('virDomainMigrateToURI3() failed', dom=self)
Aug 16 12:19:00 devstack5 nova-compute[14766]: libvirtError: internal error: 
qemu unexpectedly closed the monitor: 2018-08-16T11:19:00.055479Z qemu-kvm: 
get_pci_config_device: Bad config data: i=0x10 read: 61 device: 1 cmask: ff 
wmask: c0 w1cmask:0
Aug 16 12:19:00 devstack5 nova-compute[14766]: 2018-08-16T11:19:00.055541Z 
qemu-kvm: Failed to load PCIDevice:config
Aug 16 12:19:00 devstack5 nova-compute[14766]: 2018-08-16T11:19:00.055555Z 
qemu-kvm: Failed to load virtio-net:virtio
Aug 16 12:19:00 devstack5 nova-compute[14766]: 2018-08-16T11:19:00.055568Z 
qemu-kvm: error while loading state for instance 0x0 of device 
'0000:00:03.0/virtio-net'
Aug 16 12:19:00 devstack5 nova-compute[14766]: 2018-08-16T11:19:00.055761Z 
qemu-kvm: warning: TSC frequency mismatch between VM (2712005 kHz) and host 
(2712000 kHz), and TSC scaling unavailable
Aug 16 12:19:00 devstack5 nova-compute[14766]: 2018-08-16T11:19:00.057082Z 
qemu-kvm: warning: TSC frequency mismatch between VM (2712005 kHz) and host 
(2712000 kHz), and TSC scaling unavailable

** Affects: nova
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1788014

Title:
  when live migration fails due to an internal error, rollback is not
  handled correctly.

Status in OpenStack Compute (nova):
  New
