Re: live migration fails: qemu placing pci devices at different locations

2023-11-01 Thread James Dingwall
On Tue, Oct 31, 2023 at 10:07:29AM +, James Dingwall wrote:
> Hi,
> 
> I'm having a bit of trouble performing live migration between hvm guests.  The
> sending side is xen 4.14.5 (qemu 5.0), receiving 4.15.5 (qemu 5.1).  The error
> message recorded in qemu-dm---incoming.log:
> 
> qemu-system-i386: Unknown savevm section or instance ':00:04.0/vga' 0. 
> Make sure that your current VM setup matches your saved VM setup, including 
> any hotplugged devices
> 
> I have patched libxl_dm.c to explicitly assign `addr=xx` values for various
> devices and when these are correct the domain migrates correctly.  However
> the configuration differences between guests means that the values are not
> consistent.  The domain config file doesn't allow the pci address to be
> expressed in the configuration for, e.g. `soundhw="DEVICE"`
> 
> e.g. 
> 
> diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
> index 6e531863ac0..daa7c49846f 100644
> --- a/tools/libs/light/libxl_dm.c
> +++ b/tools/libs/light/libxl_dm.c
> @@ -1441,7 +1441,7 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
>  flexarray_append(dm_args, "-spice");
>  flexarray_append(dm_args, spiceoptions);
>  if (libxl_defbool_val(b_info->u.hvm.spice.vdagent)) {
> -flexarray_vappend(dm_args, "-device", "virtio-serial",
> +flexarray_vappend(dm_args, "-device", "virtio-serial,addr=04",
>  "-chardev", "spicevmc,id=vdagent,name=vdagent", "-device",
>  "virtserialport,chardev=vdagent,name=com.redhat.spice.0",
>  NULL);
> 
> The order of devices on the qemu command line (below) appears to be the same
> so my assumption is that the internals of qemu have resulted in things being
> connected in a different order.  The output of a Windows `lspci` tool is
> also included.
> 
> Could anyone make any additional suggestions on how I could try to gain
> consistency between the different qemu versions?

After a bit more head scratching we worked out the cause and a solution for
our case.  In xen 4.15.4, commit d65ebacb78901b695bc5e8a075ad1ad865a78928 was
introduced to stop using the deprecated qemu `-soundhw` option.  The qemu
device initialisation code looks like:

...
soundhw_init(); // handles old -soundhw option
...
/* init generic devices */
rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
qemu_opts_foreach(qemu_find_opts("device"),
                  device_init_func, NULL, &error_fatal);
...

The old -soundhw option was therefore processed before any -device options:
the sound card was assigned the next available slot on the bus, and the
-device entries were then added in command line order.  After that xen change
the sound card is itself a -device, so depending on the other emulated
hardware it can land in a different slot from the one the equivalent -soundhw
option would have given it.  By re-ordering the qemu command line building in
libxl_dm.c we can make the sound card the first -device, which resolves the
migration problem.
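
To illustrate the ordering effect, here is a toy model rather than qemu code.
It assumes that a device without an explicit addr= takes the lowest free slot
and that slots 0 to 2 are already occupied, as in the 4.14.5 lspci listing.
The names toy_bus, build_soundhw_first and build_device_order are made up for
this sketch, and the position of the sound card in the second ordering is an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 32

/* Toy PCI bus: slot_owner[i] names the device in slot i, NULL means free. */
struct toy_bus {
    const char *slot_owner[TOY_SLOTS];
};

/* Occupy slots 0-2, mirroring the host bridge, PIIX3 and Xen platform device. */
void toy_bus_init(struct toy_bus *bus)
{
    assert(bus != NULL);
    memset(bus, 0, sizeof(*bus));
    bus->slot_owner[0] = "host-bridge";
    bus->slot_owner[1] = "piix3";
    bus->slot_owner[2] = "xen-platform";
}

/* Put a device in the lowest free slot; return that slot, or -1 if full. */
int toy_bus_add(struct toy_bus *bus, const char *name)
{
    assert(bus != NULL && name != NULL);
    for (int slot = 0; slot < TOY_SLOTS; slot++) {
        if (bus->slot_owner[slot] == NULL) {
            bus->slot_owner[slot] = name;
            return slot;
        }
    }
    return -1;
}

/* Return the slot holding the named device, or -1 if it is not present. */
int toy_bus_find(const struct toy_bus *bus, const char *name)
{
    assert(bus != NULL && name != NULL);
    for (int slot = 0; slot < TOY_SLOTS; slot++) {
        if (bus->slot_owner[slot] && strcmp(bus->slot_owner[slot], name) == 0)
            return slot;
    }
    return -1;
}

/* -soundhw handled first, then the -device entries in command line order. */
void build_soundhw_first(struct toy_bus *bus)
{
    toy_bus_init(bus);
    toy_bus_add(bus, "intel-hda");      /* stands in for soundhw_init() */
    toy_bus_add(bus, "virtio-serial");
    toy_bus_add(bus, "VGA");
}

/* Sound card is now an ordinary -device, assumed to follow the other two. */
void build_device_order(struct toy_bus *bus)
{
    toy_bus_init(bus);
    toy_bus_add(bus, "virtio-serial");
    toy_bus_add(bus, "VGA");
    toy_bus_add(bus, "intel-hda");
}
```

With these assumptions the first ordering gives the sound card, virtio-serial
and VGA slots 3, 4 and 5, matching the 4.14.5 listing, while the second puts
VGA in slot 4, so the two sides no longer agree on device addresses.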

I think this would also have been a problem for live migration between 4.15.3
and 4.15.4 for a vm with a sound card and not just the major version jump we
are doing.

James



live migration fails: qemu placing pci devices at different locations

2023-10-31 Thread James Dingwall
Hi,

I'm having a bit of trouble performing live migration between hvm guests.  The
sending side is xen 4.14.5 (qemu 5.0), receiving 4.15.5 (qemu 5.1).  The error
message recorded in qemu-dm---incoming.log:

qemu-system-i386: Unknown savevm section or instance ':00:04.0/vga' 0. Make 
sure that your current VM setup matches your saved VM setup, including any 
hotplugged devices

I have patched libxl_dm.c to explicitly assign `addr=xx` values for various
devices and when these are correct the domain migrates correctly.  However
the configuration differences between guests means that the values are not
consistent.  The domain config file doesn't allow the pci address to be
expressed in the configuration for, e.g. `soundhw="DEVICE"`

e.g. 

diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
index 6e531863ac0..daa7c49846f 100644
--- a/tools/libs/light/libxl_dm.c
+++ b/tools/libs/light/libxl_dm.c
@@ -1441,7 +1441,7 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
 flexarray_append(dm_args, "-spice");
 flexarray_append(dm_args, spiceoptions);
 if (libxl_defbool_val(b_info->u.hvm.spice.vdagent)) {
-flexarray_vappend(dm_args, "-device", "virtio-serial",
+flexarray_vappend(dm_args, "-device", "virtio-serial,addr=04",
 "-chardev", "spicevmc,id=vdagent,name=vdagent", "-device",
 "virtserialport,chardev=vdagent,name=com.redhat.spice.0",
 NULL);
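
A more general form of the same idea could be sketched as a small helper that
appends `addr=` only when a slot has been chosen.  format_device_arg is a
hypothetical name and is not part of libxl.  The two-digit hex formatting
follows the `addr=04` used in the patch above and assumes qemu reads the value
as a hexadecimal slot number.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libxl: write "<model>,addr=<slot>" into
 * buf, or just "<model>" when slot is negative so that qemu picks the slot.
 * Returns 0 on success, -1 if buf is too small or formatting fails.
 */
int format_device_arg(char *buf, size_t len, const char *model, int slot)
{
    int n;

    assert(buf != NULL && model != NULL);
    if (slot < 0)
        n = snprintf(buf, len, "%s", model);
    else
        n = snprintf(buf, len, "%s,addr=%02x", model, (unsigned int)slot);

    /* snprintf reports the length it wanted; treat truncation as an error. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would then be passed as the value following "-device",
in place of the hard-coded "virtio-serial,addr=04" literal.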

The order of devices on the qemu command line (below) appears to be the same
so my assumption is that the internals of qemu have resulted in things being
connected in a different order.  The output of a Windows `lspci` tool is
also included.

Could anyone make any additional suggestions on how I could try to gain
consistency between the different qemu versions?

Thanks,
James


xen 4.14.5

/usr/lib/xen/bin/qemu-system-i386 -xen-domid 19 -no-shutdown
  -chardev socket,id=libxl-cmd,fd=19,server,nowait -S 
  -mon chardev=libxl-cmd,mode=control
  -chardev socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-19,server,nowait
  -mon chardev=libxenstat-cmd,mode=control
  -nodefaults -no-user-config -name  -vnc 0.0.0.0:93 -display none
  -k en-us
  -spice port=35993,tls-port=0,addr=127.0.0.1,disable-ticketing,agent-mouse=on,disable-copy-paste,image-compression=auto_glz
  -device virtio-serial -chardev spicevmc,id=vdagent,name=vdagent
  -device virtserialport,chardev=vdagent,name=com.redhat.spice.0
  -device VGA,vgamem_mb=16
  -boot order=cn
  -usb -usbdevice tablet
  -soundhw hda
  -smp 2,maxcpus=2
  -device rtl8139,id=nic0,netdev=net0,mac=00:16:3e:64:c8:68
  -netdev type=tap,id=net0,ifname=vif19.0-emu,script=no,downscript=no
  -object tls-creds-x509,id=tls0,endpoint=client,dir=/etc/certificates/usbredir,verify-peer=yes
  -chardev socket,id=charredir_serial0,host=127.0.0.1,port=48052,reconnect=2,nodelay,keepalive=on,user-timeout=5
  -device isa-serial,chardev=charredir_serial0
  -chardev socket,id=charredir_serial1,host=127.0.0.1,port=48054,reconnect=2,nodelay,keepalive=on,user-timeout=5
  -device isa-serial,chardev=charredir_serial1
  -chardev socket,id=charredir_serial2,host=127.0.0.1,port=48055,reconnect=2,nodelay,keepalive=on,user-timeout=5
  -device pci-serial,chardev=charredir_serial2
  -trace events=/etc/xen/qemu-trace-options -machine xenfv -m 2032
  -drive file=/dev/drbd1002,if=ide,index=0,media=disk,format=raw,cache=writeback
  -drive file=/dev/drbd1003,if=ide,index=1,media=disk,format=raw,cache=writeback
  -runas 131091:131072

00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
00:03.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 01)
00:04.0 Communication controller: Red Hat, Inc Virtio console
00:05.0 VGA compatible controller: Device 1234: (rev 02)
00:07.0 Serial controller: Red Hat, Inc. QEMU PCI 16550A Adapter (rev 01)



xen 4.15.5

/usr/lib/xen/bin/qemu-system-i386 -xen-domid 15 -no-shutdown
  -chardev socket,id=libxl-cmd,fd=19,server=on,wait=off -S
  -mon chardev=libxl-cmd,mode=control
  -chardev socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-15,server=on,wait=off
  -mon chardev=libxenstat-cmd,mode=control
  -nodefaults -no-user-config -name  -vnc 0.0.0.0:93 -display none
  -k en-us
  -spice port=35993,tls-port=0,addr=127.0.0.1,disable-ticketing=on,agent-mouse=on,disable-copy-paste=on,image-compression=auto_glz
  -device virtio-serial -chardev spicevmc,id=vdagent,name=vdagent
  -device