On Fri, 28 Nov 2014, Sander Eikelenboom wrote:
> Friday, November 28, 2014, 6:09:52 PM, you wrote:
>
> > On Fri, 28 Nov 2014, Wei Liu wrote:
> >> On Fri, Nov 28, 2014 at 04:54:19PM +0000, Stefano Stabellini wrote:
> >> > On Fri, 28 Nov 2014, Wei Liu wrote:
> >> > > On Fri, Nov 28, 2014 at 03:08:51PM +0000, Stefano Stabellini wrote:
> >> > > > create ^
> >> > > > title it QMP connection problems prevent libxl from calling
> >> > > > libxl__device_pci_reset on domain shutdown
> >> > > > thanks
> >> > > >
> >> > > > On Wed, 26 Nov 2014, Sander Eikelenboom wrote:
> >> > > > > Hi,
> >> > > > >
> >> > > > > While testing a patch for Konrad i was wondering why "libxl_pci.c:
> >> > > > > libxl__device_pci_reset()"
> >> > > > > doesn't get called on guest shutdown of a HVM guest (qemu-xen)
> >> > > > > with pci passthrough.
> >> > > > > xl didn't show any problems on the commandline so i never drawed
> >> > > > > much attention
> >> > > > > to it, but /var/log/xen/xl-guest.log shows:
> >> > > > >
> >> > > > > Waiting for domain xbmc13sid (domid 19) to die [pid 20450]
> >> > > > > Domain 19 has shut down, reason code 0 0x0
> >> > > > > Action for shutdown reason code 0 is destroy
> >> > > > > Domain 19 needs to be cleaned up: destroying the domain
> >> > > > > libxl: error: libxl_qmp.c:443:qmp_next: Socket read error:
> >> > > > > Connection reset by peer
> >> > > > > libxl: error: libxl_qmp.c:701:libxl__qmp_initialize: Failed to
> >> > > > > connect to QMP
> >> > > > > libxl: error: libxl_pci.c:1242:do_pci_remove:
> >> > > > > libxl__qmp_pci_del: Connection reset by peer
> >> > > > > libxl: error: libxl_dm.c:1575:kill_device_model: Device Model
> >> > > > > already exited
> >> > > > > Done. Exiting now
> >> > > > >
> >> > > > > So it doesn't even get to calling "libxl_pci.c:
> >> > > > > libxl__device_pci_reset()".
> >> > > > >
> >> > > > > --
> >> > >
> >> > > It's worth checking whether the device model exits too early, i.e., did
> >> > > it crash? What's in the DM log?
> >> >
> >> > Firstly I was thinking that if force=1 it makes sense to continue even
> >> > when QEMU returns error, see:
> >> >
> >> > http://marc.info/?i=alpine.DEB.2.02.1411281648500.14135%40kaball.uk.xensource.com
> >>
> >> In normal case DM is not destroyed until PCI device is removed. So I
> >> think the real issue is DM crashed.
>
> > I don't think it crashed: QEMU detects that domain shutdown has been
> > called and exits.
> > See:
>
> > static void main_loop(void)
> > {
> > bool nonblocking;
> > int last_io = 0;
> > #ifdef CONFIG_PROFILER
> > int64_t ti;
> > #endif
> > do {
>
> > [...]
>
> > } while (!main_loop_should_exit());
> > }
>
>
> > main_loop_should_exit returns true when qemu_shutdown_requested().
>
>
> >> libxl.c:
> >> 1571 if (libxl__device_pci_destroy_all(gc, domid) < 0)
> >> 1572 LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "pci shutdown failed for
> >> domid %d", domid);
> >> 1573 rc = xc_domain_pause(ctx->xch, domid);
> >> 1574 if (rc < 0) {
> >> 1575 LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc,
> >> "xc_domain_pause failed for %d", domid);
> >> 1576 }
> >> 1577 if (dm_present) {
> >> 1578 if (libxl__destroy_device_model(gc, domid) < 0)
> >> 1579 LIBXL__LOG(ctx, LIBXL__LOG_ERROR,
> >> "libxl__destroy_device_model failed for %d", domid);
> >> 1580
> >> 1581 libxl__qmp_cleanup(gc, domid);
> >> 1582 }
> >>
> >> The patch you posted covers up the real issue.
>
> > That is true, but I think the patch is correct regardless of the issue.
>
> Hmm how about qemu's "-no-shutdown" option ?
> >From the manpage:
> -no-shutdown
> Don't exit QEMU on guest shutdown, but instead only stop the emulation.
> This allows for instance switching to monitor to commit changes to the disk
> image.
>
> with:
> device_model_args_hvm = [ '-no-shutdown' ]
>
> I get:
>
> /var/log/xen/xl-tv.log:
>
> Waiting for domain tv (domid 18) to die [pid 18587]
> Domain 18 has shut down, reason code 0 0x0
> Action for shutdown reason code 0 is destroy
> Domain 18 needs to be cleaned up: destroying the domain
> libxl: error: libxl_pci.c:1299:do_pci_remove: xc_domain_irq_permission
> irq=40: Invalid argument
Thanks for testing! The suggestion might be a good idea, but I am not
sure whether it is suitable given the stage of the release cycle. I
would wait until 4.6 before making that change.
Regarding the xc_domain_irq_permission error, it is just a matter of
calling xc_physdev_unmap_pirq after xc_domain_irq_permission:
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 316643c..904d255 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1288,14 +1290,14 @@ skip1:
goto out;
}
if ((fscanf(f, "%u", &irq) == 1) && irq) {
- rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
- if (rc < 0) {
- LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_physdev_unmap_pirq
irq=%d", irq);
- }
rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
if (rc < 0) {
LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
"xc_domain_irq_permission irq=%d", irq);
}
+ rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
+ if (rc < 0) {
+ LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_physdev_unmap_pirq
irq=%d", irq);
+ }
}
fclose(f);
}
_______________________________________________
Xen-devel mailing list
[email protected]
http://lists.xen.org/xen-devel