On Thu, Dec 20, 2018 at 10:29:14AM +0100, Roger Pau Monné wrote: >On Thu, Dec 20, 2018 at 10:46:29AM +0800, Chao Gao wrote: >> On Wed, Dec 19, 2018 at 09:57:51AM +0100, Roger Pau Monné wrote: >> >On Tue, Dec 18, 2018 at 10:43:37PM +0800, Chao Gao wrote: >> >> I find some pass-thru devices don't work any more across guest >> >> reboot. Assigning it to another domain also meets the same issue. And >> >> the only way to make it work again is un-binding and binding it to >> >> pciback. Someone reported this issue one year ago [1]. >> >> >> >> If the device's driver doesn't disable MSI-X during shutdown or qemu is >> >> killed/crashed before the domain shutdown, this domain's pirq won't be >> >> unmapped. Then xen takes over this work, unmapping all pirq-s, when >> >> destroying guest. But as pciback has already disabled meory decoding >> >> before >> >> xen unmapping pirq, Xen has to sets the host_maskall flag and maskall bit >> >> to mask a MSI rather than sets maskbit in MSI-x table. The call trace of >> >> this process is: >> >> >> >> ->arch_domain_destroy >> >> ->free_domain_pirqs >> >> ->unmap_domain_pirq (if pirq isn't unmapped by qemu) >> >> ->pirq_guest_force_unbind >> >> ->__pirq_guest_unbind >> >> ->mask_msi_irq(=desc->handler->disable()) >> >> ->the warning in msi_set_mask_bit() >> >> >> >> The host_maskall bit will prevent guests from clearing the maskall bit >> >> even the device is assigned to another guest later. Then guests cannot >> >> receive MSIs from this device. >> >> >> >> To fix this issue, a pirq is unmapped before memory decoding is disabled >> >> by >> >> pciback. Specifically, when a device is detached from a guest, all >> >> established >> >> mappings between pirq and msi are destroying before changing the >> >> ownership. >> >> >> >> [1]: >> >> https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg02520.html >> >> >> >> Signed-off-by: Chao Gao <chao....@intel.com> >> >> --- >> >> Applied this patch, qemu would report the error below: >> >> [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: >> >> 1, pirq: 302, gvec: 0xd5) >> >> [00:05.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: >> >> 1, pirq: 301, gvec: 0xe5) >> >> [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: >> >> 1, pirq: 359, gvec: 0x41) >> >> [00:04.0] msi_msix_disable: Error: Unbinding of MSI-X failed. (err: >> >> 1, pirq: 358, gvec: 0x51) >> >> >> >> Despite of the error, guest shutdown or device hotplug finishs smoothly. >> >> It seems to me that qemu tries to unbind a msi which is already unbound by >> >> the code added by this patch. I am not sure whether it is acceptable to >> >> leave this error there. >> > >> >So QEMU would try to unmap IRQs after unbinding the device? I think >> >> It seems to me yes. I don't know the reason right now. maybe because it >> is an asynchronous process? >> >> >QEMU should be fixed to first unmap the IRQs and then unbind the >> >device. >> >> Yes. Agree. >> >> > >> >As long as this doesn't affect QEMU functionality I guess the Xen side >> >can be committed, but ideally a QEMU patch to avoid those error >> >messages should be committed at the same time. >> > >> >> --- >> >> xen/drivers/passthrough/io.c | 57 >> >> +++++++++++++++++++++++++++++-------------- >> >> xen/drivers/passthrough/pci.c | 49 +++++++++++++++++++++++++++++++++++++ >> >> xen/include/xen/iommu.h | 1 + >> >> 3 files changed, 89 insertions(+), 18 deletions(-) >> >> >> >> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c >> >> index a6eb8a4..56ee1ef 100644 >> >> --- a/xen/drivers/passthrough/io.c >> >> +++ b/xen/drivers/passthrough/io.c >> >> @@ -619,6 +619,42 @@ int pt_irq_create_bind( >> >> return 0; >> >> } >> >> >> >> +static void pt_irq_destroy_bind_common(struct domain *d, struct pirq >> >> *pirq) >> >> +{ >> >> + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq); >> >> + >> >> + ASSERT(spin_is_locked(&d->event_lock)); >> >> + >> >> + if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) && >> >> + list_empty(&pirq_dpci->digl_list) ) >> >> + { >> >> + pirq_guest_unbind(d, pirq); >> >> + msixtbl_pt_unregister(d, pirq); >> >> + if ( pt_irq_need_timer(pirq_dpci->flags) ) >> >> + kill_timer(&pirq_dpci->timer); >> >> + pirq_dpci->flags = 0; >> >> + /* >> >> + * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the >> >> + * call to pt_pirq_softirq_reset. >> >> + */ >> >> + pt_pirq_softirq_reset(pirq_dpci); >> >> + >> >> + pirq_cleanup_check(pirq, d); >> >> + } >> >> +} >> >> + >> >> +void pt_irq_destroy_bind_msi(struct domain *d, struct pirq *pirq) >> >> +{ >> >> + struct hvm_pirq_dpci *pirq_dpci = pirq_dpci(pirq); >> >> + >> >> + ASSERT(spin_is_locked(&d->event_lock)); >> >> + >> >> + if ( pirq_dpci && pirq_dpci->gmsi.posted ) >> >> + pi_update_irte(NULL, pirq, 0); >> >> + >> >> + pt_irq_destroy_bind_common(d, pirq); >> >> +} >> >> + >> >> int pt_irq_destroy_bind( >> >> struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind) >> >> { >> >> @@ -727,26 +763,11 @@ int pt_irq_destroy_bind( >> >> } >> >> else >> >> what = "bogus"; >> >> - } >> >> - else if ( pirq_dpci && pirq_dpci->gmsi.posted ) >> >> - pi_update_irte(NULL, pirq, 0); >> >> - >> >> - if ( pirq_dpci && (pirq_dpci->flags & HVM_IRQ_DPCI_MAPPED) && >> >> - list_empty(&pirq_dpci->digl_list) ) >> >> - { >> >> - pirq_guest_unbind(d, pirq); >> >> - msixtbl_pt_unregister(d, pirq); >> >> - if ( pt_irq_need_timer(pirq_dpci->flags) ) >> >> - kill_timer(&pirq_dpci->timer); >> >> - pirq_dpci->flags = 0; >> >> - /* >> >> - * See comment in pt_irq_create_bind's PT_IRQ_TYPE_MSI before the >> >> - * call to pt_pirq_softirq_reset. >> >> - */ >> >> - pt_pirq_softirq_reset(pirq_dpci); >> >> >> >> - pirq_cleanup_check(pirq, d); >> >> + pt_irq_destroy_bind_common(d, pirq); >> >> } >> >> + else >> >> + pt_irq_destroy_bind_msi(d, pirq); >> >> >> >> spin_unlock(&d->event_lock); >> >> >> >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c >> >> index 1277ce2..88a8007 100644 >> >> --- a/xen/drivers/passthrough/pci.c >> >> +++ b/xen/drivers/passthrough/pci.c >> >> @@ -368,6 +368,7 @@ static struct pci_dev *alloc_pdev(struct pci_seg >> >> *pseg, u8 bus, u8 devfn) >> >> return NULL; >> >> } >> >> spin_lock_init(&msix->table_lock); >> >> + msix->warned = DOMID_INVALID; >> >> pdev->msix = msix; >> >> } >> >> >> >> @@ -1514,6 +1515,52 @@ static int assign_device(struct domain *d, u16 >> >> seg, u8 bus, u8 devfn, u32 flag) >> >> return rc; >> >> } >> >> >> >> +/* >> >> + * Unmap established mappings between domain's pirq and device's MSI. >> >> + * These mappings were set up by qemu/guest and are expected to be >> >> + * destroyed when changing the device's ownership. >> >> + */ >> >> +static void pci_unmap_msi(struct pci_dev *pdev) >> >> +{ >> >> + struct msi_desc *entry, *tmp; >> >> + >> >> + ASSERT(pcidevs_locked()); >> >> + >> >> + if ( !pdev->domain ) >> >> + return; >> >> + >> >> + spin_lock(&pdev->domain->event_lock); >> >> + list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list ) >> > >> >Do you really need the _safe version here? Couldn't you even use: >> >> Don't need the _safe version. >> >> > >> >while ( (entry = list_first_entry_or_null(...)) != NULL ) >> >... >> >> I think it is the same with list_for_each_entry(). Any reason makes you think >> this one would be better? > >Doesn't 'entry' get freed when you call unmap_domain_pirq? In which >case using the list pointer from that struct would be a >use-after-free. > >Using list_first_entry_or_null you don't need the previous entry in >order to get the next one, since you always pick the first one until >the list is empty.
Agree. here I should use _safe version and will take you advice. > >> > >> >> + { >> >> + struct pirq *info; >> >> + struct hvm_pirq_dpci *pirq_dpci; >> >> + int pirq = domain_irq_to_pirq(pdev->domain, entry->irq), >> >> pirq_orig; >> >> + >> >> + pirq_orig = pirq; >> >> + >> >> + if ( !pirq ) >> >> + continue; >> >> + >> >> + /* For forcibly unmapped pirq, lookup radix tree with absolute >> >> value */ >> >> + if ( pirq < 0) >> >> + pirq = -pirq; >> > >> >I'm not sure I follow, the pirq hasn't been unmapped at this point >> >yet? >> >> Qemu (i.e. compromised qemu) has the ability to do this. Right? we can't >> assert the pirq hasn't been unmapped here. > >If the PIRQ is unmapped then the 'entry' would also be gone (freed) >AFAICT (see unmap_domain_pirq which calls msi_free_irq)? > >I think that any entry in pdev->msi_list will always have entry->irq > >= 0, but maybe I'm missing something. AFAICT map_domain_pirq will not >add an entry with irq < 0. Yes, you are right. I will add a "WARN_ON" when pirq < 0. Thanks Chao _______________________________________________ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel