Re: [Xen-devel] [PATCH v4 9/9] vpci/msix: add MSI-X handlers

2017-08-11 Thread Jan Beulich
>>> On 11.08.17 at 12:11,  wrote:
> On Fri, Aug 11, 2017 at 04:01:05AM -0600, Jan Beulich wrote:
>> >>> On 10.08.17 at 19:04,  wrote:
>> > On Wed, Aug 02, 2017 at 09:07:54AM -0600, Jan Beulich wrote:
>> >> >>> Roger Pau Monne  06/30/17 5:01 PM >>>
>> >> >+case PCI_MSIX_ENTRY_DATA_OFFSET:
>> >> >+/*
>> >> >+ * 8 byte writes to the msg data and vector control fields are
>> >> >+ * only allowed if the entry is masked.
>> >> >+ */
>> >> >+if ( len == 8 && !entry->masked && !msix->masked && 
>> >> >msix->enabled )
>> >> >+{
>> >> >+vpci_unlock(d);
>> >> >+return X86EMUL_OKAY;
>> >> >+}
>> >> 
>> >> I don't think this is correct - iirc such writes simply don't take effect 
>> >> immediately
>> >> (but I then seem to recall this to apply to the address field and 32-bit 
>> >> writes to
>> >> the data field as well). They'd instead take effect the next time the 
>> >> entry is being
>> >> unmasked (or some such). A while ago I did fix the qemu code to behave in 
>> >> this
>> >> way.
>> > 
>> > There's an Implementation Note called "Special Considerations for QWORD
>> > Accesses" in the MSI-X section of the PCI 3.0 spec that states:
>> > 
>> > If a given entry is currently masked (via its Mask bit or the Function
>> > Mask bit), software is permitted to fill in the Message Data and
>> > Vector Control fields with a single QWORD write, taking advantage of
>> > the fact the Message Data field is guaranteed to become visible to
>> > hardware no later than the Vector Control field.
>> > 
>> > So I think the above chunk is correct. The specification also states
>> > that:
>> > 
>> > Software must not modify the Address or Data fields of an entry while
>> > it is unmasked. Refer to Section 6.8.3.5 for details.
>> > 
>> > AFAICT this is not enforced by QEMU, and you can write to the
>> > address/data fields while the message is not masked. The update will
>> > only take effect once the message is masked and unmasked.
>> 
>> The spec also says:
>> 
>> "For MSI-X, a function is permitted to cache Address and Data values
>>  from unmasked MSIX Table entries. However, anytime software
>>  unmasks a currently masked MSI-X Table entry either by clearing its
>>  Mask bit or by clearing the Function Mask bit, the function must
>>  update any Address or Data values that it cached from that entry. If
>>  software changes the Address or Data value of an entry while the
>>  entry is unmasked, the result is undefined."
> 
> I'm not sure it's clear to me what to do if the guest writes to an
> entry while unmasked. For once it says that it must not modify it, but
> OTHO it says result is undefined when doing so.
> 
> Would you be fine with ignoring writes to the address and data fields
> if the entry is unmasked?

No, not really. I've intentionally pointed you to the qemu code,
as there I've implemented the caching behavior described by the
quote above. I'd expect vPCI to behave similarly.

>> >> >+for_each_domain ( d )
>> >> >+{
>> >> >+if ( !has_vpci(d) )
>> >> >+continue;
>> >> >+
>> >> >+printk("vPCI MSI-X information for guest %u\n", d->domain_id);
>> >> 
>> >> Wouldn't it be better (more useful) to dump the MSI and MSI-X data for a
>> >> domain next to each other?
>> > 
>> > Possibly yes, and printing the MSI and MSI-X data of each device
>> > together would be even better IMHO.
>> 
>> Not sure about that last aspect - devices aren't permitted to enable
>> both at the same time, so only one of them can really be relevant.
> 
> I think (for debugging purposes) it's useful to print both together
> in order to spot if the guest is doing something wrong.

For Dom0 maybe. For DomU we'd have to refuse guest attempts
to do anything possibly resulting in undefined behavior.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 9/9] vpci/msix: add MSI-X handlers

2017-08-11 Thread Roger Pau Monné
On Fri, Aug 11, 2017 at 04:01:05AM -0600, Jan Beulich wrote:
> >>> On 10.08.17 at 19:04,  wrote:
> > On Wed, Aug 02, 2017 at 09:07:54AM -0600, Jan Beulich wrote:
> >> >>> Roger Pau Monne  06/30/17 5:01 PM >>>
> >> >@@ -113,6 +148,35 @@ static int vpci_modify_bar(struct domain *d, const 
> >> >struct vpci_bar *bar,
> >> >if ( IS_ERR(mem) )
> >> >return -PTR_ERR(mem);
> >>  >
> >> >+/*
> >> >+ * Make sure the MSI-X regions of the BAR are not mapped into the 
> >> >domain
> >> >+ * p2m, or else the MSI-X handlers are useless. Only do this when 
> >> >mapping,
> >> >+ * since that's when the memory decoding on the device is enabled.
> >> >+ */
> >> >+for ( i = 0; map && i < ARRAY_SIZE(bar->msix); i++ )
> >> >+{
> >> >+struct vpci_msix_mem *msix = bar->msix[i];
> >> >+
> >> >+if ( !msix || msix->addr == INVALID_PADDR )
> >> >+continue;
> >> >+
> >> >+rc = vpci_unmap_msix(d, msix);
> >> 
> >> Why do you need this, instead of being able to simply rely on the rangeset
> >> based (un)mapping?
> > 
> > This is because the series that I've sent called: "x86/pvh: implement
> > iommu_inclusive_mapping for PVH Dom0" will map the MSI-X memory areas
> > into the guest, and thus we need to make sure they are not mapped
> > here for the emulation path to work.
> > 
> > https://lists.xenproject.org/archives/html/xen-devel/2017-04/msg02849.html 
> 
> Oh, okay. The patch description doesn't mention any such
> dependency though.

Will make that clearer on the next version, in fact I'm going to send
this series rebased on top of the iommu_inclusive_mapping one. AFAICT
that one is closer to being committed, and in any case changing the
order is trivial, there are not conflicts.

> >> >+break;
> >> >+case PCI_MSIX_ENTRY_DATA_OFFSET:
> >> >+/*
> >> >+ * 8 byte writes to the msg data and vector control fields are
> >> >+ * only allowed if the entry is masked.
> >> >+ */
> >> >+if ( len == 8 && !entry->masked && !msix->masked && 
> >> >msix->enabled )
> >> >+{
> >> >+vpci_unlock(d);
> >> >+return X86EMUL_OKAY;
> >> >+}
> >> 
> >> I don't think this is correct - iirc such writes simply don't take effect 
> >> immediately
> >> (but I then seem to recall this to apply to the address field and 32-bit 
> >> writes to
> >> the data field as well). They'd instead take effect the next time the 
> >> entry is being
> >> unmasked (or some such). A while ago I did fix the qemu code to behave in 
> >> this
> >> way.
> > 
> > There's an Implementation Note called "Special Considerations for QWORD
> > Accesses" in the MSI-X section of the PCI 3.0 spec that states:
> > 
> > If a given entry is currently masked (via its Mask bit or the Function
> > Mask bit), software is permitted to fill in the Message Data and
> > Vector Control fields with a single QWORD write, taking advantage of
> > the fact the Message Data field is guaranteed to become visible to
> > hardware no later than the Vector Control field.
> > 
> > So I think the above chunk is correct. The specification also states
> > that:
> > 
> > Software must not modify the Address or Data fields of an entry while
> > it is unmasked. Refer to Section 6.8.3.5 for details.
> > 
> > AFAICT this is not enforced by QEMU, and you can write to the
> > address/data fields while the message is not masked. The update will
> > only take effect once the message is masked and unmasked.
> 
> The spec also says:
> 
> "For MSI-X, a function is permitted to cache Address and Data values
>  from unmasked MSIX Table entries. However, anytime software
>  unmasks a currently masked MSI-X Table entry either by clearing its
>  Mask bit or by clearing the Function Mask bit, the function must
>  update any Address or Data values that it cached from that entry. If
>  software changes the Address or Data value of an entry while the
>  entry is unmasked, the result is undefined."

I'm not sure it's clear to me what to do if the guest writes to an
entry while unmasked. For once it says that it must not modify it, but
OTHO it says result is undefined when doing so.

Would you be fine with ignoring writes to the address and data fields
if the entry is unmasked?

> >> >+for_each_domain ( d )
> >> >+{
> >> >+if ( !has_vpci(d) )
> >> >+continue;
> >> >+
> >> >+printk("vPCI MSI-X information for guest %u\n", d->domain_id);
> >> 
> >> Wouldn't it be better (more useful) to dump the MSI and MSI-X data for a
> >> domain next to each other?
> > 
> > Possibly yes, and printing the MSI and MSI-X data of each device
> > together would be even better IMHO.
> 
> Not sure about that last aspect - devices aren't permitted to enable
> both at the same time, so only one of them can really be relevant.

I think (for debugging purposes) it's useful to print both together
in order to spot if the guest is doing 

Re: [Xen-devel] [PATCH v4 9/9] vpci/msix: add MSI-X handlers

2017-08-11 Thread Jan Beulich
>>> On 10.08.17 at 19:04,  wrote:
> On Wed, Aug 02, 2017 at 09:07:54AM -0600, Jan Beulich wrote:
>> >>> Roger Pau Monne  06/30/17 5:01 PM >>>
>> >@@ -113,6 +148,35 @@ static int vpci_modify_bar(struct domain *d, const 
>> >struct vpci_bar *bar,
>> >if ( IS_ERR(mem) )
>> >return -PTR_ERR(mem);
>>  >
>> >+/*
>> >+ * Make sure the MSI-X regions of the BAR are not mapped into the 
>> >domain
>> >+ * p2m, or else the MSI-X handlers are useless. Only do this when 
>> >mapping,
>> >+ * since that's when the memory decoding on the device is enabled.
>> >+ */
>> >+for ( i = 0; map && i < ARRAY_SIZE(bar->msix); i++ )
>> >+{
>> >+struct vpci_msix_mem *msix = bar->msix[i];
>> >+
>> >+if ( !msix || msix->addr == INVALID_PADDR )
>> >+continue;
>> >+
>> >+rc = vpci_unmap_msix(d, msix);
>> 
>> Why do you need this, instead of being able to simply rely on the rangeset
>> based (un)mapping?
> 
> This is because the series that I've sent called: "x86/pvh: implement
> iommu_inclusive_mapping for PVH Dom0" will map the MSI-X memory areas
> into the guest, and thus we need to make sure they are not mapped
> here for the emulation path to work.
> 
> https://lists.xenproject.org/archives/html/xen-devel/2017-04/msg02849.html 

Oh, okay. The patch description doesn't mention any such
dependency though.

>> >+break;
>> >+case PCI_MSIX_ENTRY_DATA_OFFSET:
>> >+/*
>> >+ * 8 byte writes to the msg data and vector control fields are
>> >+ * only allowed if the entry is masked.
>> >+ */
>> >+if ( len == 8 && !entry->masked && !msix->masked && msix->enabled )
>> >+{
>> >+vpci_unlock(d);
>> >+return X86EMUL_OKAY;
>> >+}
>> 
>> I don't think this is correct - iirc such writes simply don't take effect 
>> immediately
>> (but I then seem to recall this to apply to the address field and 32-bit 
>> writes to
>> the data field as well). They'd instead take effect the next time the entry 
>> is being
>> unmasked (or some such). A while ago I did fix the qemu code to behave in 
>> this
>> way.
> 
> There's an Implementation Note called "Special Considerations for QWORD
> Accesses" in the MSI-X section of the PCI 3.0 spec that states:
> 
> If a given entry is currently masked (via its Mask bit or the Function
> Mask bit), software is permitted to fill in the Message Data and
> Vector Control fields with a single QWORD write, taking advantage of
> the fact the Message Data field is guaranteed to become visible to
> hardware no later than the Vector Control field.
> 
> So I think the above chunk is correct. The specification also states
> that:
> 
> Software must not modify the Address or Data fields of an entry while
> it is unmasked. Refer to Section 6.8.3.5 for details.
> 
> AFAICT this is not enforced by QEMU, and you can write to the
> address/data fields while the message is not masked. The update will
> only take effect once the message is masked and unmasked.

The spec also says:

"For MSI-X, a function is permitted to cache Address and Data values
 from unmasked MSIX Table entries. However, anytime software
 unmasks a currently masked MSI-X Table entry either by clearing its
 Mask bit or by clearing the Function Mask bit, the function must
 update any Address or Data values that it cached from that entry. If
 software changes the Address or Data value of an entry while the
 entry is unmasked, the result is undefined."

>> >+for_each_domain ( d )
>> >+{
>> >+if ( !has_vpci(d) )
>> >+continue;
>> >+
>> >+printk("vPCI MSI-X information for guest %u\n", d->domain_id);
>> 
>> Wouldn't it be better (more useful) to dump the MSI and MSI-X data for a
>> domain next to each other?
> 
> Possibly yes, and printing the MSI and MSI-X data of each device
> together would be even better IMHO.

Not sure about that last aspect - devices aren't permitted to enable
both at the same time, so only one of them can really be relevant.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 9/9] vpci/msix: add MSI-X handlers

2017-08-10 Thread Roger Pau Monné
On Wed, Aug 02, 2017 at 09:07:54AM -0600, Jan Beulich wrote:
> >>> Roger Pau Monne  06/30/17 5:01 PM >>>
> >Note that accesses to the Table Offset, Table BIR, PBA Offset and PBA
> >BIR are not trapped by Xen at the moment.
> 
> They're mandated r/o by the spec anyway.

> 
> >@@ -113,6 +148,35 @@ static int vpci_modify_bar(struct domain *d, const 
> >struct vpci_bar *bar,
> >if ( IS_ERR(mem) )
> >return -PTR_ERR(mem);
>  >
> >+/*
> >+ * Make sure the MSI-X regions of the BAR are not mapped into the domain
> >+ * p2m, or else the MSI-X handlers are useless. Only do this when 
> >mapping,
> >+ * since that's when the memory decoding on the device is enabled.
> >+ */
> >+for ( i = 0; map && i < ARRAY_SIZE(bar->msix); i++ )
> >+{
> >+struct vpci_msix_mem *msix = bar->msix[i];
> >+
> >+if ( !msix || msix->addr == INVALID_PADDR )
> >+continue;
> >+
> >+rc = vpci_unmap_msix(d, msix);
> 
> Why do you need this, instead of being able to simply rely on the rangeset
> based (un)mapping?

This is because the series that I've sent called: "x86/pvh: implement
iommu_inclusive_mapping for PVH Dom0" will map the MSI-X memory areas
into the guest, and thus we need to make sure they are not mapped
here for the emulation path to work.

https://lists.xenproject.org/archives/html/xen-devel/2017-04/msg02849.html

> >@@ -405,7 +475,20 @@ static int vpci_init_bars(struct pci_dev *pdev)
> >continue;
> >}
>  >
> >-bars[i].addr = (cmd & PCI_COMMAND_MEMORY) ? addr : INVALID_PADDR;
> >+if ( cmd & PCI_COMMAND_MEMORY )
> >+{
> >+unsigned int j;
> >+
> >+bars[i].addr = addr;
> >+
> >+for ( j = 0; j < ARRAY_SIZE(bars[i].msix); j++ )
> >+if ( bars[i].msix[j] )
> >+bars[i].msix[j]->addr = bars[i].addr +
> >+bars[i].msix[j]->offset;
> >+}
> >+else
> >+bars[i].addr = INVALID_PADDR;
> 
> As (I think) mentioned elsewhere already, this init-time special case looks
> dangerous (and unnecessary) to me (or else I'd expect you to also zap
> the field when the memory decode bit is being cleared).

OK, so I'm simply going to set this to addr + offset, regardless of
whether the BAR has memory decoding enabled of not. If the BAR is not
yet positioned Dom0 will have to position it anyway before enabling
memory decoding.

> >+for ( i = 0; i < msix->max_entries; i++ )
> >+{
> >+if ( msix->entries[i].masked )
> >+continue;
> 
> Why is the mask bit relevant here, but not the mask-all one?

Not taking the mask-all into account here is wrong, since setting
mask-all from 1 to 0 should force a recalculation of all the entries
address and data fields. I will fix this in the next version.

> >+rc = vpci_msix_arch_enable(>entries[i].arch, pdev,
> >+   msix->entries[i].addr,
> >+   msix->entries[i].data,
> >+   msix->entries[i].nr, table_base);
> >+if ( rc )
> >+{
> >+gdprintk(XENLOG_ERR,
> <+ "%04x:%02x:%02x.%u: unable to update entry %u: 
> %d\n",
> >+ seg, bus, slot, func, i, rc);
> >+return;
> >+}
> >+
> >+vpci_msix_arch_mask(>entries[i].arch, pdev, false);
> 
> Same question here.

This is needed because after a vpci_msix_arch_enable the pirq is still
masked, and hence needs to be unmasked to match the guest's view.

> >+}
> >+}
> >+else if ( msix->enabled && !new_enabled )
> >+{
> >+/* MSI-X disabled. */
> >+for ( i = 0; i < msix->max_entries; i++ )
> >+{
> >+rc = vpci_msix_arch_disable(>entries[i].arch, pdev);
> >+if ( rc )
> >+{
> >+gdprintk(XENLOG_ERR,
> >+ "%04x:%02x:%02x.%u: unable to disable entry %u: 
> >%d\n",
> >+ seg, bus, slot, func, i, rc);
> >+return;
> >+}
> >+}
> >+}
> >+
> >+if ( (new_enabled != msix->enabled || new_masked != msix->masked) &&
> >+ pci_msi_conf_write_intercept(pdev, reg, 2, ) >= 0 )
> >+pci_conf_write16(seg, bus, slot, func, reg, val.u32);
> 
> DYM val.u16 here?

Now this is simply val, since the union has been removed.

> >+static struct vpci_msix *vpci_msix_find(struct domain *d, unsigned long 
> >addr)
> >+{
> >+struct vpci_msix *msix;
> >+
> >+ASSERT(vpci_locked(d));
> >+list_for_each_entry ( msix,  >arch.hvm_domain.msix_tables, next )
> >+{
> >+uint8_t seg = msix->pdev->seg, bus = msix->pdev->bus;
> >+uint8_t slot = PCI_SLOT(msix->pdev->devfn);
> >+uint8_t func = PCI_FUNC(msix->pdev->devfn);
> >+uint16_t cmd = pci_conf_read16(seg, bus, slot, 

Re: [Xen-devel] [PATCH v4 9/9] vpci/msix: add MSI-X handlers

2017-08-02 Thread Jan Beulich
>>> Roger Pau Monne  06/30/17 5:01 PM >>>
>Note that accesses to the Table Offset, Table BIR, PBA Offset and PBA
>BIR are not trapped by Xen at the moment.

They're mandated r/o by the spec anyway.

>@@ -113,6 +148,35 @@ static int vpci_modify_bar(struct domain *d, const struct 
>vpci_bar *bar,
>if ( IS_ERR(mem) )
>return -PTR_ERR(mem);
 >
>+/*
>+ * Make sure the MSI-X regions of the BAR are not mapped into the domain
>+ * p2m, or else the MSI-X handlers are useless. Only do this when mapping,
>+ * since that's when the memory decoding on the device is enabled.
>+ */
>+for ( i = 0; map && i < ARRAY_SIZE(bar->msix); i++ )
>+{
>+struct vpci_msix_mem *msix = bar->msix[i];
>+
>+if ( !msix || msix->addr == INVALID_PADDR )
>+continue;
>+
>+rc = vpci_unmap_msix(d, msix);

Why do you need this, instead of being able to simply rely on the rangeset
based (un)mapping?

>@@ -405,7 +475,20 @@ static int vpci_init_bars(struct pci_dev *pdev)
>continue;
>}
 >
>-bars[i].addr = (cmd & PCI_COMMAND_MEMORY) ? addr : INVALID_PADDR;
>+if ( cmd & PCI_COMMAND_MEMORY )
>+{
>+unsigned int j;
>+
>+bars[i].addr = addr;
>+
>+for ( j = 0; j < ARRAY_SIZE(bars[i].msix); j++ )
>+if ( bars[i].msix[j] )
>+bars[i].msix[j]->addr = bars[i].addr +
>+bars[i].msix[j]->offset;
>+}
>+else
>+bars[i].addr = INVALID_PADDR;

As (I think) mentioned elsewhere already, this init-time special case looks
dangerous (and unnecessary) to me (or else I'd expect you to also zap
the field when the memory decode bit is being cleared).

>--- /dev/null
>+++ b/xen/drivers/vpci/msix.c
>@@ -0,0 +1,503 @@
>+/*
>+ * Handlers for accesses to the MSI-X capability structure and the memory
>+ * region.
>+ *
>+ * Copyright (C) 2017 Citrix Systems R
>+ *
>+ * This program is free software; you can redistribute it and/or
>+ * modify it under the terms and conditions of the GNU General Public
>+ * License, version 2, as published by the Free Software Foundation.
>+ *
>+ * This program is distributed in the hope that it will be useful,
>+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
>+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>+ * General Public License for more details.
>+ *
>+ * You should have received a copy of the GNU General Public
>+ * License along with this program; If not, see 
>.
>+ */
>+
>+#include 
>+#include 
>+#include 
>+#include 
>+#include 
>+
>+#define MSIX_SIZE(num) (offsetof(struct vpci_msix, entries[num]))

The outermost parens are pointless here.

>+static void vpci_msix_control_write(struct pci_dev *pdev, unsigned int reg,
>+union vpci_val val, void *data)
>+{
>+uint8_t seg = pdev->seg, bus = pdev->bus;
>+uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
>+struct vpci_msix *msix = data;
>+paddr_t table_base = pdev->vpci->header.bars[msix->table.bir].addr;
>+bool new_masked, new_enabled;
>+unsigned int i;
>+int rc;
>+
>+new_masked = val.u16 & PCI_MSIX_FLAGS_MASKALL;
>+new_enabled = val.u16 & PCI_MSIX_FLAGS_ENABLE;
>+
>+if ( !msix->enabled && new_enabled )
>+{
>+/* MSI-X enabled. */

Insert "being"?

>+for ( i = 0; i < msix->max_entries; i++ )
>+{
>+if ( msix->entries[i].masked )
>+continue;

Why is the mask bit relevant here, but not the mask-all one?

>+rc = vpci_msix_arch_enable(>entries[i].arch, pdev,
>+   msix->entries[i].addr,
>+   msix->entries[i].data,
>+   msix->entries[i].nr, table_base);
>+if ( rc )
>+{
>+gdprintk(XENLOG_ERR,
<+ "%04x:%02x:%02x.%u: unable to update entry %u: %d\n",
>+ seg, bus, slot, func, i, rc);
>+return;
>+}
>+
>+vpci_msix_arch_mask(>entries[i].arch, pdev, false);

Same question here.

>+}
>+}
>+else if ( msix->enabled && !new_enabled )
>+{
>+/* MSI-X disabled. */
>+for ( i = 0; i < msix->max_entries; i++ )
>+{
>+rc = vpci_msix_arch_disable(>entries[i].arch, pdev);
>+if ( rc )
>+{
>+gdprintk(XENLOG_ERR,
>+ "%04x:%02x:%02x.%u: unable to disable entry %u: 
>%d\n",
>+ seg, bus, slot, func, i, rc);
>+return;
>+}
>+}
>+}
>+
>+if ( (new_enabled != msix->enabled || new_masked != msix->masked) &&
>+ pci_msi_conf_write_intercept(pdev, reg, 2, ) >= 0 )
>+pci_conf_write16(seg, bus, slot, func, reg, val.u32);

DYM val.u16 

[Xen-devel] [PATCH v4 9/9] vpci/msix: add MSI-X handlers

2017-06-30 Thread Roger Pau Monne
Add handlers for accesses to the MSI-X message control field on the
PCI configuration space, and traps for accesses to the memory region
that contains the MSI-X table and PBA. This traps detect attempts from
the guest to configure MSI-X interrupts and properly sets them up.

Note that accesses to the Table Offset, Table BIR, PBA Offset and PBA
BIR are not trapped by Xen at the moment.

Finally, turn the panic in the Dom0 PVH builder into a warning.

Signed-off-by: Roger Pau Monné 
---
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
Changes since v3:
 - Propagate changes from previous versions: remove xen_ prefix, use
   the new fields in vpci_val and remove the return value from
   handlers.
 - Remove the usage of GENMASK.
 - Mave the arch-specific parts of the dump routine to the
   x86/hvm/vmsi.c dump handler.
 - Chain the MSI-X dump handler to the 'M' debug key.
 - Fix the header BAR mappings so that the MSI-X regions inside of
   BARs are unmapped from the domain p2m in order for the handlers to
   work properly.
 - Unconditionally trap and forward accesses to the PBA MSI-X area.
 - Simplify the conditionals in vpci_msix_control_write.
 - Fix vpci_msix_accept to use a bool type.
 - Allow all supported accesses as described in the spec to the MSI-X
   table.
 - Truncate the returned address when the access is a 32b read.
 - Always return X86EMUL_OKAY from the handlers, returning ~0 in the
   read case if the access is not supported, or ignoring writes.
 - Do not check that max_entries is != 0 in the init handler.
 - Use trylock in the dump handler.

Changes since v2:
 - Split out arch-specific code.

This patch has been tested with devices using both a single MSI-X
entry and multiple ones.
---
 xen/arch/x86/hvm/dom0_build.c|   2 +-
 xen/arch/x86/hvm/hvm.c   |   1 +
 xen/arch/x86/hvm/vmsi.c  | 128 +-
 xen/arch/x86/msi.c   |   1 +
 xen/drivers/vpci/Makefile|   2 +-
 xen/drivers/vpci/header.c|  85 ++-
 xen/drivers/vpci/msix.c  | 503 +++
 xen/include/asm-x86/hvm/domain.h |   3 +
 xen/include/asm-x86/hvm/io.h |  18 ++
 xen/include/xen/vpci.h   |  39 +++
 10 files changed, 774 insertions(+), 8 deletions(-)
 create mode 100644 xen/drivers/vpci/msix.c

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 6b9f76ec36..c060eb85eb 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1091,7 +1091,7 @@ int __init dom0_construct_pvh(struct domain *d, const 
module_t *image,
 return rc;
 }
 
-panic("Building a PVHv2 Dom0 is not yet supported.");
+printk("WARNING: PVH is an experimental mode with limited 
functionality\n");
 return 0;
 }
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f45e2bd23d..9277e84150 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -585,6 +585,7 @@ int hvm_domain_initialise(struct domain *d, unsigned long 
domcr_flags,
 INIT_LIST_HEAD(>arch.hvm_domain.write_map.list);
 INIT_LIST_HEAD(>arch.hvm_domain.g2m_ioport_list);
 INIT_LIST_HEAD(>arch.hvm_domain.mmcfg_regions);
+INIT_LIST_HEAD(>arch.hvm_domain.msix_tables);
 
 rc = create_perdomain_mapping(d, PERDOMAIN_VIRT_START, 0, NULL, NULL);
 if ( rc )
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 5732c70b5c..f1c72f23d9 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -643,17 +643,15 @@ static unsigned int msi_flags(uint16_t data, uint64_t 
addr)
(trig_mode << GFLAGS_SHIFT_TRG_MODE);
 }
 
-void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,
-unsigned int entry, bool mask)
+static void vpci_mask_pirq(struct domain *d, int pirq, bool mask)
 {
-struct domain *d = pdev->domain;
 const struct pirq *pinfo;
 struct irq_desc *desc;
 unsigned long flags;
 int irq;
 
-ASSERT(arch->pirq >= 0);
-pinfo = pirq_info(d, arch->pirq + entry);
+ASSERT(pirq >= 0);
+pinfo = pirq_info(d, pirq);
 ASSERT(pinfo);
 
 irq = pinfo->arch.irq;
@@ -667,6 +665,12 @@ void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct 
pci_dev *pdev,
 spin_unlock_irqrestore(>lock, flags);
 }
 
+void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+unsigned int entry, bool mask)
+{
+vpci_mask_pirq(pdev->domain, arch->pirq + entry, mask);
+}
+
 int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
  uint64_t address, uint32_t data, unsigned int vectors)
 {
@@ -771,3 +775,117 @@ void vpci_msi_arch_print(struct vpci_arch_msi *arch, 
uint16_t data,
MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK),
arch->pirq);
 }
+
+void vpci_msix_arch_mask(struct vpci_arch_msix_entry *arch,
+ struct pci_dev *pdev, bool