On Fri, May 17, 2024 at 01:06:12PM -0400, Stewart Hildebrand wrote: > From: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> > > Xen and/or Dom0 may have put values in PCI_COMMAND which they expect > to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the > guest's (domU) view of this will want to be zero (for now), the host > having set it to 1 should be preserved, or else we'd effectively be > giving the domU control of the bit. Thus, PCI_COMMAND register needs > proper emulation in order to honor host's settings. > > According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2 > Device Control" the reset state of the command register is typically 0, > so when assigning a PCI device use 0 as the initial state for the > guest's (domU) view of the command register. > > Here is the full list of command register bits with notes about > PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is > also documented here since that is our current reference implementation > for PCI passthrough. > > PCI_COMMAND_IO (bit 0) > PCIe 6.1: RW > PCI LB 3.0: RW > QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest > writes do not propagate to hardware. QEMU sets this bit to 1 in > hardware if an I/O BAR is exposed to the guest. > Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we > don't yet support I/O BARs for domUs. > Xen dom0: We allow dom0 to control this bit freely. > > PCI_COMMAND_MEMORY (bit 1) > PCIe 6.1: RW > PCI LB 3.0: RW > QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest > writes do not propagate to hardware. QEMU sets this bit to 1 in > hardware if a Memory BAR is exposed to the guest. > Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR > regions. > Xen domU: For devices assigned to DomUs, memory decoding will be > disabled at the time of initialization. > > PCI_COMMAND_MASTER (bit 2) > PCIe 6.1: RW > PCI LB 3.0: RW > QEMU: Pass through writes to hardware. > Xen domU/dom0: Pass through writes to hardware. > > PCI_COMMAND_SPECIAL (bit 3) > PCIe 6.1: RO, hardwire to 0 > PCI LB 3.0: RW > QEMU: Pass through writes to hardware. > Xen domU/dom0: Pass through writes to hardware. > > PCI_COMMAND_INVALIDATE (bit 4) > PCIe 6.1: RO, hardwire to 0 > PCI LB 3.0: RW > QEMU: Pass through writes to hardware. > Xen domU/dom0: Pass through writes to hardware. > > PCI_COMMAND_VGA_PALETTE (bit 5) > PCIe 6.1: RO, hardwire to 0 > PCI LB 3.0: RW > QEMU: Pass through writes to hardware. > Xen domU/dom0: Pass through writes to hardware. > > PCI_COMMAND_PARITY (bit 6) > PCIe 6.1: RW > PCI LB 3.0: RW > QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest > writes do not propagate to hardware. > Xen domU: (rsvdp_mask) We treat this bit as RsvdP. > Xen dom0: We allow dom0 to control this bit freely. > > PCI_COMMAND_WAIT (bit 7) > PCIe 6.1: RO, hardwire to 0 > PCI LB 3.0: hardwire to 0 > QEMU: res_mask > Xen domU: (rsvdp_mask) We treat this bit as RsvdP. > Xen dom0: We allow dom0 to control this bit freely. > > PCI_COMMAND_SERR (bit 8) > PCIe 6.1: RW > PCI LB 3.0: RW > QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest > writes do not propagate to hardware. > Xen domU: (rsvdp_mask) We treat this bit as RsvdP. > Xen dom0: We allow dom0 to control this bit freely. > > PCI_COMMAND_FAST_BACK (bit 9) > PCIe 6.1: RO, hardwire to 0 > PCI LB 3.0: RW > QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest > writes do not propagate to hardware. > Xen domU: (rsvdp_mask) We treat this bit as RsvdP. > Xen dom0: We allow dom0 to control this bit freely. > > PCI_COMMAND_INTX_DISABLE (bit 10) > PCIe 6.1: RW > PCI LB 3.0: RW > QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest > writes do not propagate to hardware. QEMU checks if INTx was mapped > for a device. If it is not, then guest can't control > PCI_COMMAND_INTX_DISABLE bit. > Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled. > Xen dom0: We allow dom0 to control this bit freely. > > Bits 11-15 > PCIe 6.1: RsvdP > PCI LB 3.0: Reserved > QEMU: res_mask > Xen domU/dom0: rsvdp_mask > > Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushche...@epam.com> > Signed-off-by: Volodymyr Babchuk <volodymyr_babc...@epam.com> > Signed-off-by: Stewart Hildebrand <stewart.hildebr...@amd.com> > Reviewed-by: Jan Beulich <jbeul...@suse.com> > --- > RFC: There is an unaddressed question for Roger: should we update the > guest view of the PCI_COMMAND_INTX_DISABLE bit in > msi.c/msix.c:control_write()? See prior discussion at [1]. > In my opinion, I think we should make sure that hardware state and > the guest view are consistent (i.e. don't lie to the guest). > > [1] > https://lore.kernel.org/xen-devel/86b25777-788c-4b9a-8166-a6f8174be...@suse.com/
I think updating the guest view is helpful in case we need to debug issues in the guest. > > In v15: > - add Jan's R-b > - add blank line after declaration in msi.c:control_write() > > In v14: > - check for 0->1 transition in INTX_DISABLE-setting logic in > msi.c:control_write() to match msix.c:control_write() > - clear domU-controllable bits in header.c:init_header() > > In v13: > - Update right away (don't defer) PCI_COMMAND_MEMORY bit in guest_cmd > variable in cmd_write() > - Make comment single line in xen/drivers/vpci/msi.c:control_write() > - Rearrange memory decoding disabling snippet in init_header() > > In v12: > - Rework patch using vpci_add_register_mask() > - Add bitmask #define in pci_regs.h according to PCIe 6.1 spec, except > don't add the RO bits because they were RW in PCI LB 3.0 spec. > - Move and expand TODO comment about properly emulating bits > - Update guest_cmd in msi.c/msix.c:control_write() > - Simplify cmd_write(), thanks to rsvdp_mask > - Update commit description > > In v11: > - Fix copy-paste mistake: vpci->msi should be vpci->msix > - Handle PCI_COMMAND_IO > - Fix condition for disabling INTx in the MSI-X code > - Show domU changes to only allowed bits > - Show PCI_COMMAND_MEMORY write only after P2M was altered > - Update comments in the code > In v10: > - Added cf_check attribute to guest_cmd_read > - Removed warning about non-zero cmd > - Updated comment MSI code regarding disabling INTX > - Used ternary operator in vpci_add_register() call > - Disable memory decoding for DomUs in init_bars() > In v9: > - Reworked guest_cmd_read > - Added handling for more bits > Since v6: > - fold guest's logic into cmd_write > - implement cmd_read, so we can report emulated INTx state to guests > - introduce header->guest_cmd to hold the emulated state of the > PCI_COMMAND register for guests > Since v5: > - add additional check for MSI-X enabled while altering INTX bit > - make sure INTx disabled while guests enable MSI/MSI-X > Since v3: > - gate more code on CONFIG_HAS_MSI > - removed logic for the case when MSI/MSI-X not enabled > --- > xen/drivers/vpci/header.c | 60 ++++++++++++++++++++++++++++++++++---- > xen/drivers/vpci/msi.c | 9 ++++++ > xen/drivers/vpci/msix.c | 7 +++++ > xen/include/xen/pci_regs.h | 1 + > xen/include/xen/vpci.h | 3 ++ > 5 files changed, 74 insertions(+), 6 deletions(-) > > diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c > index 47648c395132..2491dbae8901 100644 > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -524,9 +524,21 @@ static void cf_check cmd_write( > { > struct vpci_header *header = data; > > + if ( !is_hardware_domain(pdev->domain) ) > + { > + const struct vpci *vpci = pdev->vpci; > + > + if ( (vpci->msi && vpci->msi->enabled) || > + (vpci->msix && vpci->msix->enabled) ) > + cmd |= PCI_COMMAND_INTX_DISABLE; > + > + header->guest_cmd = cmd; > + } > + > /* > * Let Dom0 play with all the bits directly except for the memory > - * decoding one. > + * decoding one. Bits that are not allowed for DomU are already > + * handled above and by the rsvdp_mask. > */ > if ( header->bars_mapped != !!(cmd & PCI_COMMAND_MEMORY) ) > /* > @@ -540,6 +552,14 @@ static void cf_check cmd_write( > pci_conf_write16(pdev->sbdf, reg, cmd); > } > > +static uint32_t cf_check guest_cmd_read( > + const struct pci_dev *pdev, unsigned int reg, void *data) > +{ > + const struct vpci_header *header = data; > + > + return header->guest_cmd; > +} > + > static void cf_check bar_write( > const struct pci_dev *pdev, unsigned int reg, uint32_t val, void *data) > { > @@ -754,9 +774,23 @@ static int cf_check init_header(struct pci_dev *pdev) > return -EOPNOTSUPP; > } > > - /* Setup a handler for the command register. */ > - rc = vpci_add_register(pdev->vpci, vpci_hw_read16, cmd_write, > PCI_COMMAND, > - 2, header); > + /* > + * Setup a handler for the command register. > + * > + * TODO: If support for emulated bits is added, re-visit how to handle > + * PCI_COMMAND_PARITY, PCI_COMMAND_SERR, and PCI_COMMAND_FAST_BACK. > + */ > + rc = vpci_add_register_mask(pdev->vpci, > + is_hwdom ? vpci_hw_read16 : guest_cmd_read, > + cmd_write, PCI_COMMAND, 2, header, 0, 0, > + PCI_COMMAND_RSVDP_MASK | > + (is_hwdom ? 0 > + : PCI_COMMAND_IO | > + PCI_COMMAND_PARITY | > + PCI_COMMAND_WAIT | > + PCI_COMMAND_SERR | > + PCI_COMMAND_FAST_BACK), We want to allow full access to the hw domain and only apply the PCI_COMMAND_RSVDP_MASK when !is_hwdom in order to keep the current behavior for dom0. I don't think it makes a difference in practice, but we are very lax in explicitly not applying any of such restrictions to dom0. With that fixed: Reviewed-by: Roger Pau Monné <roger....@citrix.com> Thanks, Roger.