On Tue, Nov 28, 2023 at 11:45:34PM +0000, Volodymyr Babchuk wrote:
> Hi Roger,
>
> Roger Pau Monné <roger....@citrix.com> writes:
>
> > On Wed, Nov 22, 2023 at 01:18:32PM -0800, Stefano Stabellini wrote:
> >> On Wed, 22 Nov 2023, Roger Pau Monné wrote:
> >> > On Tue, Nov 21, 2023 at 05:12:15PM -0800, Stefano Stabellini wrote:
> >> > > Let me expand on this. Like I wrote above, I think it is important that
> >> > > Xen vPCI is the only in-use PCI Root Complex emulator. If it makes the
> >> > > QEMU implementation easier, it is OK if QEMU emulates an unneeded and
> >> > > unused PCI Root Complex. From Xen's point of view, it doesn't exist.
> >> > >
> >> > > In terms of ioreq registration, QEMU calls
> >> > > xendevicemodel_map_pcidev_to_ioreq_server for each PCI BDF it wants to
> >> > > emulate. That way, Xen vPCI knows exactly which PCI config space
> >> > > reads/writes to forward to QEMU.
> >> > >
> >> > > Let's say that:
> >> > > - 00:02.0 is a PCI passthrough device
> >> > > - 00:03.0 is a PCI emulated device
> >> > >
> >> > > QEMU would register 00:03.0, and vPCI would know to forward anything
> >> > > related to 00:03.0 to QEMU, but not 00:02.0.
> >> >
> >> > I think there's some work here so that we have a proper hierarchy
> >> > inside of Xen. Right now both ioreq and vpci expect to decode the
> >> > accesses to the PCI config space, and set up (MM)IO handlers to trap
> >> > ECAM, see vpci_ecam_{read,write}().
> >> >
> >> > I think we want to move to a model where vPCI doesn't set up MMIO traps
> >> > itself, and instead relies on ioreq to do the decoding and forwarding
> >> > of accesses. We need some work in order to represent an internal
> >> > ioreq handler, but that shouldn't be too complicated. IOW: vpci
> >> > should register the devices it's handling with ioreq, much like QEMU does.
> >>
> >> I think this could be a good idea.
> >>
> >> This would be the very first IOREQ handler implemented in Xen itself,
> >> rather than outside of Xen. Some code refactoring might be required,
> >> which worries me given that vPCI is at v10 and has been pending for
> >> years. I think it could make sense as a follow-up series, not v11.
> >
> > That's perfectly fine for me; most of the series here just deals with
> > the logic to intercept guest accesses to the config space and is
> > completely agnostic as to how the accesses are intercepted.
> >
> >> I think this idea would be beneficial if, in the example above, vPCI
> >> doesn't really need to know about device 00:03.0. vPCI registers via
> >> IOREQ the PCI Root Complex and device 00:02.0 only, QEMU registers
> >> 00:03.0, and everything works. vPCI is not involved at all in PCI config
> >> space reads and writes for 00:03.0. If this is the case, then moving
> >> vPCI to IOREQ could be good.
> >
> > Given your description above, with the root complex implemented in
> > vPCI, we would need to mandate vPCI together with ioreqs even if no
> > passthrough devices are using vPCI itself (just for the emulation of
> > the root complex). Which is fine, just wanted to mention the
> > dependency.
> >
> >> On the other hand, if vPCI actually needs to know that 00:03.0 exists,
> >> perhaps because it changes something in the PCI Root Complex emulation,
> >> or vPCI needs to take some action when PCI config space registers of
> >> 00:03.0 are written to, then I think this model doesn't work well. If
> >> this is the case, then I think it would be best to keep vPCI as the MMIO
> >> handler and let vPCI forward accesses to IOREQ when appropriate.
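(As an aside, since the thread keeps coming back to it: the per-device
registration Stefano mentions above is a single libxendevicemodel call.
Below is a minimal sketch of what an emulator would do for 00:03.0; the
domid is made up and error handling is trimmed, so treat it as illustrative
rather than QEMU's actual code.)

    #include <stdlib.h>
    #include <xendevicemodel.h>

    int main(void)
    {
        domid_t domid = 1;   /* assumed guest domid, not a real value */
        ioservid_t id;
        xendevicemodel_handle *dmod = xendevicemodel_open(NULL, 0);

        if (!dmod)
            exit(EXIT_FAILURE);

        /* Create an ioreq server; 0 == HVM_IOREQSRV_BUFIOREQ_OFF,
         * i.e. no buffered ioreq support. */
        if (xendevicemodel_create_ioreq_server(dmod, domid, 0, &id))
            exit(EXIT_FAILURE);

        /* Claim config space accesses for segment 0, BDF 00:03.0 (the
         * emulated device); vPCI would forward only this BDF here. */
        if (xendevicemodel_map_pcidev_to_ioreq_server(dmod, domid, id,
                                                      0, 0, 3, 0))
            exit(EXIT_FAILURE);

        xendevicemodel_close(dmod);
        return 0;
    }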
> >
> > At first approximation I don't think we would have such interactions;
> > otherwise, the whole premise of ioreq being able to register individual
> > PCI devices would be broken.
> >
> > XenServer already has scenarios with two different user-space emulators
> > (ie: two different ioreq servers) handling accesses to different
> > devices on the same PCI bus, and there's no interaction with the root
> > complex required.
> >
>
> Out of curiosity: how are legacy PCI interrupts handled in this case? In
> my understanding, it is the Root Complex's responsibility to propagate
> the correct IRQ levels to an interrupt controller?
I'm unsure whether my understanding of the question is correct, so my reply
might not be what you are asking for, sorry.

Legacy IRQs (GSI on x86) are set up directly by the toolstack when the
device is assigned to the guest, using PHYSDEVOP_map_pirq +
XEN_DOMCTL_bind_pt_irq. Those hypercalls bind a host IO-APIC pin to a guest
IO-APIC pin, so that interrupts originating from that host IO-APIC pin are
always forwarded to the guest and injected as originating from the guest
IO-APIC pin. Note that the device will always use the same IO-APIC pin;
this is not configured by the OS.
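In case it helps, the toolstack side boils down to something like the
following (a simplified sketch using the libxc wrappers for the two
hypercalls; the domid, GSI and BDF values are made up and error handling is
reduced to bailing out):

    #include <stdlib.h>
    #include <xenctrl.h>

    int main(void)
    {
        uint32_t domid = 1;  /* assumed guest domid, not a real value */
        int gsi = 28;        /* host GSI the device INTx pin is routed to */
        int pirq = gsi;      /* in/out: pirq the GSI gets mapped to */
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);

        if (!xch)
            exit(EXIT_FAILURE);

        /* PHYSDEVOP_map_pirq: route the host GSI to a pirq of the
         * domain. */
        if (xc_physdev_map_pirq(xch, domid, gsi, &pirq))
            exit(EXIT_FAILURE);

        /* XEN_DOMCTL_bind_pt_irq (PT_IRQ_TYPE_PCI): tie that pirq to
         * INTA (intx == 0) of the guest device at 00:02.0, so the host
         * interrupt is injected through the guest IO-APIC pin. */
        if (xc_domain_bind_pt_pci_irq(xch, domid, pirq, 0, 2, 0))
            exit(EXIT_FAILURE);

        xc_interface_close(xch);
        return 0;
    }

Thanks, Roger.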