On 12/5/23 06:08, Roger Pau Monné wrote:
> On Mon, Dec 04, 2023 at 02:07:51PM -0800, Stefano Stabellini wrote:
>> On Mon, 4 Dec 2023, Roger Pau Monné wrote:
>>> On Fri, Dec 01, 2023 at 06:56:32PM -0800, Stefano Stabellini wrote:
>>>> On Fri, 1 Dec 2023, Roger Pau Monné wrote:
>>>>> On Mon, Nov 13, 2023 at 05:21:13PM -0500, Stewart Hildebrand wrote:
>>>>>> @@ -1618,6 +1630,14 @@ int iommu_do_pci_domctl(
>>>>>>          bus = PCI_BUS(machine_sbdf);
>>>>>>          devfn = PCI_DEVFN(machine_sbdf);
>>>>>>  
>>>>>> +        if ( needs_vpci(d) && !has_vpci(d) )
>>>>>> +        {
>>>>>> +            printk(XENLOG_G_WARNING "Cannot assign %pp to %pd: vPCI support not enabled\n",
>>>>>> +                   &PCI_SBDF(seg, bus, devfn), d);
>>>>>> +            ret = -EPERM;
>>>>>> +            break;
>>>>>
>>>>> I think this is likely too restrictive going forward.  The current
>>>>> approach is indeed to enable vPCI on a per-domain basis because that's
>>>>> how PVH dom0 uses it, due to being unable to use ioreq servers.
>>>>>
>>>>> If we start to expose vPCI support to guests, the interface should be on
>>>>> a per-device basis, so that vPCI could be enabled for some devices,
>>>>> while others could still be handled by ioreq servers.
>>>>>
>>>>> We might want to add a new flag to xen_domctl_assign_device (used by
>>>>> XEN_DOMCTL_assign_device) in order to signal whether the device will
>>>>> use vPCI.
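
For concreteness, I guess such a per-device flag would look roughly like the
sketch below against struct xen_domctl_assign_device (the flag name and value
here are purely illustrative, not a proposal for the final interface):

    struct xen_domctl_assign_device {
        uint32_t dev;                  /* XEN_DOMCTL_DEV_* */
        uint32_t flags;
    /* ... existing XEN_DOMCTL_DEV_* flags ... */
    #define XEN_DOMCTL_DEV_USE_VPCI  (1u << 1) /* hypothetical: handle via vPCI */
        union {
            struct {
                uint32_t machine_sbdf; /* machine PCI ID of assigned device */
            } pci;
            /* ... */
        } u;
    };

iommu_do_pci_domctl() could then key the vPCI checks off that flag rather
than (or in addition to) a domain-wide one.
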
>>>>
>>>> Actually I don't think this is a good idea. I am all for flexibility but
>>>> supporting multiple different configurations comes at an extra cost for
>>>> both maintainers and contributors. I think we should try to reduce the
>>>> amount of configurations we support rather than increasing them
>>>> (especially on x86 where we have PV, PVH, HVM).
>>>
>>> I think it's perfectly fine to initially require a domain to have all
>>> its devices either passed through using vPCI or ioreqs, but the
>>> interface IMO should allow for such differentiation in the future.
>>> That's why I think introducing a domain wide vPCI flag might not be
>>> the best option going forward.
>>>
>>> It would be perfectly fine for XEN_DOMCTL_assign_device to set a
>>> domain wide vPCI flag, iow:
>>>
>>> if ( HYPERCALL_VPCI_FLAG_SET && !has_vpci(d) )
>>> {
>>>     if ( has_arch_pdevs(d) )
>>>     {
>>>         printk("All passthrough devices must use the same backend\n");
>>>         return -EINVAL;
>>>     }
>>>
>>>     /* Set vPCI domain flag */
>>> }
>>
>> That would be fine by me, but maybe we can avoid this change too. I was
>> imagining that vPCI would be enabled at domain creation, not at runtime.
>> And that vPCI would be enabled by default for all PVH guests (once we
>> are past the initial experimental phase).
> 
> Then we don't even need a new CDF flag, and just enable vPCI when
> IOMMU is enabled?  IOW: we can key the enabling of vPCI to
> XEN_DOMCTL_CDF_iommu for specific domain types?

There are many Arm-based platforms that need to use the IOMMU but don't have
(or don't use) PCI, so we'd still like to have a separate vPCI flag.
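
Roughly what I have in mind is something like the below at domain creation
time (just a sketch; the helper name is made up and the exact placement of
the check would follow whatever we settle on):

    /* hypothetical domain-creation-time sanity check, common/domain.c style */
    static int check_vpci_flags(const struct xen_domctl_createdomain *config)
    {
        if ( (config->flags & XEN_DOMCTL_CDF_vpci) &&
             !(config->flags & XEN_DOMCTL_CDF_iommu) )
        {
            dprintk(XENLOG_INFO, "vPCI requested without IOMMU\n");
            return -EINVAL;
        }

        return 0;
    }

i.e. vPCI would depend on the IOMMU being enabled, but enabling the IOMMU
wouldn't imply vPCI.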

> 
> Maybe that's not so trivial on x86, as there's no x86 PVH domain type
> from the hypervisor PoV.
> 
>>
>>> We have already agreed that we want to aim for a setup where ioreqs
>>> and vPCI could be used for the same domain, but I guess you assumed
>>> that ioreqs would be used for emulated devices exclusively and vPCI
>>> for passthrough devices?
>>
>> Yes, that's right
>>
>>
>>> Overall if we agree that ioreqs and vPCI should co-exist for a domain,
>>> I'm not sure there's much reason to limit ioreqs to only handle
>>> emulated devices, seems like an arbitrary limitation.
>>
>> Reply below
>>
>>
>>>> I don't think we should enable IOREQ servers to handle PCI passthrough
>>>> for PVH guests and/or guests with vPCI. If the domain has vPCI, PCI
>>>> passthrough can be handled by vPCI just fine. I think this would be a
>>>> good anti-feature (i.e. a goal to explicitly not add this feature) to
>>>> reduce complexity. Unless you see a specific use case to add support for
>>>> it?
>>>
>>> There are passthrough devices (GPUs) that might require some extra
>>> mediation on dom0 (like the Intel GVT-g thing) that would force the
>>> usage of ioreqs for passthrough.
>>
>> From an architectural perspective, I think it would be cleaner, simpler
>> to maintain, and simpler to understand if Xen was the sole owner of the
>> PCI Root Complex and PCI config space mediation (implemented with vPCI).
>> IOREQ can be used for emulation and it works very well for that. At
>> least in my mind, that makes things much simpler.
> 
> But IOREQ already has all the code to mediate accesses to the PCI
> config space, and the interface to register separate servers for
> different PCI devices.
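
(For reference, I believe that's the per-device registration an emulator
does today via libxendevicemodel - something along these lines, if I'm
remembering the interface correctly:

    #include <xendevicemodel.h>

    /* Ask Xen to forward PCI config accesses for seg:bus:dev.fn to the
     * given IOREQ server; other devices can be claimed by other servers. */
    static int claim_pci_dev(xendevicemodel_handle *xdm, domid_t domid,
                             ioservid_t ioservid, uint16_t seg, uint8_t bus,
                             uint8_t dev, uint8_t fn)
    {
        return xendevicemodel_map_pcidev_to_ioreq_server(xdm, domid, ioservid,
                                                         seg, bus, dev, fn);
    }

so the per-device granularity is indeed already there on the IOREQ side.)
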
> 
> We would then need to duplicate this internally for vPCI, so that vPCI
> could forward accesses to IOREQ just for IOREQ to forward to yet a
> different component?  Seems like a lot of duplication for no benefit.
> 
>> I understand there are non-trivial cases, like virtual GPUs with
>> hardware access, but I don't classify those as passthrough. That's
>> because there isn't one device that gets fully assigned to the guest.
>> Instead, there is an emulated device (hence IOREQ) with certain MMIO
>> regions and interrupts that end up being directly mapped from real
>> hardware.
>>
>> So I think it is natural in those cases to use IOREQ and it is also
>> natural to have QEMU remap MMIO/IRQs at runtime. From a vPCI
>> perspective, I hope it will mostly look as if the device is assigned to
>> Dom0. Even if it ends up being more complex than that, Rome wasn't
>> built in a day, and I don't think we should try to solve this problem
>> on day1 (as long as the interfaces are not stable interfaces).
> 
> I don't see IOREQ as dealing explicitly with emulation.  Yes, it does
> allow for emulators to be implemented in user-space, but at the end
> it's just an interface that allows forwarding accesses to certain
> resources (for the case we are speaking about here, PCI config space)
> to entities that registered as handlers.
> 
> vPCI OTOH just deals with a very specific resource (PCI config space)
> and only allows internal handlers to be registered at byte
> granularity.
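
For comparison, the internal vPCI registration looks roughly like the below
(going from memory of xen/include/xen/vpci.h; the handlers are made-up
examples rather than actual vPCI code):

    static uint32_t cmd_read(const struct pci_dev *pdev, unsigned int reg,
                             void *data)
    {
        return pci_conf_read16(pdev->sbdf, reg);
    }

    static void cmd_write(const struct pci_dev *pdev, unsigned int reg,
                          uint32_t val, void *data)
    {
        pci_conf_write16(pdev->sbdf, reg, val);
    }

    /* e.g. from a device's vPCI init hook: trap the 2-byte PCI_COMMAND
     * register of this device with the handlers above */
    static int init_cmd(struct pci_dev *pdev)
    {
        return vpci_add_register(pdev->vpci, cmd_read, cmd_write, PCI_COMMAND,
                                 2, NULL);
    }
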
> 
> So your proposal would be to implement a hierarchy like the one on the
> diagram below:
> 
>     ┌────────┐ ┌──────────┐ ┌──────────────────┐
>     │ Memory │ │ IO Ports │ │ PCI config space │
>     └───────┬┘ └┬─────────┘ └───┬──────────────┘
>             │   │               │
>             │   │           ┌───┴──┐
>             │   │           │ vPCI │
>             │   │           └─┬──┬─┘
>          ┌──┴───┴┐            │  │
>          │ IOREQ ├────────────┘  │
>          └────┬──┘               │
>               │                  │
>  ┌────────────┴──┐              ┌┴──────────────┐
>  │ IOREQ servers │              │ vPCI handlers │
>  └───────────────┘              └───────────────┘
> 
> While what I'm proposing would look like:
> 
>     ┌────────┐ ┌──────────┐ ┌──────────────────┐
>     │ Memory │ │ IO Ports │ │ PCI config space │
>     └────┬───┘ └────┬─────┘ └────────┬─────────┘
>          │          │                │
>          └─────┬────┴────┬───────────┘
>                │  IOREQ  │
>                └─┬─────┬─┘
>                  │     │
>  ┌───────────────┤     └────┬──────┐
>  │ IOREQ servers │          │ vPCI │
>  └───────────────┘          └───┬──┘
>                                 │
>                             ┌───┴───────────┐
>                             │ vPCI handlers │
>                             └───────────────┘
> 
> I'm obviously biased, but I think the latter is cleaner, and allows
> all resources to be arbitrated by the same component (IOREQ).
> 
> If the concern is about the IOREQ hypercall interface, it would be
> fine to introduce an option that limits IOREQs to internal users
> (vPCI) without supporting external IOREQ servers.
> 
> Think of IOREQ as a resource mediator inside of Xen, that just does
> the PCI address decoding and forwards the access to the interested
> party, either an external IOREQ server or vPCI.
> 
>>
>>> It's important that the interfaces we introduce are correct IMO,
>>> because that ends up reflecting on the configuration options that we
>>> expose to users on xl/libxl.  While both XEN_DOMCTL_createdomain and
>>> XEN_DOMCTL_assign_device are unstable interfaces, how the vPCI option
>>> gets placed there will ultimately influence how the option gets
>>> exposed in xl/libxl, and the interface there is relevant to keep
>>> stable for end user sanity.
>>
>> I agree with you on the stable interfaces. The important part is not to
>> introduce changes to stable interfaces that could limit us in the
>> future. Specifically, that includes xl and libxl; we need to be careful
>> there. But I don't see a single per-domain vPCI enable/disable option as
>> a problem. Let's say that in the future we have a mediated vGPU
>> implementation: if it works together with vPCI then the per-domain vPCI
>> option in libxl will be enabled (either explicitly or by default), if
>> it doesn't then vPCI will be disabled (either explicitly or by the
>> newer vGPU options.)
> 
> If vPCI is hooked into IOREQ there won't be a need anymore to register
> the vPCI config space traps, as that would be done by IOREQ, and hence
> vPCI managed devices could be registered at runtime with IOREQ.  IOW:
> there won't be a need anymore to signal at domain creation whether
> vPCI is intended to be used or not.
> 
> We would obviously need to enable IOREQ for all domains with IOMMU
> enabled, as it would be IOREQ that registers the PCI config space
> handlers.
> 
>> For *unstable* interfaces (XEN_DOMCTL_assign_device) I would rather wait
>> before adding more changes on top of them, not because I don't care
>> about the mediated GPU problem (we do have something similar at AMD),
>> but because I worry that if we try to change them now we might not do a
>> good enough job. I would prefer to wait until we know more about the
>> actual use case, ideally with code supporting it.
>>
>> I think the difference in points of view comes from the fact that I see
>> vPCI as the default, QEMU only as a limited peripheral emulator (or
>> mediator for the vGPU case) but not in control. vPCI and QEMU are not
>> equal in my view. vPCI is in charge and always present except in very
>> uncommon setups (even if we decide to hook it inside Xen by using
>> internal IOREQ interfaces). QEMU might come and go.
> 
> Xen needs a single component that mediates accesses to resources,
> whether that's IOREQ, or something else I don't really care that much.
> Having vPCI mediate accesses to the PCI config space, and IOREQ to the
> memory (and on x86 IO port) space just seems awfully complicated for
> AFAICT no real benefit.
> 
> Also, you seem to conflate IOREQ with QEMU; while the latter is
> indeed a user of IOREQ, I see IOREQ as a simple resource mediator
> inside of Xen that has the ability to forward such accesses to
> external emulators using a hypercall interface.
> 
>> Now that I am writing this, I realize this is also why I wasn't too
>> happy with the idea of hooking vPCI using IOREQ. It makes them look as
>> if they are the same, while I don't they should be considered at the
>> same level of priority, criticality, safety, integration in the system,
>> etc.
> 
> I feel there are some fears with IOREQ from a safety PoV?  The code
> that does the resource multiplexing is small, and as said above if
> there are safety concerns with the hypercall interface it would be
> fine to limit its usage to internal handlers only.
> 
> Thanks, Roger.
