Hello, Roger!

On 10/27/20 2:55 PM, Roger Pau Monné wrote:
> On Tue, Oct 27, 2020 at 09:59:05AM +0000, Oleksandr Andrushchenko wrote:
>> Hello, all!
>>
>> While working on PCI passthrough on ARM (a partial RFC was published by ARM
>> earlier this year) I tried to implement some related changes in the
>> toolstack. One of the obstacles for ARM is the presence of PCI backend
>> related code: ARM is going to fully emulate an ECAM host bridge in Xen, so
>> no PCI backend/frontend pair is going to be used.
>>
>> If my understanding is correct, the functionality implemented by pciback and
>> the toolstack that is relevant/needed for ARM is:
>>
>>    1. pciback is used as a database of assignable PCI devices, e.g. xl
>>       pci-assignable-{add|remove|list} manipulates that list. So, whenever
>>       the toolstack needs to know which PCI devices can be passed through,
>>       it reads that from the relevant sysfs entries of pciback.
>>
>>    2. pciback is used to hold the unbound PCI devices, e.g. when passing
>>       through a PCI device it needs to be unbound from the relevant device
>>       driver and bound to pciback (strictly speaking it is not required that
>>       the device is bound to pciback, but pciback is again used as a database
>>       of the passed-through PCI devices, so we can re-bind the devices to
>>       their original drivers when the guest domain shuts down).
>>
>>    3. toolstack depends on Domain-0 for discovering PCI device resources
>>       which are then permitted for the guest domain, e.g. MMIO ranges and
>>       IRQs are read from sysfs.
>>
>>    4. toolstack is responsible for resetting PCI devices being passed
>>       through via sysfs/reset of Domain-0’s PCI bus subsystem.
>>
>>    5. toolstack is responsible for ensuring that devices are passed through
>>       with all relevant functions, i.e. for multifunction devices all the
>>       functions are passed to a domain and no partial passthrough is done.
>>
>>    6. toolstack cares about SR-IOV devices (am I correct here?)
> I'm not sure I fully understand what this means. Toolstack cares about
> SR-IOV as it cares about other PCI devices, but the SR-IOV
> functionality is managed by the (dom0) kernel.
Yes, you are right. Please ignore #6
>
>>
>> I have implemented a really dirty POC for that which I would need to clean
>> up before showing, but before that I would like to get some feedback and
>> advice on how to proceed with the above. I suggest we:
>>
>>    1. Move all pciback related code (which seems to become x86-only code)
>>       into a dedicated file, something like tools/libxl/libxl_pci_x86.c
>>
>>    2. Make the functionality now provided by pciback architecture dependent,
>>       so tools/libxl/libxl_pci.c delegates actual assignable device list
>>       handling to that arch code and uses some sort of “ops”, e.g.
>>       arch->ops.get_all_assignable, arch->ops.add_assignable etc. (This
>>       could also be done with “#ifdef CONFIG_PCIBACK”, but that seems less
>>       clean). Introduce tools/libxl/libxl_pci_arm.c to provide the ARM
>>       implementation.
> To be fair this is arch and OS dependent, since it's currently based
> on sysfs which is Linux specific. So it should really be
> libxl_pci_linux_x86.c or similar.
This is true, but do we really have any other implementation yet?
>
>>    3. ARM only: As we do not have pciback on ARM we need some storage for
>>       the assignable device list: move that into Xen by extending struct
>>       pci_dev with “bool assigned” and providing sysctls for manipulating
>>       that, e.g. XEN_SYSCTL_pci_device_{set|get}_assigned,
>>       XEN_SYSCTL_pci_device_enum_assigned (to enumerate/get the list of
>>       assigned/not-assigned PCI devices). Can this also be interesting for
>>       x86? At the moment it seems that x86 does rely on pciback presence, so
>>       probably this change might not be interesting for the x86 world, but
>>       it may allow stripping pciback functionality a bit and making the code
>>       common to both ARM and x86.
> How are you going to perform the device reset then? Will you assign
> the device to dom0 after removing it from the guest so that dom0 can
> perform the reset? You will need to use logic currently present in
> pciback to do so IIRC.
>
> It doesn't seem like a bad approach, but there are more consequences
> than just how assignable devices are listed.
>
> Also Xen doesn't currently know about IOMMU groups, so Xen would have
> to gain this knowledge in order to know the minimal set of PCI devices
> that can be assigned to a guest.
Good point, I'll check the relevant reset code. Thanks
>
>>    4. ARM only: It is not clear how to handle re-binding of the PCI driver
>>       on guest shutdown: we need to store the sysfs path of the original
>>       driver the device was bound to. Do we also want to store that in
>>       struct pci_dev?
> I'm not sure I follow you here. On shutdown the device would be
> handed back to Xen?

Currently it is bound back to the driver from which we seized the device (if
any). So, probably the same logic should remain?

>
> Most certainly we don't want to store a sysfs (Linux private
> information) inside of a Xen specific struct (pci_dev).
Yep, this is something I don't like either.
>
>>    5. An alternative route for 3-4 could be to store that data in XenStore,
>>       e.g. MMIOs, IRQ, bind sysfs path etc. This would require more code on
>>       the Xen side to access XenStore and won’t work if MMIOs/IRQs are
>>       passed via device tree/ACPI tables by the bootloaders.
> As above, I think I need more context to understand what and why you
> need to save such information.

Well, in the absence of pciback we lose the "database" which holds all the
knowledge about which devices are assigned, bound etc. So, XenStore *could* be
used as such a database for us. But this does not look elegant.

>
>> Another big question is with respect to Domain-0 and PCI bus sysfs use. The
>> existing code for querying PCI device resources/IRQs and resetting those
>> via sysfs of Domain-0 is more than OK if Domain-0 is present and owns the
>> PCI hardware. But there are at least two cases when this is not going to
>> work on ARM: Dom0less setups and setups where there is a hardware domain
>> owning the PCI devices.
>>
>> In our case we have a dedicated guest which is a sort of hardware domain
>> (driver domain DomD) which owns all the hardware of the platform, so we are
>> interested in implementing something that fits our design as well: with a
>> DomD/hardware domain it is not possible to access the relevant PCI bus
>> sysfs entries from Domain-0 as those live in DomD/hwdom. The same is true
>> for Dom0less setups as there is no entity that can provide them.
> You need some kind of channel to transfer this information from the
> hardware domain to the toolstack domain. Some kind of protocol over
> libvchan might be an option.
Yes, this way it will all be handled without workarounds
>
>> For that reason in my POC I have introduced the following: extended struct
>> pci_dev to hold an array of the PCI device’s MMIO ranges and its IRQ:
>>
>>    1. Provide internal API for accessing the array of MMIO ranges and the
>>       IRQ. This can be used in both Dom0less and Domain-0 setups to
>>       manipulate the relevant data. The actual data can be read from a
>>       device tree/ACPI tables if enumeration is done by the bootloaders.
> I would be against storing this data inside of Xen if Xen doesn't have
> to make any use of it. Does Xen need to know the MMIO ranges and IRQs
> to perform its task?
>
> If not, then there's no reason to store those in Xen. The hypervisor
> is not the right place to implement a database like mechanism for PCI
> devices.

We have discussed all the above with Roger on IRC (thank you, Roger), so I'll
prepare an RFC for ARM PCI passthrough configuration and send it ASAP.

>
> Roger.

Thank you,

Oleksandr
