On 29.01.2026 11:33, Roger Pau Monné wrote:
> On Thu, Jan 29, 2026 at 09:15:30AM +0100, Jan Beulich wrote:
>> On 28.01.2026 18:49, Roger Pau Monné wrote:
>>> On Mon, Jan 19, 2026 at 03:46:55PM +0100, Jan Beulich wrote:
>>>> Legacy PCI devices don't have any extended config space. Reading any part
>>>> thereof may return all ones or other arbitrary data, e.g. in some cases
>>>> base config space contents repeatedly.
>>>>
>>>> Logic follows Linux 6.19-rc's pci_cfg_space_size(), albeit leveraging our
>>>> determination of device type; in particular some comments are taken
>>>> verbatim from there.
>>>>
>>>> Signed-off-by: Jan Beulich <[email protected]>
>
> Acked-by: Roger Pau Monné <[email protected]>
Thanks. There'll be a v3 though in any event, to add the command line option
that you asked for. I'll take the liberty to retain the ack (whereas I've
dropped Stewart's R-b).
>>>> Linux also has CONFIG_PCI_QUIRKS, allowing to compile out the slightly
>>>> risky code (as reads may in principle have side effects). Should we gain
>>>> such, too?
>>>
>>> I would be fine with just a command line to disable the newly added
>>> behavior in case it causes issues.
>>
>> Can do. Will need to get creative as to the name of such an option.
>
> pci=check-ext-cfg=<bool>? Kind of a mouthful.
As we already have "pci=", I've added a "pci=no-quirks" sub-option there.
>>>> @@ -718,6 +721,11 @@ int pci_add_device(u16 seg, u8 bus, u8 d
>>>>
>>>> list_add(&pdev->vf_list, &pf_pdev->vf_list);
>>>> }
>>>> +
>>>> + if ( !pdev->ext_cfg )
>>>> + printk(XENLOG_WARNING
>>>> + "%pp: VF without extended config space?\n",
>>>> + &pdev->sbdf);
>>>
>>> You possibly also want to check that the PF (pf_pdev in this context I
>>> think) also has ext_cfg == true.
>>
>> I don't think so. No extended config space on a PF means no PF in that sense
>> in the first place, for then there not being any SR-IOV capability.
>
> Right, but won't it be possible for Xen to not be aware of the
> ECAM region for that device, yet the hardware domain somehow managed
> to enable SR-IOV it and request to register a VF?
Then we're screwed elsewhere, when we try to read the SR-IOV capability
ourselves.
>>>> @@ -1041,6 +1049,75 @@ enum pdev_type pdev_type(u16 seg, u8 bus
>>>> return pos ? DEV_TYPE_PCIe_ENDPOINT : DEV_TYPE_PCI;
>>>> }
>>>>
>>>> +void pci_check_extcfg(struct pci_dev *pdev)
>>>> +{
>>>> + unsigned int pos, sig;
>>>> +
>>>> + pdev->ext_cfg = false;
>>>
>>> I think I would prefer if the ext_cfg field is only modified once Xen
>>> know the correct value to put there.
>>
>> Well, my main point of doing it this way is that the code ends up being a
>> little easier to follow. Especially without the optimization talked about
>> near the top, there inevitably will be a window in time where what the
>> field says is wrong. With the optimization there'll be two main cases:
>> - MCFG becoming newly available: The field starts out false in this case,
>> i.e. the write above is a no-op.
>> - MCFG disappearing (largely hypothetical, I think): The field may start
>> out true in this case, but will go false unless we have another access
>> mechanism for extended config space. It then can as well be set to
>> false as early as possible.
>
> Yes, with the optimization to not re-parse existing MMCFGs there's no
> transient windows where the filed is wrongly set.
When there's no change. When there is a change, there'll still be such a
window. Unavoidably, though.
> I also think the registering of MMCFG ares with Xen should be done
> ahead of the OS attempting to access the config space, and hence it's
> not possible for there to be in-flight accesses that could see
> transient invalid pdev->ext_cfg values.
Yes, for Dom0 alone all should be fine. My worry here is with dom0less
and Hyperlaunch.
Jan