On 04.10.2022 15:55, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 03:15:02PM +0200, Jan Beulich wrote:
>> On 04.10.2022 14:59, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote:
>>>> On 04.10.2022 14:17, Roger Pau Monné wrote:
>>>>> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
>>>>>> On 04.10.2022 11:27, Roger Pau Monné wrote:
>>>>>>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
>>>>>>>> On 30.09.2022 16:17, Roger Pau Monne wrote:
>>>>>>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
>>>>>>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
>>>>>>>>> devices used by EFI.
>>>>>>>>>
>>>>>>>>> The current parsing of the EFI memory map was translating
>>>>>>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
>>>>>>>>> x86.  This is an issue because device MMIO regions (BARs) should not
>>>>>>>>> be positioned on reserved regions.  Any BARs positioned on non-hole
>>>>>>>>> areas of the memory map will cause is_memory_hole() to return false,
>>>>>>>>> which would then cause memory decoding to be disabled for such device.
>>>>>>>>> This leads to EFI firmware malfunctions when using runtime services.
>>>>>>>>>
>>>>>>>>> The system under which this was observed has:
>>>>>>>>>
>>>>>>>>> EFI memory map:
>>>>>>>>> [...]
>>>>>>>>>  00000fd000000-00000fe7fffff type=11 attr=800000000000100d
>>>>>>>>> [...]
>>>>>>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
>>>>>>>>>
>>>>>>>>> The device behind this BAR is:
>>>>>>>>>
>>>>>>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
>>>>>>>>> Controller (rev 09)
>>>>>>>>>       Subsystem: Super Micro Computer Inc Device 091c
>>>>>>>>>       Flags: fast devsel
>>>>>>>>>       Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well
>>>>>>>>>
>>>>>>>>> For the record, the symptom observed in that machine was a hard freeze
>>>>>>>>> when attempting to set an EFI variable (XEN_EFI_set_variable).
>>>>>>>>>
>>>>>>>>> Fix by not adding regions with type EfiMemoryMappedIO or
>>>>>>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
>>>>>>>>> be positioned there.
>>>>>>>>>
>>>>>>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably 
>>>>>>>>> positioned')
>>>>>>>>> Signed-off-by: Roger Pau Monné <roger....@citrix.com>
>>>>>>>>
>>>>>>>> In the best case this is moving us from one way of being wrong to 
>>>>>>>> another:
>>>>>>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
>>>>>>>> legitimately covered by a EfiMemoryMappedIO region in the first place,
>>>>>>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED
>>>>>>>> may not be used for BARs, this memory type also may not be), whereas 
>>>>>>>> with
>>>>>>>> your change we would no longer report non-BAR MMIO space (chipset 
>>>>>>>> specific
>>>>>>>> ranges for example) as reserved. In fact I think the example you 
>>>>>>>> provide
>>>>>>>> is at least partly due to bogus firmware behavior: The BAR is put in 
>>>>>>>> space
>>>>>>>> normally used for firmware specific memory (MMIO) ranges. I think 
>>>>>>>> firmware
>>>>>>>> should either assign the BAR differently or exclude the range from the
>>>>>>>> memory map.
>>>>>>>
>>>>>>> Hm, I'm not sure the example is bogus, how would firmware request a BAR
>>>>>>> to be mapped for run time services to access it otherwise if it's not
>>>>>>> using EfiMemoryMappedIO?
>>>>>>>
>>>>>>> Not adding the BAR to the memory map in any way would mean the OS is
>>>>>>> free to not map it for runtime services to access.
>>>>>>
>>>>>> My view is that BARs should not be marked for runtime services use. Doing
>>>>>> so requires awareness of the driver inside the OS, which I don't think
>>>>>> one can expect. If firmware needs to make use of a device in a system, it
>>>>>> ought to properly hide it from the OS. Note how the potential sharing of
>>>>>> an RTC requires special provisions in the spec, mandating driver 
>>>>>> awareness.
>>>>>>
>>>>>> Having a BAR expressed in the memory map also contradicts the ability of
>>>>>> an OS to relocate all BARs of all devices, if necessary.
>>>>>
>>>>> I've failed to figure out if there's a way in UEFI to report a device
>>>>> is in use by the firmware.  I've already looked before sending the
>>>>> patch (see also the post commit notes about for example not passing
>>>>> through the device to any guest for obvious reason).
>>>>>
>>>>> I've got no idea if Linux has any checks to avoid trying to move BARs
>>>>> residing in EfiMemoryMappedIO ranges, we have now observed this
>>>>> behavior in two systems already.
>>>>>
>>>>> Maybe we could do a special check for PCI devices and allow them
>>>>> having BARs in EfiMemoryMappedIO, together with printing a warning
>>>>> message.
>>>>
>>>> Right, that's one of the possible quirk workarounds I was thinking of.
>>>> At the risk of stating the obvious - the same would presumably apply to
>>>> E820_RESERVED on non-EFI systems then.
>>>
>>> One option would be to strictly limit to EfiMemoryMappedIO, by taking
>>> the EFI memory map into account also if present.
>>>
>>> Another maybe simpler option is to allow BARs to be placed in
>>> E820_RESERVED regions, and translate EfiMemoryMappedIO into
>>> E820_RESERVED like we have been doing.
>>>
>>> I will attempt the later if you are OK with the approach.
>>
>> I might be okay with the approach, but first of all I continue to be
>> worried of us violating specifications if we make this the default
>> behavior.
> 
> In any case it would be the firmware violating the specification by
> not hiding those PCI devices, Xen is just trying to cope.
> 
> Xen went from not checking the position of the BARs at all to
> enforcing them to not be placed over any regions on the memory map. I
> think we need to relax the checks a bit to match reality.

Sure, as a workaround. Yet we don't want to relax too much, or else a
primary purpose of the check will be lost.

Jan

Reply via email to