Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED

2022-10-04 Thread Jan Beulich
On 04.10.2022 15:55, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 03:15:02PM +0200, Jan Beulich wrote:
>> On 04.10.2022 14:59, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote:
 On 04.10.2022 14:17, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
>> On 04.10.2022 11:27, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
 On 30.09.2022 16:17, Roger Pau Monne wrote:
> The EFI memory map contains two memory types (EfiMemoryMappedIO and
> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> devices used by EFI.
>
> The current parsing of the EFI memory map was translating
> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> x86.  This is an issue because device MMIO regions (BARs) should not
> be positioned on reserved regions.  Any BARs positioned on non-hole
> areas of the memory map will cause is_memory_hole() to return false,
> which would then cause memory decoding to be disabled for such device.
> This leads to EFI firmware malfunctions when using runtime services.
>
> The system under which this was observed has:
>
> EFI memory map:
> [...]
>  0fd00-0fe7f type=11 attr=8000100d
> [...]
> :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
>
> The device behind this BAR is:
>
> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
> Controller (rev 09)
>   Subsystem: Super Micro Computer Inc Device 091c
>   Flags: fast devsel
>   Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
>
> For the record, the symptom observed in that machine was a hard freeze
> when attempting to set an EFI variable (XEN_EFI_set_variable).
>
> Fix by not adding regions with type EfiMemoryMappedIO or
> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> be positioned there.
>
> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably 
> positioned')
> Signed-off-by: Roger Pau Monné 

 In the best case this is moving us from one way of being wrong to 
 another:
 So far we wrongly include BARs in E820_RESERVED (_if_ they can be
 legitimately covered by a EfiMemoryMappedIO region in the first place,
 which I'm not sure is actually permitted - iirc just like E820_RESERVED
 may not be used for BARs, this memory type also may not be), whereas 
 with
 your change we would no longer report non-BAR MMIO space (chipset 
 specific
 ranges for example) as reserved. In fact I think the example you 
 provide
 is at least partly due to bogus firmware behavior: The BAR is put in 
 space
 normally used for firmware specific memory (MMIO) ranges. I think 
 firmware
 should either assign the BAR differently or exclude the range from the
 memory map.
>>>
>>> Hm, I'm not sure the example is bogus, how would firmware request a BAR
>>> to be mapped for run time services to access it otherwise if it's not
>>> using EfiMemoryMappedIO?
>>>
>>> Not adding the BAR to the memory map in any way would mean the OS is
>>> free to not map it for runtime services to access.
>>
>> My view is that BARs should not be marked for runtime services use. Doing
>> so requires awareness of the driver inside the OS, which I don't think
>> one can expect. If firmware needs to make use of a device in a system, it
>> ought to properly hide it from the OS. Note how the potential sharing of
>> an RTC requires special provisions in the spec, mandating driver 
>> awareness.
>>
>> Having a BAR expressed in the memory map also contradicts the ability of
>> an OS to relocate all BARs of all devices, if necessary.
>
> I've failed to figure out if there's a way in UEFI to report a device
> is in use by the firmware.  I've already looked before sending the
> patch (see also the post commit notes about for example not passing
> through the device to any guest for obvious reason).
>
> I've got no idea if Linux has any checks to avoid trying to move BARs
> residing in EfiMemoryMappedIO ranges, we have now observed this
> behavior in two systems already.
>
> Maybe we could do a special check for PCI devices and allow them
> having BARs in EfiMemoryMappedIO, together with printing a warning
> message.

 Right, that's one of the possible quirk workarounds I was thinking of.
 At the risk of stating the obvious - the same would presumably apply to

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED

2022-10-04 Thread Roger Pau Monné
On Tue, Oct 04, 2022 at 03:15:02PM +0200, Jan Beulich wrote:
> On 04.10.2022 14:59, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote:
> >> On 04.10.2022 14:17, Roger Pau Monné wrote:
> >>> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
>  On 04.10.2022 11:27, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
> >> On 30.09.2022 16:17, Roger Pau Monne wrote:
> >>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
> >>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> >>> devices used by EFI.
> >>>
> >>> The current parsing of the EFI memory map was translating
> >>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> >>> x86.  This is an issue because device MMIO regions (BARs) should not
> >>> be positioned on reserved regions.  Any BARs positioned on non-hole
> >>> areas of the memory map will cause is_memory_hole() to return false,
> >>> which would then cause memory decoding to be disabled for such device.
> >>> This leads to EFI firmware malfunctions when using runtime services.
> >>>
> >>> The system under which this was observed has:
> >>>
> >>> EFI memory map:
> >>> [...]
> >>>  0fd00-0fe7f type=11 attr=8000100d
> >>> [...]
> >>> :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> >>>
> >>> The device behind this BAR is:
> >>>
> >>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
> >>> Controller (rev 09)
> >>>   Subsystem: Super Micro Computer Inc Device 091c
> >>>   Flags: fast devsel
> >>>   Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
> >>>
> >>> For the record, the symptom observed in that machine was a hard freeze
> >>> when attempting to set an EFI variable (XEN_EFI_set_variable).
> >>>
> >>> Fix by not adding regions with type EfiMemoryMappedIO or
> >>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> >>> be positioned there.
> >>>
> >>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably 
> >>> positioned')
> >>> Signed-off-by: Roger Pau Monné 
> >>
> >> In the best case this is moving us from one way of being wrong to 
> >> another:
> >> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
> >> legitimately covered by a EfiMemoryMappedIO region in the first place,
> >> which I'm not sure is actually permitted - iirc just like E820_RESERVED
> >> may not be used for BARs, this memory type also may not be), whereas 
> >> with
> >> your change we would no longer report non-BAR MMIO space (chipset 
> >> specific
> >> ranges for example) as reserved. In fact I think the example you 
> >> provide
> >> is at least partly due to bogus firmware behavior: The BAR is put in 
> >> space
> >> normally used for firmware specific memory (MMIO) ranges. I think 
> >> firmware
> >> should either assign the BAR differently or exclude the range from the
> >> memory map.
> >
> > Hm, I'm not sure the example is bogus, how would firmware request a BAR
> > to be mapped for run time services to access it otherwise if it's not
> > using EfiMemoryMappedIO?
> >
> > Not adding the BAR to the memory map in any way would mean the OS is
> > free to not map it for runtime services to access.
> 
>  My view is that BARs should not be marked for runtime services use. Doing
>  so requires awareness of the driver inside the OS, which I don't think
>  one can expect. If firmware needs to make use of a device in a system, it
>  ought to properly hide it from the OS. Note how the potential sharing of
>  an RTC requires special provisions in the spec, mandating driver 
>  awareness.
> 
>  Having a BAR expressed in the memory map also contradicts the ability of
>  an OS to relocate all BARs of all devices, if necessary.
> >>>
> >>> I've failed to figure out if there's a way in UEFI to report a device
> >>> is in use by the firmware.  I've already looked before sending the
> >>> patch (see also the post commit notes about for example not passing
> >>> through the device to any guest for obvious reason).
> >>>
> >>> I've got no idea if Linux has any checks to avoid trying to move BARs
> >>> residing in EfiMemoryMappedIO ranges, we have now observed this
> >>> behavior in two systems already.
> >>>
> >>> Maybe we could do a special check for PCI devices and allow them
> >>> having BARs in EfiMemoryMappedIO, together with printing a warning
> >>> message.
> >>
> >> Right, that's one of the possible quirk workarounds I was thinking of.
> >> At the risk of stating the obvious - the same would presumably apply to
> >> E820_RESERVED on non-EFI systems then.
> > 

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED

2022-10-04 Thread Jan Beulich
On 04.10.2022 14:59, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote:
>> On 04.10.2022 14:17, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
 On 04.10.2022 11:27, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
>> On 30.09.2022 16:17, Roger Pau Monne wrote:
>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
>>> devices used by EFI.
>>>
>>> The current parsing of the EFI memory map was translating
>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
>>> x86.  This is an issue because device MMIO regions (BARs) should not
>>> be positioned on reserved regions.  Any BARs positioned on non-hole
>>> areas of the memory map will cause is_memory_hole() to return false,
>>> which would then cause memory decoding to be disabled for such device.
>>> This leads to EFI firmware malfunctions when using runtime services.
>>>
>>> The system under which this was observed has:
>>>
>>> EFI memory map:
>>> [...]
>>>  0fd00-0fe7f type=11 attr=8000100d
>>> [...]
>>> :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
>>>
>>> The device behind this BAR is:
>>>
>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
>>> Controller (rev 09)
>>> Subsystem: Super Micro Computer Inc Device 091c
>>> Flags: fast devsel
>>> Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
>>>
>>> For the record, the symptom observed in that machine was a hard freeze
>>> when attempting to set an EFI variable (XEN_EFI_set_variable).
>>>
>>> Fix by not adding regions with type EfiMemoryMappedIO or
>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
>>> be positioned there.
>>>
>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably 
>>> positioned')
>>> Signed-off-by: Roger Pau Monné 
>>
>> In the best case this is moving us from one way of being wrong to 
>> another:
>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
>> legitimately covered by a EfiMemoryMappedIO region in the first place,
>> which I'm not sure is actually permitted - iirc just like E820_RESERVED
>> may not be used for BARs, this memory type also may not be), whereas with
>> your change we would no longer report non-BAR MMIO space (chipset 
>> specific
>> ranges for example) as reserved. In fact I think the example you provide
>> is at least partly due to bogus firmware behavior: The BAR is put in 
>> space
>> normally used for firmware specific memory (MMIO) ranges. I think 
>> firmware
>> should either assign the BAR differently or exclude the range from the
>> memory map.
>
> Hm, I'm not sure the example is bogus, how would firmware request a BAR
> to be mapped for run time services to access it otherwise if it's not
> using EfiMemoryMappedIO?
>
> Not adding the BAR to the memory map in any way would mean the OS is
> free to not map it for runtime services to access.

 My view is that BARs should not be marked for runtime services use. Doing
 so requires awareness of the driver inside the OS, which I don't think
 one can expect. If firmware needs to make use of a device in a system, it
 ought to properly hide it from the OS. Note how the potential sharing of
 an RTC requires special provisions in the spec, mandating driver awareness.

 Having a BAR expressed in the memory map also contradicts the ability of
 an OS to relocate all BARs of all devices, if necessary.
>>>
>>> I've failed to figure out if there's a way in UEFI to report a device
>>> is in use by the firmware.  I've already looked before sending the
>>> patch (see also the post commit notes about for example not passing
>>> through the device to any guest for obvious reason).
>>>
>>> I've got no idea if Linux has any checks to avoid trying to move BARs
>>> residing in EfiMemoryMappedIO ranges, we have now observed this
>>> behavior in two systems already.
>>>
>>> Maybe we could do a special check for PCI devices and allow them
>>> having BARs in EfiMemoryMappedIO, together with printing a warning
>>> message.
>>
>> Right, that's one of the possible quirk workarounds I was thinking of.
>> At the risk of stating the obvious - the same would presumably apply to
>> E820_RESERVED on non-EFI systems then.
> 
> One option would be to strictly limit to EfiMemoryMappedIO, by taking
> the EFI memory map into account also if present.
> 
> Another maybe simpler option is to allow BARs to be placed in
> E820_RESERVED regions, and translate EfiMemoryMappedIO into
> E820_RESERVED 

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED

2022-10-04 Thread Roger Pau Monné
On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote:
> On 04.10.2022 14:17, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
> >> On 04.10.2022 11:27, Roger Pau Monné wrote:
> >>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
>  On 30.09.2022 16:17, Roger Pau Monne wrote:
> > The EFI memory map contains two memory types (EfiMemoryMappedIO and
> > EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> > devices used by EFI.
> >
> > The current parsing of the EFI memory map was translating
> > EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> > x86.  This is an issue because device MMIO regions (BARs) should not
> > be positioned on reserved regions.  Any BARs positioned on non-hole
> > areas of the memory map will cause is_memory_hole() to return false,
> > which would then cause memory decoding to be disabled for such device.
> > This leads to EFI firmware malfunctions when using runtime services.
> >
> > The system under which this was observed has:
> >
> > EFI memory map:
> > [...]
> >  0fd00-0fe7f type=11 attr=8000100d
> > [...]
> > :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> >
> > The device behind this BAR is:
> >
> > 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
> > Controller (rev 09)
> > Subsystem: Super Micro Computer Inc Device 091c
> > Flags: fast devsel
> > Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
> >
> > For the record, the symptom observed in that machine was a hard freeze
> > when attempting to set an EFI variable (XEN_EFI_set_variable).
> >
> > Fix by not adding regions with type EfiMemoryMappedIO or
> > EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> > be positioned there.
> >
> > Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably 
> > positioned')
> > Signed-off-by: Roger Pau Monné 
> 
>  In the best case this is moving us from one way of being wrong to 
>  another:
>  So far we wrongly include BARs in E820_RESERVED (_if_ they can be
>  legitimately covered by a EfiMemoryMappedIO region in the first place,
>  which I'm not sure is actually permitted - iirc just like E820_RESERVED
>  may not be used for BARs, this memory type also may not be), whereas with
>  your change we would no longer report non-BAR MMIO space (chipset 
>  specific
>  ranges for example) as reserved. In fact I think the example you provide
>  is at least partly due to bogus firmware behavior: The BAR is put in 
>  space
>  normally used for firmware specific memory (MMIO) ranges. I think 
>  firmware
>  should either assign the BAR differently or exclude the range from the
>  memory map.
> >>>
> >>> Hm, I'm not sure the example is bogus, how would firmware request a BAR
> >>> to be mapped for run time services to access it otherwise if it's not
> >>> using EfiMemoryMappedIO?
> >>>
> >>> Not adding the BAR to the memory map in any way would mean the OS is
> >>> free to not map it for runtime services to access.
> >>
> >> My view is that BARs should not be marked for runtime services use. Doing
> >> so requires awareness of the driver inside the OS, which I don't think
> >> one can expect. If firmware needs to make use of a device in a system, it
> >> ought to properly hide it from the OS. Note how the potential sharing of
> >> an RTC requires special provisions in the spec, mandating driver awareness.
> >>
> >> Having a BAR expressed in the memory map also contradicts the ability of
> >> an OS to relocate all BARs of all devices, if necessary.
> > 
> > I've failed to figure out if there's a way in UEFI to report a device
> > is in use by the firmware.  I've already looked before sending the
> > patch (see also the post commit notes about for example not passing
> > through the device to any guest for obvious reason).
> > 
> > I've got no idea if Linux has any checks to avoid trying to move BARs
> > residing in EfiMemoryMappedIO ranges, we have now observed this
> > behavior in two systems already.
> > 
> > Maybe we could do a special check for PCI devices and allow them
> > having BARs in EfiMemoryMappedIO, together with printing a warning
> > message.
> 
> Right, that's one of the possible quirk workarounds I was thinking of.
> At the risk of stating the obvious - the same would presumably apply to
> E820_RESERVED on non-EFI systems then.

One option would be to strictly limit to EfiMemoryMappedIO, by taking
the EFI memory map into account also if present.

Another maybe simpler option is to allow BARs to be placed in
E820_RESERVED regions, and translate EfiMemoryMappedIO into
E820_RESERVED like we have been doing.

I will attempt the later if you are OK 

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED

2022-10-04 Thread Jan Beulich
On 04.10.2022 14:17, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
>> On 04.10.2022 11:27, Roger Pau Monné wrote:
>>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
 On 30.09.2022 16:17, Roger Pau Monne wrote:
> The EFI memory map contains two memory types (EfiMemoryMappedIO and
> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> devices used by EFI.
>
> The current parsing of the EFI memory map was translating
> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> x86.  This is an issue because device MMIO regions (BARs) should not
> be positioned on reserved regions.  Any BARs positioned on non-hole
> areas of the memory map will cause is_memory_hole() to return false,
> which would then cause memory decoding to be disabled for such device.
> This leads to EFI firmware malfunctions when using runtime services.
>
> The system under which this was observed has:
>
> EFI memory map:
> [...]
>  0fd00-0fe7f type=11 attr=8000100d
> [...]
> :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
>
> The device behind this BAR is:
>
> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
> Controller (rev 09)
>   Subsystem: Super Micro Computer Inc Device 091c
>   Flags: fast devsel
>   Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
>
> For the record, the symptom observed in that machine was a hard freeze
> when attempting to set an EFI variable (XEN_EFI_set_variable).
>
> Fix by not adding regions with type EfiMemoryMappedIO or
> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> be positioned there.
>
> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably 
> positioned')
> Signed-off-by: Roger Pau Monné 

 In the best case this is moving us from one way of being wrong to another:
 So far we wrongly include BARs in E820_RESERVED (_if_ they can be
 legitimately covered by a EfiMemoryMappedIO region in the first place,
 which I'm not sure is actually permitted - iirc just like E820_RESERVED
 may not be used for BARs, this memory type also may not be), whereas with
 your change we would no longer report non-BAR MMIO space (chipset specific
 ranges for example) as reserved. In fact I think the example you provide
 is at least partly due to bogus firmware behavior: The BAR is put in space
 normally used for firmware specific memory (MMIO) ranges. I think firmware
 should either assign the BAR differently or exclude the range from the
 memory map.
>>>
>>> Hm, I'm not sure the example is bogus, how would firmware request a BAR
>>> to be mapped for run time services to access it otherwise if it's not
>>> using EfiMemoryMappedIO?
>>>
>>> Not adding the BAR to the memory map in any way would mean the OS is
>>> free to not map it for runtime services to access.
>>
>> My view is that BARs should not be marked for runtime services use. Doing
>> so requires awareness of the driver inside the OS, which I don't think
>> one can expect. If firmware needs to make use of a device in a system, it
>> ought to properly hide it from the OS. Note how the potential sharing of
>> an RTC requires special provisions in the spec, mandating driver awareness.
>>
>> Having a BAR expressed in the memory map also contradicts the ability of
>> an OS to relocate all BARs of all devices, if necessary.
> 
> I've failed to figure out if there's a way in UEFI to report a device
> is in use by the firmware.  I've already looked before sending the
> patch (see also the post commit notes about for example not passing
> through the device to any guest for obvious reason).
> 
> I've got no idea if Linux has any checks to avoid trying to move BARs
> residing in EfiMemoryMappedIO ranges, we have now observed this
> behavior in two systems already.
> 
> Maybe we could do a special check for PCI devices and allow them
> having BARs in EfiMemoryMappedIO, together with printing a warning
> message.

Right, that's one of the possible quirk workarounds I was thinking of.
At the risk of stating the obvious - the same would presumably apply to
E820_RESERVED on non-EFI systems then.

Jan



Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED

2022-10-04 Thread Roger Pau Monné
On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote:
> On 04.10.2022 11:27, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
> >> On 30.09.2022 16:17, Roger Pau Monne wrote:
> >>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
> >>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> >>> devices used by EFI.
> >>>
> >>> The current parsing of the EFI memory map was translating
> >>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> >>> x86.  This is an issue because device MMIO regions (BARs) should not
> >>> be positioned on reserved regions.  Any BARs positioned on non-hole
> >>> areas of the memory map will cause is_memory_hole() to return false,
> >>> which would then cause memory decoding to be disabled for such device.
> >>> This leads to EFI firmware malfunctions when using runtime services.
> >>>
> >>> The system under which this was observed has:
> >>>
> >>> EFI memory map:
> >>> [...]
> >>>  0fd00-0fe7f type=11 attr=8000100d
> >>> [...]
> >>> :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> >>>
> >>> The device behind this BAR is:
> >>>
> >>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
> >>> Controller (rev 09)
> >>>   Subsystem: Super Micro Computer Inc Device 091c
> >>>   Flags: fast devsel
> >>>   Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
> >>>
> >>> For the record, the symptom observed in that machine was a hard freeze
> >>> when attempting to set an EFI variable (XEN_EFI_set_variable).
> >>>
> >>> Fix by not adding regions with type EfiMemoryMappedIO or
> >>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> >>> be positioned there.
> >>>
> >>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably 
> >>> positioned')
> >>> Signed-off-by: Roger Pau Monné 
> >>
> >> In the best case this is moving us from one way of being wrong to another:
> >> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
> >> legitimately covered by a EfiMemoryMappedIO region in the first place,
> >> which I'm not sure is actually permitted - iirc just like E820_RESERVED
> >> may not be used for BARs, this memory type also may not be), whereas with
> >> your change we would no longer report non-BAR MMIO space (chipset specific
> >> ranges for example) as reserved. In fact I think the example you provide
> >> is at least partly due to bogus firmware behavior: The BAR is put in space
> >> normally used for firmware specific memory (MMIO) ranges. I think firmware
> >> should either assign the BAR differently or exclude the range from the
> >> memory map.
> > 
> > Hm, I'm not sure the example is bogus, how would firmware request a BAR
> > to be mapped for run time services to access it otherwise if it's not
> > using EfiMemoryMappedIO?
> > 
> > Not adding the BAR to the memory map in any way would mean the OS is
> > free to not map it for runtime services to access.
> 
> My view is that BARs should not be marked for runtime services use. Doing
> so requires awareness of the driver inside the OS, which I don't think
> one can expect. If firmware needs to make use of a device in a system, it
> ought to properly hide it from the OS. Note how the potential sharing of
> an RTC requires special provisions in the spec, mandating driver awareness.
> 
> Having a BAR expressed in the memory map also contradicts the ability of
> an OS to relocate all BARs of all devices, if necessary.

I've failed to figure out if there's a way in UEFI to report a device
is in use by the firmware.  I've already looked before sending the
patch (see also the post commit notes about for example not passing
through the device to any guest for obvious reason).

I've got no idea if Linux has any checks to avoid trying to move BARs
residing in EfiMemoryMappedIO ranges, we have now observed this
behavior in two systems already.

Maybe we could do a special check for PCI devices and allow them
having BARs in EfiMemoryMappedIO, together with printing a warning
message.

Thanks, Roger.



Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED

2022-10-04 Thread Jan Beulich
On 04.10.2022 11:27, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
>> On 30.09.2022 16:17, Roger Pau Monne wrote:
>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
>>> devices used by EFI.
>>>
>>> The current parsing of the EFI memory map was translating
>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
>>> x86.  This is an issue because device MMIO regions (BARs) should not
>>> be positioned on reserved regions.  Any BARs positioned on non-hole
>>> areas of the memory map will cause is_memory_hole() to return false,
>>> which would then cause memory decoding to be disabled for such device.
>>> This leads to EFI firmware malfunctions when using runtime services.
>>>
>>> The system under which this was observed has:
>>>
>>> EFI memory map:
>>> [...]
>>>  0fd00-0fe7f type=11 attr=8000100d
>>> [...]
>>> :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
>>>
>>> The device behind this BAR is:
>>>
>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
>>> Controller (rev 09)
>>> Subsystem: Super Micro Computer Inc Device 091c
>>> Flags: fast devsel
>>> Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
>>>
>>> For the record, the symptom observed in that machine was a hard freeze
>>> when attempting to set an EFI variable (XEN_EFI_set_variable).
>>>
>>> Fix by not adding regions with type EfiMemoryMappedIO or
>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
>>> be positioned there.
>>>
>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
>>> Signed-off-by: Roger Pau Monné 
>>
>> In the best case this is moving us from one way of being wrong to another:
>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
>> legitimately covered by a EfiMemoryMappedIO region in the first place,
>> which I'm not sure is actually permitted - iirc just like E820_RESERVED
>> may not be used for BARs, this memory type also may not be), whereas with
>> your change we would no longer report non-BAR MMIO space (chipset specific
>> ranges for example) as reserved. In fact I think the example you provide
>> is at least partly due to bogus firmware behavior: The BAR is put in space
>> normally used for firmware specific memory (MMIO) ranges. I think firmware
>> should either assign the BAR differently or exclude the range from the
>> memory map.
> 
> Hm, I'm not sure the example is bogus, how would firmware request a BAR
> to be mapped for run time services to access it otherwise if it's not
> using EfiMemoryMappedIO?
> 
> Not adding the BAR to the memory map in any way would mean the OS is
> free to not map it for runtime services to access.

My view is that BARs should not be marked for runtime services use. Doing
so requires awareness of the driver inside the OS, which I don't think
one can expect. If firmware needs to make use of a device in a system, it
ought to properly hide it from the OS. Note how the potential sharing of
an RTC requires special provisions in the spec, mandating driver awareness.

Having a BAR expressed in the memory map also contradicts the ability of
an OS to relocate all BARs of all devices, if necessary.

>> I guess instead we want to handle the example you give by a firmware quirk
>> workaround.
> 
> I'm unconvinced we need a quirk for this. AFAICT such usage of
> EfiMemoryMappedIO doesn't go against the UEFI spec, and hence we need
> to handle it without requiring specific firmware quirks.
> 
>>> ---
>>> I don't understand the definition of EfiMemoryMappedIOPortSpace:
>>>
>>> "System memory-mapped IO region that is used to translate memory
>>> cycles to IO cycles by the processor."
>>
>> That's something (only?) IA-64 used, where kind of as a "replacement" for
>> x86 I/O port accesses equivalents thereof were provided (iirc 4 ports
>> per page) via MMIO accesses. It is this compatibility MMIO space which is
>> marked this way. Such ranges should never be seen on (current) x86.
> 
> I've heard the Arm guys speak about something similar.
> 
> There's a clarification note in newer versions of the UEFI spec:
> 
> "Note: There is only one region of type EfiMemoryMappedIoPortSpace
> defined in the architecture for Itanium-based platforms. As a result,
> there should be one and only one region of type
> EfiMemoryMappedIoPortSpace in the EFI memory map of an Itanium-based
> platform."
> 
>>> But given its name I would assume it's also likely used to mark ranges
>>> in use by PCI device BARs.
>>>
>>> It would also be interesting to forward this information to dom0, so
>>> it doesn't attempt to move the BARs of this device(s) around, or else
>>> issues will arise.
>>
>> None of this is device specific. There's simply (typically) one MMIO
>> range covering the entire 64k or I/O port space.
> 
> So this translation 

Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED

2022-10-04 Thread Roger Pau Monné
On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote:
> On 30.09.2022 16:17, Roger Pau Monne wrote:
> > The EFI memory map contains two memory types (EfiMemoryMappedIO and
> > EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> > devices used by EFI.
> > 
> > The current parsing of the EFI memory map was translating
> > EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> > x86.  This is an issue because device MMIO regions (BARs) should not
> > be positioned on reserved regions.  Any BARs positioned on non-hole
> > areas of the memory map will cause is_memory_hole() to return false,
> > which would then cause memory decoding to be disabled for such device.
> > This leads to EFI firmware malfunctions when using runtime services.
> > 
> > The system under which this was observed has:
> > 
> > EFI memory map:
> > [...]
> >  0fd00-0fe7f type=11 attr=8000100d
> > [...]
> > :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> > 
> > The device behind this BAR is:
> > 
> > 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
> > Controller (rev 09)
> > Subsystem: Super Micro Computer Inc Device 091c
> > Flags: fast devsel
> > Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
> > 
> > For the record, the symptom observed in that machine was a hard freeze
> > when attempting to set an EFI variable (XEN_EFI_set_variable).
> > 
> > Fix by not adding regions with type EfiMemoryMappedIO or
> > EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> > be positioned there.
> > 
> > Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> > Signed-off-by: Roger Pau Monné 
> 
> In the best case this is moving us from one way of being wrong to another:
> So far we wrongly include BARs in E820_RESERVED (_if_ they can be
> legitimately covered by a EfiMemoryMappedIO region in the first place,
> which I'm not sure is actually permitted - iirc just like E820_RESERVED
> may not be used for BARs, this memory type also may not be), whereas with
> your change we would no longer report non-BAR MMIO space (chipset specific
> ranges for example) as reserved. In fact I think the example you provide
> is at least partly due to bogus firmware behavior: The BAR is put in space
> normally used for firmware specific memory (MMIO) ranges. I think firmware
> should either assign the BAR differently or exclude the range from the
> memory map.

Hm, I'm not sure the example is bogus, how would firmware request a BAR
to be mapped for run time services to access it otherwise if it's not
using EfiMemoryMappedIO?

Not adding the BAR to the memory map in any way would mean the OS is
free to not map it for runtime services to access.

> I guess instead we want to handle the example you give by a firmware quirk
> workaround.

I'm unconvinced we need a quirk for this. AFAICT such usage of
EfiMemoryMappedIO doesn't go against the UEFI spec, and hence we need
to handle it without requiring specific firmware quirks.

> > ---
> > I don't understand the definition of EfiMemoryMappedIOPortSpace:
> > 
> > "System memory-mapped IO region that is used to translate memory
> > cycles to IO cycles by the processor."
> 
> That's something (only?) IA-64 used, where kind of as a "replacement" for
> x86 I/O port accesses equivalents thereof were provided (iirc 4 ports
> per page) via MMIO accesses. It is this compatibility MMIO space which is
> marked this way. Such ranges should never be seen on (current) x86.

I've heard the Arm guys speak about something similar.

There's a clarification note in newer versions of the UEFI spec:

"Note: There is only one region of type EfiMemoryMappedIoPortSpace
defined in the architecture for Itanium-based platforms. As a result,
there should be one and only one region of type
EfiMemoryMappedIoPortSpace in the EFI memory map of an Itanium-based
platform."

> > But given its name I would assume it's also likely used to mark ranges
> > in use by PCI device BARs.
> > 
> > It would also be interesting to forward this information to dom0, so
> > it doesn't attempt to move the BARs of this device(s) around, or else
> > issues will arise.
> 
> None of this is device specific. There's simply (typically) one MMIO
> range covering the entire 64k or I/O port space.

So this translation region won't be in a BAR of a host bridge for
example?

Thanks, Roger.



Re: [PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED

2022-10-04 Thread Jan Beulich
On 30.09.2022 16:17, Roger Pau Monne wrote:
> The EFI memory map contains two memory types (EfiMemoryMappedIO and
> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
> devices used by EFI.
> 
> The current parsing of the EFI memory map was translating
> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
> x86.  This is an issue because device MMIO regions (BARs) should not
> be positioned on reserved regions.  Any BARs positioned on non-hole
> areas of the memory map will cause is_memory_hole() to return false,
> which would then cause memory decoding to be disabled for such device.
> This leads to EFI firmware malfunctions when using runtime services.
> 
> The system under which this was observed has:
> 
> EFI memory map:
> [...]
>  0fd00-0fe7f type=11 attr=8000100d
> [...]
> :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> 
> The device behind this BAR is:
> 
> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
> Controller (rev 09)
>   Subsystem: Super Micro Computer Inc Device 091c
>   Flags: fast devsel
>   Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
> 
> For the record, the symptom observed in that machine was a hard freeze
> when attempting to set an EFI variable (XEN_EFI_set_variable).
> 
> Fix by not adding regions with type EfiMemoryMappedIO or
> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
> be positioned there.
> 
> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
> Signed-off-by: Roger Pau Monné 

In the best case this is moving us from one way of being wrong to another:
So far we wrongly include BARs in E820_RESERVED (_if_ they can be
legitimately covered by a EfiMemoryMappedIO region in the first place,
which I'm not sure is actually permitted - iirc just like E820_RESERVED
may not be used for BARs, this memory type also may not be), whereas with
your change we would no longer report non-BAR MMIO space (chipset specific
ranges for example) as reserved. In fact I think the example you provide
is at least partly due to bogus firmware behavior: The BAR is put in space
normally used for firmware specific memory (MMIO) ranges. I think firmware
should either assign the BAR differently or exclude the range from the
memory map.

I guess instead we want to handle the example you give by a firmware quirk
workaround.

> ---
> I don't understand the definition of EfiMemoryMappedIOPortSpace:
> 
> "System memory-mapped IO region that is used to translate memory
> cycles to IO cycles by the processor."

That's something (only?) IA-64 used, where kind of as a "replacement" for
x86 I/O port accesses equivalents thereof were provided (iirc 4 ports
per page) via MMIO accesses. It is this compatibility MMIO space which is
marked this way. Such ranges should never be seen on (current) x86.

> But given its name I would assume it's also likely used to mark ranges
> in use by PCI device BARs.
> 
> It would also be interesting to forward this information to dom0, so
> it doesn't attempt to move the BARs of this device(s) around, or else
> issues will arise.

None of this is device specific. There's simply (typically) one MMIO
range covering the entire 64k or I/O port space.

Jan



[PATCH for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED

2022-09-30 Thread Roger Pau Monne
The EFI memory map contains two memory types (EfiMemoryMappedIO and
EfiMemoryMappedIOPortSpace) used to describe IO memory areas of
devices used by EFI.

The current parsing of the EFI memory map was translating
EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on
x86.  This is an issue because device MMIO regions (BARs) should not
be positioned on reserved regions.  Any BARs positioned on non-hole
areas of the memory map will cause is_memory_hole() to return false,
which would then cause memory decoding to be disabled for such device.
This leads to EFI firmware malfunctions when using runtime services.

The system under which this was observed has:

EFI memory map:
[...]
 0fd00-0fe7f type=11 attr=8000100d
[...]
:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map

The device behind this BAR is:

00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
Controller (rev 09)
Subsystem: Super Micro Computer Inc Device 091c
Flags: fast devsel
Memory at fe01 (32-bit, non-prefetchable) [size=4K]well

For the record, the symptom observed in that machine was a hard freeze
when attempting to set an EFI variable (XEN_EFI_set_variable).

Fix by not adding regions with type EfiMemoryMappedIO or
EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to
be positioned there.

Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned')
Signed-off-by: Roger Pau Monné 
---
I don't understand the definition of EfiMemoryMappedIOPortSpace:

"System memory-mapped IO region that is used to translate memory
cycles to IO cycles by the processor."

But given its name I would assume it's also likely used to mark ranges
in use by PCI device BARs.

It would also be interesting to forward this information to dom0, so
it doesn't attempt to move the BARs of this device(s) around, or else
issues will arise.

And of course the device must not be passed through to domUs, as
disabling memory decoding on it can lead to a host DoS.  Not that it
makes much sense to pass devices like the one above to guests.

It also makes me wonder whether this playing with the decoding bit
that we do in Xen is safe.  It might be more resilient to only disable
memory decoding when the BARs overlap a RAM region, as that would
indeed cause issues.

We should also consider moving away from the e820 and instead using
the EFI memory map for things like is_memory_hole(), but that would
involve translating e820 memory maps into EFI format, which might be
easier and more reliable than the other way around which we currently
do.
---
 xen/arch/x86/efi/efi-boot.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 836e8c2ba1..12ad94cd71 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -169,6 +169,14 @@ static void __init 
efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable,
 
 switch ( desc->Type )
 {
+case EfiMemoryMappedIO:
+case EfiMemoryMappedIOPortSpace:
+/*
+ * There no suitable e820 memory type to represent a MMIO area
+ * except a hole.
+ */
+continue;
+
 case EfiBootServicesCode:
 case EfiBootServicesData:
 if ( map_bs )
-- 
2.37.3