[Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-12 Thread Alexey Gerasimenko
Much like normal PCI BARs or other chipset-specific memory-mapped
resources, the MMCONFIG area needs space in the MMIO hole, so we must
allocate it manually.

The actual MMCONFIG size depends on the number of PCI buses which ECAM
should cover. Possible options are 64MB, 128MB and 256MB. As we are
currently limited to bus 0, the lowest possible setting (64MB) is used,
#defined via PCI_MAX_MCFG_BUSES in hvmloader/config.h. When support for
multiple PCI buses is implemented in Xen, PCI_MAX_MCFG_BUSES may be
changed to a calculation of the number of buses based on the results of
PCI device enumeration.
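
(For illustration only -- not part of the patch: ECAM reserves 4KB of
config space per function, 8 functions per device and 32 devices per
bus, i.e. 1MB of MMIO per bus, so the area size follows directly from
the bus count.)

    /* Illustrative sketch; macro names are hypothetical, not from the patch */
    #define MCFG_SIZE_PER_BUS  (4096ULL * 8 * 32)    /* 1MB of ECAM per bus */
    #define MCFG_AREA_SIZE     ((uint64_t)PCI_MAX_MCFG_BUSES * MCFG_SIZE_PER_BUS)
    /* 64 buses -> 64MB, 128 buses -> 128MB, 256 buses -> 256MB */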

The way to allocate the MMCONFIG range in the MMIO hole is similar to how
other PCI BARs are allocated. The patch extends the 'bars' structure to
make it universal for any arbitrary BAR type -- IO, MMIO, ROM or
a chipset-specific resource.

One important new field is addr_mask, which tells which bits of the base
address can (should) be written. Different address types (ROM, MMIO BAR,
PCIEXBAR) will have different addr_mask values.

For every assignable BAR range we store its size, the PCI device BDF (devfn
actually) it belongs to, the BAR type (mem/io/mem64) and the corresponding
register offset in the device's PCI conf space. This way we can insert the
MMCONFIG entry into the bars array in the same manner as any other BAR. In
this case, the devfn field will point to the MCH PCI device and bar_reg will
contain the PCIEXBAR register offset. It will be assigned a slot in the MMIO
hole later in the very same way as plain PCI BARs, respecting its size
alignment.
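
(A minimal sketch of what such an entry could look like, assuming the new
fields introduced by this patch; the register name PCI_PCIEXBAR and the
exact values are placeholders, not the actual patch code.)

    /* Hypothetical sketch: queue the MMCONFIG window as if it were a BAR,
     * so the common allocator assigns it a size-aligned slot in the MMIO hole. */
    bars[nr_bars].devfn     = PCI_MCH_DEVFN;                /* 00:00.0 (MCH) */
    bars[nr_bars].bar_reg   = PCI_PCIEXBAR;                 /* placeholder   */
    bars[nr_bars].bar_sz    = (uint64_t)PCI_MAX_MCFG_BUSES << 20; /* 1MB/bus */
    bars[nr_bars].addr_mask = ~(bars[nr_bars].bar_sz - 1);
    bars[nr_bars].is_mem    = 1;
    nr_bars++;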

Also, to reduce code complexity, all lengthy mem/mem64 BAR flag checks are
replaced with simple bars[i] field probing, e.g.:
-if ( (bar_reg == PCI_ROM_ADDRESS) ||
- ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
-  PCI_BASE_ADDRESS_SPACE_MEMORY) )
+if ( bars[i].is_mem )

Signed-off-by: Alexey Gerasimenko 
---
 tools/firmware/hvmloader/config.h   |   4 ++
 tools/firmware/hvmloader/pci.c  | 127 
 tools/firmware/hvmloader/pci_regs.h |   2 +
 3 files changed, 106 insertions(+), 27 deletions(-)

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index 6fde6b7b60..5443ecd804 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
 #define PCI_ISA_DEVFN   0x08/* dev 1, fn 0 */
 #define PCI_ISA_IRQ_MASK0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
 #define PCI_ICH9_LPC_DEVFN  0xf8/* dev 31, fn 0 */
+#define PCI_MCH_DEVFN   0   /* bus 0, dev 0, func 0 */
 
 /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
#define PCI_MEM_END 0xfc000000
 
+/* possible values are: 64, 128, 256 */
+#define PCI_MAX_MCFG_BUSES  64
+
 #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
 
 extern unsigned long pci_mem_start, pci_mem_end;
diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 033bd20992..6de124bbd5 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -158,9 +158,10 @@ static void class_specific_pci_device_setup(uint16_t vendor_id,
 
 void pci_setup(void)
 {
-uint8_t is_64bar, using_64bar, bar64_relocate = 0;
+uint8_t is_64bar, using_64bar, bar64_relocate = 0, is_mem;
 uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
 uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
+uint64_t addr_mask;
 uint16_t vendor_id, device_id;
 unsigned int bar, pin, link, isa_irq;
 int is_running_on_q35 = 0;
@@ -172,10 +173,14 @@ void pci_setup(void)
 
 /* Create a list of device BARs in descending order of size. */
 struct bars {
-uint32_t is_64bar;
 uint32_t devfn;
 uint32_t bar_reg;
 uint64_t bar_sz;
+uint64_t addr_mask; /* which bits of the base address can be written */
+uint32_t bar_data;  /* initial value - BAR flags here */
+uint8_t  is_64bar;
+uint8_t  is_mem;
+uint8_t  padding[2];
 } *bars = (struct bars *)scratch_start;
 unsigned int i, nr_bars = 0;
 uint64_t mmio_hole_size = 0;
@@ -259,13 +264,21 @@ void pci_setup(void)
 bar_reg = PCI_ROM_ADDRESS;
 
 bar_data = pci_readl(devfn, bar_reg);
+
+is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
+   PCI_BASE_ADDRESS_SPACE_MEMORY) ||
+   (bar_reg == PCI_ROM_ADDRESS));
+
 if ( bar_reg != PCI_ROM_ADDRESS )
 {
-is_64bar = !!((bar_data & (PCI_BASE_ADDRESS_SPACE |
- PCI_BASE_ADDRESS_MEM_TYPE_MASK)) ==
- (PCI_BASE_ADDRESS_SPACE_MEMORY |
+is_64bar = !!(is_mem &&
+ ((bar_data & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
  PCI_BASE_ADDRESS_MEM_TYPE_64));
+
 pci_writel(devfn, bar_reg, ~

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-20 Thread Alexey G
On Tue, 20 Mar 2018 08:50:48 +
Roger Pau Monné  wrote:

>On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
>> On Mon, 19 Mar 2018 15:58:02 +
>> Roger Pau Monné  wrote:
>>   
>> >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko
>> >wrote:  
>> >> Much like normal PCI BARs or other chipset-specific memory-mapped
>> >> resources, MMCONFIG area needs space in MMIO hole, so we must
>> >> allocate it manually.
>> >> 
>> >> The actual MMCONFIG size depends on a number of PCI buses
>> >> available which should be covered by ECAM. Possible options are
>> >> 64MB, 128MB and 256MB. As we are limited to the bus 0 currently,
>> >> thus using lowest possible setting (64MB), #defined via
>> >> PCI_MAX_MCFG_BUSES in hvmloader/config.h. When multiple PCI buses
>> >> support for Xen will be implemented, PCI_MAX_MCFG_BUSES may be
>> >> changed to calculation of the number of buses according to
>> >> results of the PCI devices enumeration.
>> >> 
>> >> The way to allocate MMCONFIG range in MMIO hole is similar to how
>> >> other PCI BARs are allocated. The patch extends 'bars' structure
>> >> to make it universal for any arbitrary BAR type -- either IO,
>> >> MMIO, ROM or a chipset-specific resource.
>> >
>> >I'm not sure this is fully correct. The IOREQ interface can
>> >differentiate PCI devices and forward config space accesses to
>> >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change
>> >you will forward all MCFG accesses to QEMU, which will likely be
>> >wrong if there are multiple PCI-device emulators for the same
>> >domain.
>> >
>> >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
>> >accesses to it in order to forward them to the right emulators.
>> >
>> >Adding Paul who knows more about all this.  
>> 
>> In which use cases multiple PCI-device emulators are used for a
>> single HVM domain? Is it a proprietary setup?  
>
>Likely. I think XenGT might be using it. It's a feature of the IOREQ
>implementation in Xen.

According to public slides for the feature, both PCI conf and MMIO
accesses can be routed to the designated device model. It looks like
for this particular setup it doesn't really matter which particular
ioreq type must be used for MMCONFIG accesses -- either
IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY (MMIO accesses) should be
acceptable. The only thing which matters is the ioreq routing itself --
making decisions about which device model the PCI conf/MMIO ioreq should
be sent to.

>Traditional PCI config space accesses are not IO port space accesses.

(assuming 'not' mistyped here)

>The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and IOREQ
>servers can register devices they would like to receive configuration
>space accesses for. QEMU is already making use of this, see for

That's one of the reasons why the current IOREQ_TYPE_PCI_CONFIG
implementation is a bit inconvenient for MMCONFIG MMIO accesses -- it's
too CF8h/CFCh-centric, and it might be painful to change code which was
intended for CF8h/CFCh handling (and not for MMIO processing).

>example xen_map_pcidev in the QEMU code.
>
>By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI
>layer, and thus a IOREQ server could register a PCI device and only
>receive PCI configuration accesses from the IO port space, while MCFG
>accesses would be forwarded somewhere else.

It will be handled by IOREQ too, just using a different IOREQ type (the
MMIO one). The basic question is why we have to stick to PCI conf space
ioreqs for emulating MMIO accesses to MMCONFIG.

>I think you need to make the IOREQ code aware of the MCFG area and
>XEN_DMOP_IO_RANGE_PCI needs to forward both IO space and MCFG accesses
>to the right IOREQ server.

Right now there is no way to inform Xen where the emulated MMCONFIG area
is located, so it cannot make this decision based on the address within
the MMCONFIG range. A new dmop/hypercall is needed (with args similar to
pci_mmcfg_reserved), along with its usage in QEMU.
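
(A rough sketch of what such an interface could carry, mirroring the
pci_mmcfg_reserved arguments; the name and layout below are invented
purely for illustration, this is not an existing Xen dmop.)

    /* Hypothetical dmop argument -- not an existing Xen interface */
    struct xen_dm_op_map_mcfg {
        uint64_t gpa;        /* guest-physical base of the emulated MMCONFIG */
        uint16_t segment;    /* PCI segment group */
        uint8_t  start_bus;  /* first bus number covered */
        uint8_t  end_bus;    /* last bus number covered */
    };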

I'll try to summarize two different approaches to MMCONFIG
handling. For both approaches the final PCI config host interface for a
passed-through device in QEMU will remain the same as at the moment --
xen_host_pci_* functions in /hw/xen.


Approach #1. Informing Xen about MMCONFIG area changes and letting Xen
translate MMIO accesses into _PCI_CONFIG ioreqs:

1. QEMU will trap accesses to PCIEXBAR, calling Xen via dmop/hypercall
to let the latter know of any MMCONFIG area address/size/status changes

2. Xen will trap MMIO accesses to the current MMCONFIG location, convert
memory accesses into one or several _PCI_CONFIG ioreqs and send them to
a chosen device model

3. QEMU will receive _PCI_CONFIG ioreqs carrying an SBDF and a 12-bit
offset, which it needs to somehow pass to
pci_host_config_{read,write}_common() for emulation. It might require a
few hacks to make the gears turn (due to QEMU's pci conf read/write
model).
At the moment emulated CF8h/CFCh ports play a special rol

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Roger Pau Monné
On Wed, Mar 21, 2018 at 10:58:40AM +1000, Alexey G wrote:
> On Tue, 20 Mar 2018 08:50:48 +
> Roger Pau Monné  wrote:
> 
> >On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
> >> On Mon, 19 Mar 2018 15:58:02 +
> >> Roger Pau Monné  wrote:
> >>   
> >> >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko
> >> >wrote:  
> >> >> Much like normal PCI BARs or other chipset-specific memory-mapped
> >> >> resources, MMCONFIG area needs space in MMIO hole, so we must
> >> >> allocate it manually.
> >> >> 
> >> >> The actual MMCONFIG size depends on a number of PCI buses
> >> >> available which should be covered by ECAM. Possible options are
> >> >> 64MB, 128MB and 256MB. As we are limited to the bus 0 currently,
> >> >> thus using lowest possible setting (64MB), #defined via
> >> >> PCI_MAX_MCFG_BUSES in hvmloader/config.h. When multiple PCI buses
> >> >> support for Xen will be implemented, PCI_MAX_MCFG_BUSES may be
> >> >> changed to calculation of the number of buses according to
> >> >> results of the PCI devices enumeration.
> >> >> 
> >> >> The way to allocate MMCONFIG range in MMIO hole is similar to how
> >> >> other PCI BARs are allocated. The patch extends 'bars' structure
> >> >> to make it universal for any arbitrary BAR type -- either IO,
> >> >> MMIO, ROM or a chipset-specific resource.
> >> >
> >> >I'm not sure this is fully correct. The IOREQ interface can
> >> >differentiate PCI devices and forward config space accesses to
> >> >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change
> >> >you will forward all MCFG accesses to QEMU, which will likely be
> >> >wrong if there are multiple PCI-device emulators for the same
> >> >domain.
> >> >
> >> >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
> >> >accesses to it in order to forward them to the right emulators.
> >> >
> >> >Adding Paul who knows more about all this.  
> >> 
> >> In which use cases multiple PCI-device emulators are used for a
> >> single HVM domain? Is it a proprietary setup?  
> >
> >Likely. I think XenGT might be using it. It's a feature of the IOREQ
> >implementation in Xen.
> 
> According to public slides for the feature, both PCI conf and MMIO
> accesses can be routed to the designated device model. It looks like
> for this particular setup it doesn't really matter which particular
> ioreq type must be used for MMCONFIG accesses -- either
> IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY (MMIO accesses) should be
> acceptable.

Isn't that going to be quite messy? How is the IOREQ server supposed
to decode a MCFG access received as IOREQ_TYPE_COPY?

I don't think the IOREQ server needs to know the start of the MCFG
region, in which case it won't be able to detect and decode the
access if it's of type IOREQ_TYPE_COPY.

MCFG accesses need to be sent to the IOREQ server as
IOREQ_TYPE_PCI_CONFIG, or else you are forcing each IOREQ server to
know the position of the MCFG area in order to do the decoding. In
your case this would work because QEMU controls the position of the
MCFG region, but there's no need for other IOREQ servers to know the
position of the MCFG area.

> The only thing which matters is ioreq routing itself --
> making decisions to which device model the PCI conf/MMIO ioreq should
> be sent.

Hm, see above, but I'm fairly sure you need to forward those MCFG
accesses as IOREQ_TYPE_PCI_CONFIG to the IOREQ server.

> >Traditional PCI config space accesses are not IO port space accesses.
> 
> (assuming 'not' mistyped here)

Not really, this should instead be:

"Traditional PCI config space accesses are not forwarded to the IOREQ
server as IO port space accesses (IOREQ_TYPE_PIO) but rather as PCI
config space accesses (IOREQ_TYPE_PCI_CONFIG)."

Sorry for the confusion.

> >The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and IOREQ
> >servers can register devices they would like to receive configuration
> >space accesses for. QEMU is already making use of this, see for
> 
> That's one of the reasons why current IOREQ_TYPE_PCI_CONFIG
> implementation is a bit inconvenient for MMCONFIG MMIO accesses -- it's
> too much CF8h/CFCh-centric in its implementation, might be painful to
> change something in the code which was intended for CF8h/CFCh handling
> (and not for MMIO processing).

I'm not sure I follow. Do you mean that changes should be made to the
ioreq struct in order to forward MCFG accesses using
IOREQ_TYPE_PCI_CONFIG as its type?

> >example xen_map_pcidev in the QEMU code.
> >
> >By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI
> >layer, and thus a IOREQ server could register a PCI device and only
> >receive PCI configuration accesses from the IO port space, while MCFG
> >accesses would be forwarded somewhere else.
> 
> It will be handled by IOREQ too, just using a different IOREQ type
> (MMIO one). The basic question is why do we have to stick to PCI conf
> space ioreqs for emulating MMIO accesses to MMCONFIG.

Because other IOREQ servers

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Paul Durrant
> -Original Message-
> 
> > The question is why IOREQ_TYPE_COPY -> IOREQ_TYPE_PCI_CONFIG
> > translation is a must have thing at all? It won't make handling simpler.
> > For current QEMU implementation IOREQ_TYPE_COPY (MMIO accesses for
> > MMCONFIG) would be preferable as it allows to use the existing code.
> 
> Granted it's likely easier to implement, but it's also incorrect. You
> seem to have in mind the picture of a single IOREQ server (QEMU)
> handling all the devices.
> 
> Although this is the most common scenario, it's not the only one
> supported by Xen. Your proposed solution breaks the usage of multiple
> IOREQ servers as PCI device emulators.
> 

Indeed it will, and that is not acceptable even in the short term.

> > I think it will be safe to use MMCONFIG emulation on MMIO level for now
> > and later extend it with 'set_mmconfig_' dmop/hypercall for the
> > 'multiple device emulators' IOREQ_TYPE_COPY routing to work same as for
> > PCI conf, so it can be used by XenGT etc on Q35 as well.
> 

Introducing known breakage is not really on, particularly when it can be 
avoided with a reasonable amount of extra work.

  Paul

> I'm afraid this kind of issues would have been fairly easier to
> identify if a design document for this feature was sent to the list
> prior to it's implementation.
> 
> Regarding whether to accept something like this, I'm not really in
> favor, but IMO it depends on how much new code is added to handle this
> incorrect usage that would then go away (or would have to be changed)
> in order to handle the proper implementation.
> 
> Thanks, Roger.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Alexey G
On Wed, 21 Mar 2018 09:09:11 +
Roger Pau Monné  wrote:

>On Wed, Mar 21, 2018 at 10:58:40AM +1000, Alexey G wrote:
[...]
>> According to public slides for the feature, both PCI conf and MMIO
>> accesses can be routed to the designated device model. It looks like
>> for this particular setup it doesn't really matter which particular
>> ioreq type must be used for MMCONFIG accesses -- either
>> IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY (MMIO accesses) should be
>> acceptable.  
>
>Isn't that going to be quite messy? How is the IOREQ server supposed
>to decode a MCFG access received as IOREQ_TYPE_COPY?

This code is already available and in sync with QEMU legacy PCI conf
emulation infrastructure.

>I don't think the IOREQ server needs to know the start of the MCFG
>region, in which case it won't be able to detect and decode the
>access if it's of type IOREQ_TYPE_COPY.

How do you think Xen will be able to know whether an arbitrary MMIO
access targets the MMCONFIG area, and to which BDF the offset in this
area belongs, without knowing where MMCONFIG is located and what the PCI
bus layout is? It's QEMU that emulates PCIEXBAR and can tell Xen where
MMCONFIG is expected to be.
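
(To make the point concrete -- a sketch of the decoding Xen could do once
QEMU has told it where MMCONFIG lives; mcfg_base/mcfg_size are assumed to
come from that notification.)

    /* Sketch: decode an intercepted MMIO address into BDF + register offset */
    if ( addr >= mcfg_base && addr < mcfg_base + mcfg_size )
    {
        uint64_t offset = addr - mcfg_base;
        uint16_t bdf = offset >> 12;     /* 4KB of config space per function */
        uint16_t reg = offset & 0xfff;   /* extended config space offset */
        /* ...route to the ioreq server which registered this BDF... */
    }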

>MCFG accesses need to be sent to the IOREQ server as
>IOREQ_TYPE_PCI_CONFIG, or else you are forcing each IOREQ server to
>know the position of the MCFG area in order to do the decoding. In
>your case this would work because QEMU controls the position of the
>MCFG region, but there's no need for other IOREQ servers to know the
>position of the MCFG area.
>
>> The only thing which matters is ioreq routing itself --
>> making decisions to which device model the PCI conf/MMIO ioreq should
>> be sent.  
>
>Hm, see above, but I'm fairly sure you need to forward those MCFG
>accesses as IOREQ_TYPE_PCI_CONFIG to the IOREQ server.

(a detailed answer below)

>> >Traditional PCI config space accesses are not IO port space
>> >accesses.  
>> 
>> (assuming 'not' mistyped here)  
>
>Not really, this should instead be:
>
>"Traditional PCI config space accesses are not forwarded to the IOREQ
>server as IO port space accesses (IOREQ_TYPE_PIO) but rather as PCI
>config space accesses (IOREQ_TYPE_PCI_CONFIG)."
>
>Sorry for the confusion.
>
>> >The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and
>> >IOREQ servers can register devices they would like to receive
>> >configuration space accesses for. QEMU is already making use of
>> >this, see for  
>> 
>> That's one of the reasons why current IOREQ_TYPE_PCI_CONFIG
>> implementation is a bit inconvenient for MMCONFIG MMIO accesses --
>> it's too much CF8h/CFCh-centric in its implementation, might be
>> painful to change something in the code which was intended for
>> CF8h/CFCh handling (and not for MMIO processing).  
>
>I'm not sure I follow. Do you mean that changes should be made to the
>ioreq struct in order to forward MCFG accesses using
>IOREQ_TYPE_PCI_CONFIG as it's type?

No changes to ioreq structures are needed for now.

>> It will be handled by IOREQ too, just using a different IOREQ type
>> (MMIO one). The basic question is why do we have to stick to PCI conf
>> space ioreqs for emulating MMIO accesses to MMCONFIG.  
>
>Because other IOREQ servers don't need to know about the position/size
>of the MCFG area, and cannot register MMIO ranges that cover their
>device's PCI configuration space in the MCFG region.
>
>Not to mention that it would would be a terrible design flaw to force
>IOREQ servers to register PCI devices and MCFG areas belonging to
>those devices separately as MMIO in order to trap all possible PCI
>configuration space accesses.

The PCI conf space layout is shared by the emulated machine, and the
MMCONFIG layout is mandated by this common PCI bus map.

Even if those 'multiple device models' see different pictures of the PCI
conf space, their views of the PCI bus must not overlap, and the MMCONFIG
layout must be consistent between the different device models.

It is a terrible mistake to think of the emulated PCI bus as a set of
distinct PCI devices unrelated to each other. It's all coupled together,
and this is especially true for PCIe.
Many PCIe features rely on PCIe device interaction in the PCIe fabric,
e.g. PCIe endpoints may interact with the Root Complex in many ways. This
cooperation may need to be emulated somehow, e.g. to provide some
support for PM features, link management or native hotplug facilities.
Even if we have a real passed-through device, we might need to provide
an emulated PCIe Switch or a Root Port for it to function properly
within the PCIe hierarchy.

Dedicating an isolated PCI device to some isolated device model --
that's what might be the design flaw, considering the PCIe world.

[...]
>
>Maybe you could detect offsets >= 256 and replay them in QEMU like
>mmio accesses? Using the address_space_write or
>pcie_mmcfg_data_read/write functions?
>I have to admit my knowledge of QEMU is quite limited, so I'm not sure
>of the best way to handle this.
>
>Ideally we should find a way 

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Alexey G
On Wed, 21 Mar 2018 09:36:04 +
Paul Durrant  wrote:
>> 
>> Although this is the most common scenario, it's not the only one
>> supported by Xen. Your proposed solution breaks the usage of multiple
>> IOREQ servers as PCI device emulators.
>
>Indeed it will, and that is not acceptable even in the short term.

Hmm, what exactly are you rejecting? QEMU's usage of established
interfaces (provided by Xen for QEMU to use)? Any particular reason why
QEMU can use map_io_range_to_ioreq_server() in one case and can't in
another? It's an API available to QEMU, after all.

If we actually switch to the approach of informing Xen about the emulated
MMCONFIG range (via a new dmop/hypercall), who should prevent QEMU from
actually mapping this range via map_io_range_to_ioreq_server? QEMU itself?
Or Xen? How will that look: "QEMU asks us to map this range as emulated
MMIO, but it previously told us that the emulated PCIEXBAR register points
there, so we won't allow it"?

>> > I think it will be safe to use MMCONFIG emulation on MMIO level
>> > for now and later extend it with 'set_mmconfig_' dmop/hypercall
>> > for the 'multiple device emulators' IOREQ_TYPE_COPY routing to
>> > work same as for PCI conf, so it can be used by XenGT etc on Q35
>> > as well.  
>Introducing known breakage is not really on, particularly when it can
>be avoided with a reasonable amount of extra work.

It's hard to break something which doesn't exist. :) The multiple device
emulators feature does not currently support translation/routing of
MMCONFIG MMIO accesses; it must be designed first.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Paul Durrant
> -Original Message-
> From: Alexey G [mailto:x19...@gmail.com]
> Sent: 21 March 2018 14:26
> To: Roger Pau Monne 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Ian Jackson ; Jan
> Beulich ; Wei Liu ; Paul Durrant
> ; Anthony Perard ;
> Stefano Stabellini 
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Wed, 21 Mar 2018 09:09:11 +
> Roger Pau Monné  wrote:
> 
> >On Wed, Mar 21, 2018 at 10:58:40AM +1000, Alexey G wrote:
> [...]
> >> According to public slides for the feature, both PCI conf and MMIO
> >> accesses can be routed to the designated device model. It looks like
> >> for this particular setup it doesn't really matter which particular
> >> ioreq type must be used for MMCONFIG accesses -- either
> >> IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY (MMIO accesses) should
> be
> >> acceptable.
> >
> >Isn't that going to be quite messy? How is the IOREQ server supposed
> >to decode a MCFG access received as IOREQ_TYPE_COPY?
> 
> This code is already available and in sync with QEMU legacy PCI conf
> emulation infrastructure.
> 
> >I don't think the IOREQ server needs to know the start of the MCFG
> >region, in which case it won't be able to detect and decode the
> >access if it's of type IOREQ_TYPE_COPY.
> 
> How do you think Xen will be able to know if arbitrary MMIO
> access targets MMCONFIG area and to which BDF the offset in this area
> belongs, without knowing where MMCONFIG is located and what PCI bus
> layout is? It's QEMU who emulate PCIEXBAR and can tell Xen where
> MMCONFIG is expected to be.
> 
> >MCFG accesses need to be sent to the IOREQ server as
> >IOREQ_TYPE_PCI_CONFIG, or else you are forcing each IOREQ server to
> >know the position of the MCFG area in order to do the decoding. In
> >your case this would work because QEMU controls the position of the
> >MCFG region, but there's no need for other IOREQ servers to know the
> >position of the MCFG area.
> >
> >> The only thing which matters is ioreq routing itself --
> >> making decisions to which device model the PCI conf/MMIO ioreq should
> >> be sent.
> >
> >Hm, see above, but I'm fairly sure you need to forward those MCFG
> >accesses as IOREQ_TYPE_PCI_CONFIG to the IOREQ server.
> 
> (a detailed answer below)
> 
> >> >Traditional PCI config space accesses are not IO port space
> >> >accesses.
> >>
> >> (assuming 'not' mistyped here)
> >
> >Not really, this should instead be:
> >
> >"Traditional PCI config space accesses are not forwarded to the IOREQ
> >server as IO port space accesses (IOREQ_TYPE_PIO) but rather as PCI
> >config space accesses (IOREQ_TYPE_PCI_CONFIG)."
> >
> >Sorry for the confusion.
> >
> >> >The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and
> >> >IOREQ servers can register devices they would like to receive
> >> >configuration space accesses for. QEMU is already making use of
> >> >this, see for
> >>
> >> That's one of the reasons why current IOREQ_TYPE_PCI_CONFIG
> >> implementation is a bit inconvenient for MMCONFIG MMIO accesses --
> >> it's too much CF8h/CFCh-centric in its implementation, might be
> >> painful to change something in the code which was intended for
> >> CF8h/CFCh handling (and not for MMIO processing).
> >
> >I'm not sure I follow. Do you mean that changes should be made to the
> >ioreq struct in order to forward MCFG accesses using
> >IOREQ_TYPE_PCI_CONFIG as it's type?
> 
> No changes for ioreq structures needed for now.
> 
> >> It will be handled by IOREQ too, just using a different IOREQ type
> >> (MMIO one). The basic question is why do we have to stick to PCI conf
> >> space ioreqs for emulating MMIO accesses to MMCONFIG.
> >
> >Because other IOREQ servers don't need to know about the position/size
> >of the MCFG area, and cannot register MMIO ranges that cover their
> >device's PCI configuration space in the MCFG region.
> >
> >Not to mention that it would would be a terrible design flaw to force
> >IOREQ servers to register PCI devices and MCFG areas belonging to
> >those devices separately as MMIO in order to trap all possible PCI
> >configuration space accesses.
> 
> PCI conf space layout is shared by the emulated machine. And MMCONFIG
> layout is mandated by this common PCI bus map.
> 
> Even

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Paul Durrant
> -Original Message-
> From: Alexey G [mailto:x19...@gmail.com]
> Sent: 21 March 2018 14:35
> To: Paul Durrant 
> Cc: Roger Pau Monne ; xen-
> de...@lists.xenproject.org; Andrew Cooper ;
> Ian Jackson ; Jan Beulich ;
> Wei Liu 
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Wed, 21 Mar 2018 09:36:04 +
> Paul Durrant  wrote:
> >>
> >> Although this is the most common scenario, it's not the only one
> >> supported by Xen. Your proposed solution breaks the usage of multiple
> >> IOREQ servers as PCI device emulators.
> >
> >Indeed it will, and that is not acceptable even in the short term.
> 
> Hmm, what exactly you are rejecting? QEMU's usage of established (and
> provided by Xen) interfaces for QEMU to use? Any particular reason why
> QEMU can use map_io_range_to_ioreq_server() in one case and can't in
> another? It's API available for QEMU after all.
> 
> If we actually switch to the emulated MMCONFIG range informing approach
> for Xen (via a new dmop/hypercall), who should prevent QEMU to actually
> map this range via map_io_range_to_ioreq_server? QEMU itself? Or Xen?

Xen internal emulation always trumps any external emulator, so even if QEMU 
maps an MMIO range it will not see any accesses if Xen is handling emulation of 
that range.

> How to will look, "QEMU asks us to map this range as emulated MMIO, but
> he previously told us that emulated PCIEXBAR register points there, so
> we won't allow him to do it"?
> 
> >> > I think it will be safe to use MMCONFIG emulation on MMIO level
> >> > for now and later extend it with 'set_mmconfig_' dmop/hypercall
> >> > for the 'multiple device emulators' IOREQ_TYPE_COPY routing to
> >> > work same as for PCI conf, so it can be used by XenGT etc on Q35
> >> > as well.
> >Introducing known breakage is not really on, particularly when it can
> >be avoided with a reasonable amount of extra work.
> 
> It's hard to break something which doesn't exist. :) Multiple device
> emulators feature do not support translation/routing of MMCONFIG MMIO
> accesses currently, it must be designed first.

Indeed, but updating to a new chipset emulation that breaks existing 
functionality is not going to be helpful.

  Paul


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Roger Pau Monné
On Thu, Mar 22, 2018 at 12:25:40AM +1000, Alexey G wrote:
> Roger, Paul,
> 
> Here is what you suggest, just to clarify:
> 
> 1. Add to Xen a new hypercall (+corresponding dmop) so QEMU can tell
> Xen where QEMU emulates machine's MMCONFIG (chipset-specific emulation
> of PCIEXBAR/HECBASE/etc mmcfg relocation). Xen will rely on this
> information to know to which PCI device the address within MMCONFIG
> belong.
> 
> 2. Xen will trap this area + remap its trapping to other address if QEMU
> will inform Xen about emulated PCIEXBAR value change
> 
> 3. Every MMIO access to the current MMCONFIG range will be converted
> into BDF first (by offset within this range, knowing where the range is)
> 
> 4. Target device model is selected using calculated BDF
> 
> 5. MMIO read/write accesses are converted into PCI config space ioreqs
> (like it was a CF8/CFCh operation instead of MMIO access). At this
> point ioreq structure allows to specify extended PCI conf offset
> (12-bit), so it will fit into PCI conf ioreq. For now let's assume that
> eg. a 64-bit memory operation is either aborted or workarounded by
> splitting this operation into multiple PCI conf ioreqs.

Why can't you just set size = 8 in that case in the ioreq?

QEMU should then reject those if the chipset doesn't support 64bit
accesses. I cannot find in the spec any mention of whether this
chipset supports 64bit MCFG accesses, and according to the PCIe spec
64bit accesses to MCFG should not be used unless the chipset is known
to handle them correctly.

> 6. PCI conf read/write ioreqs are sent to the chosen device model
> 
> 7. QEMU receive MMCONFIG memory reads/writes as PCI conf reads/writes
> 
> 8. As these MMCONFIG PCI conf reads occur out of context (just
> address/len/data without any emulated device attached to it), xen-hvm.c
> should employ special logic to make it QEMU-friendly -- eg. right now
> it sends received PCI conf access into (emulated by QEMU) CF8h/CFCh
> ports.
> There is a real problem to embed these "naked" accesses into QEMU
> infrastructure, workarounds are required. BTW, find_primary_bus() was
> dropped from QEMU code -- it could've been useful here. Let's assume
> some workaround is employed (like storing a required object pointers in
> global variables for later use in xen-hvm.c)

That seems like a minor nit, but why not just use
address_space_{read/write} to replay the MCFG accesses as memory
read/writes?

> 
> 9. Existing MMCONFIG-handling code in QEMU will be unused in this
> scenario

If you replay the read/write I don't think so. In any case this is
irrelevant. QEMU CPU emulation code is also unused when running under
Xen.

> 10. All this needed primarily to make the specific "Multiple device
> emulators" feature to work (XenGT was mentioned as its user) on Q35
> with MMCONFIG.
> 
> Anything wrong/missing here?

I think that's correct.

Thanks, Roger.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Alexey G
On Wed, 21 Mar 2018 15:20:17 +
Roger Pau Monné  wrote:

>On Thu, Mar 22, 2018 at 12:25:40AM +1000, Alexey G wrote:
>> Roger, Paul,
>> 
>> Here is what you suggest, just to clarify:
>> 
>> 1. Add to Xen a new hypercall (+corresponding dmop) so QEMU can tell
>> Xen where QEMU emulates machine's MMCONFIG (chipset-specific
>> emulation of PCIEXBAR/HECBASE/etc mmcfg relocation). Xen will rely
>> on this information to know to which PCI device the address within
>> MMCONFIG belong.
>> 
>> 2. Xen will trap this area + remap its trapping to other address if
>> QEMU will inform Xen about emulated PCIEXBAR value change
>> 
>> 3. Every MMIO access to the current MMCONFIG range will be converted
>> into BDF first (by offset within this range, knowing where the range
>> is)
>> 
>> 4. Target device model is selected using calculated BDF
>> 
>> 5. MMIO read/write accesses are converted into PCI config space
>> ioreqs (like it was a CF8/CFCh operation instead of MMIO access). At
>> this point ioreq structure allows to specify extended PCI conf offset
>> (12-bit), so it will fit into PCI conf ioreq. For now let's assume
>> that eg. a 64-bit memory operation is either aborted or workarounded
>> by splitting this operation into multiple PCI conf ioreqs.  
>
>Why can't you just set size = 8 in that case in the ioreq?
>
>QEMU should then reject those if the chipset doesn't support 64bit
>accesses. I cannot find in the spec any mention of whether this
>chipset supports 64bit MCFG accesses, and according to the PCIe spec
>64bit accesses to MCFG should not be used unless the chipset is known
>to handle them correctly.

Yes, uint64_t should be enough in this particular case in fact, though
the memory nature of MMCONFIG accesses might still require specific
handling.


All right then, so it will be a dmop/hypercall to tell Xen where to
trap MMIO accesses to MMCONFIG as you propose.

The primary device model (QEMU) will be emulating the chipset-specific
PCIEXBAR/etc and issuing this new dmop to tell Xen which area it needs
to trap for MMIO MMCONFIG accesses. It's basically what
map_io_range_to_ioreq_server does currently, but I guess a new dedicated
dmop/hypercall is bearable.
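
(Roughly, on the QEMU side, something along these lines; the dmop wrapper
name and the PCIEXBAR masks are invented here for illustration only.)

    /* Hypothetical QEMU-side flow on a guest PCIEXBAR write */
    static void mch_update_mcfg_trapping(uint64_t pciexbar)
    {
        uint64_t base   = pciexbar & MCH_PCIEXBAR_ADDR_MASK;  /* assumption */
        bool     enable = pciexbar & MCH_PCIEXBAR_EN;         /* assumption */

        /* tell Xen which MMIO window to trap as MMCONFIG (the new dmop) */
        xendevicemodel_map_mcfg(xen_dmod, xen_domid,          /* hypothetical */
                                base, enable ? mcfg_size : 0);
    }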

>> 6. PCI conf read/write ioreqs are sent to the chosen device model
>> 
>> 7. QEMU receive MMCONFIG memory reads/writes as PCI conf reads/writes
>> 
>> 8. As these MMCONFIG PCI conf reads occur out of context (just
>> address/len/data without any emulated device attached to it),
>> xen-hvm.c should employ special logic to make it QEMU-friendly --
>> eg. right now it sends received PCI conf access into (emulated by
>> QEMU) CF8h/CFCh ports.
>> There is a real problem to embed these "naked" accesses into QEMU
>> infrastructure, workarounds are required. BTW, find_primary_bus() was
>> dropped from QEMU code -- it could've been useful here. Let's assume
>> some workaround is employed (like storing a required object pointers
>> in global variables for later use in xen-hvm.c)  
>
>That seems like a minor nit, but why not just use
>address_space_{read/write} to replay the MCFG accesses as memory
>read/writes?

Well, this might actually work, although the overall scenario will be a
bit overcomplicated for _PCI_CONFIG ioreqs. Here is how it will look:

QEMU receives a PCIEXBAR update -> calls the new dmop to tell Xen the new
MMCONFIG address/size -> Xen (re)maps the MMIO trapping area -> someone is
accessing this area -> Xen intercepts this MMIO access

But here's what happens next:

Xen translates the MMIO access into PCI_CONFIG and sends it to the DM ->
the DM receives the _PCI_CONFIG ioreq -> the DM translates the BDF/addr
info back to the offset in the emulated MMCONFIG range -> the DM calls
address_space_read/write to trigger MMIO emulation

I think some parts of this equation can be collapsed, can't they?

The above scenario makes it obvious that, at least for QEMU, the
MMIO->PCI conf translation is a redundant step. Why not allow the DM to
specify whether it prefers to receive MMCONFIG accesses as native (MMIO)
ioreqs or as translated PCI conf ioreqs? We can still route either ioreq
type to multiple device emulators accordingly.

This will be the most universal and consistent approach -- either _COPY
or _PCI_CONFIG-type ioreqs can be sent to the DM, whichever it likes
more.
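
(For reference, the replay step in the chain above might look roughly
like this on the DM side -- a sketch only; mmcfg_base is precisely the
piece of state xen-hvm.c does not currently track, which is the point
being argued here.)

    /* Sketch: turn a _PCI_CONFIG ioreq back into an MMCONFIG MMIO access */
    static void replay_mcfg_ioreq(ioreq_t *req)
    {
        uint32_t sbdf = req->addr >> 32;           /* encoded by Xen */
        hwaddr gpa = mmcfg_base                    /* assumed to be known */
                     + ((sbdf & 0xffff) << 12)
                     + (req->addr & 0xfff);

        /* data_is_ptr handling omitted for brevity */
        address_space_rw(&address_space_memory, gpa, MEMTXATTRS_UNSPECIFIED,
                         (uint8_t *)&req->data, req->size,
                         req->dir == IOREQ_WRITE);
    }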

>> 9. Existing MMCONFIG-handling code in QEMU will be unused in this
>> scenario  
>
>If you replay the read/write I don't think so. In any case this is
>irrelevant. QEMU CPU emulation code is also unused when running under
>Xen.
>
>> 10. All this needed primarily to make the specific "Multiple device
>> emulators" feature to work (XenGT was mentioned as its user) on Q35
>> with MMCONFIG.
>> 
>> Anything wrong/missing here?  
>
>I think that's correct.
>
>Thanks, Roger.



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Paul Durrant
> -Original Message-
> From: Alexey G [mailto:x19...@gmail.com]
> Sent: 21 March 2018 16:57
> To: Roger Pau Monne 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Ian Jackson ; Jan
> Beulich ; Wei Liu ; Paul Durrant
> ; Anthony Perard ;
> Stefano Stabellini 
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Wed, 21 Mar 2018 15:20:17 +
> Roger Pau Monné  wrote:
> 
> >On Thu, Mar 22, 2018 at 12:25:40AM +1000, Alexey G wrote:
> >> Roger, Paul,
> >>
> >> Here is what you suggest, just to clarify:
> >>
> >> 1. Add to Xen a new hypercall (+corresponding dmop) so QEMU can tell
> >> Xen where QEMU emulates machine's MMCONFIG (chipset-specific
> >> emulation of PCIEXBAR/HECBASE/etc mmcfg relocation). Xen will rely
> >> on this information to know to which PCI device the address within
> >> MMCONFIG belong.
> >>
> >> 2. Xen will trap this area + remap its trapping to other address if
> >> QEMU will inform Xen about emulated PCIEXBAR value change
> >>
> >> 3. Every MMIO access to the current MMCONFIG range will be converted
> >> into BDF first (by offset within this range, knowing where the range
> >> is)
> >>
> >> 4. Target device model is selected using calculated BDF
> >>
> >> 5. MMIO read/write accesses are converted into PCI config space
> >> ioreqs (like it was a CF8/CFCh operation instead of MMIO access). At
> >> this point ioreq structure allows to specify extended PCI conf offset
> >> (12-bit), so it will fit into PCI conf ioreq. For now let's assume
> >> that eg. a 64-bit memory operation is either aborted or workarounded
> >> by splitting this operation into multiple PCI conf ioreqs.
> >
> >Why can't you just set size = 8 in that case in the ioreq?
> >
> >QEMU should then reject those if the chipset doesn't support 64bit
> >accesses. I cannot find in the spec any mention of whether this
> >chipset supports 64bit MCFG accesses, and according to the PCIe spec
> >64bit accesses to MCFG should not be used unless the chipset is known
> >to handle them correctly.
> Yes, uint64_t should be enough in this particular case in fact, though
> memory nature of MMCONFIG accesses might still require to provide
> specific handling.
> 
> 
> All right then, so it will be a dmop/hypercall to tell Xen where to
> trap MMIO accesses to MMCONFIG as you propose.
> 
> The primary device model (QEMU) will be emulating chipset-specific
> PCIEXBAR/etc and issuing this new dmop to tell Xen which area it needs
> to trap for MMIO MMCONFIG acceses. It's basically what
> map_io_range_to_ioreq_server does currently, but I guess a new dedicated
> dmop/hypercall is bearable.
> 
> >> 6. PCI conf read/write ioreqs are sent to the chosen device model
> >>
> >> 7. QEMU receive MMCONFIG memory reads/writes as PCI conf
> reads/writes
> >>
> >> 8. As these MMCONFIG PCI conf reads occur out of context (just
> >> address/len/data without any emulated device attached to it),
> >> xen-hvm.c should employ special logic to make it QEMU-friendly --
> >> eg. right now it sends received PCI conf access into (emulated by
> >> QEMU) CF8h/CFCh ports.
> >> There is a real problem to embed these "naked" accesses into QEMU
> >> infrastructure, workarounds are required. BTW, find_primary_bus() was
> >> dropped from QEMU code -- it could've been useful here. Let's assume
> >> some workaround is employed (like storing a required object pointers
> >> in global variables for later use in xen-hvm.c)
> >
> >That seems like a minor nit, but why not just use
> >address_space_{read/write} to replay the MCFG accesses as memory
> >read/writes?
> 
> Well, this might work actually. Although the overall scenario will be
> overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it will look:
> 
> QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen new
> MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone
> is
> accessing this area -> Xen intercepts this MMIO access
> 
> But here's what happens next:
> 
> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back to
> the offset in emulated MMCONFIG range -> DM calls
> address_space_read/write to trigger MMIO emulation
> 

That would only be true of a dm that cannot handle PCI config ioreqs directl

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Roger Pau Monné
On Thu, Mar 22, 2018 at 02:56:56AM +1000, Alexey G wrote:
> On Wed, 21 Mar 2018 15:20:17 +
> Roger Pau Monné  wrote:
> 
> >On Thu, Mar 22, 2018 at 12:25:40AM +1000, Alexey G wrote:
> >> 8. As these MMCONFIG PCI conf reads occur out of context (just
> >> address/len/data without any emulated device attached to it),
> >> xen-hvm.c should employ special logic to make it QEMU-friendly --
> >> eg. right now it sends received PCI conf access into (emulated by
> >> QEMU) CF8h/CFCh ports.
> >> There is a real problem to embed these "naked" accesses into QEMU
> >> infrastructure, workarounds are required. BTW, find_primary_bus() was
> >> dropped from QEMU code -- it could've been useful here. Let's assume
> >> some workaround is employed (like storing a required object pointers
> >> in global variables for later use in xen-hvm.c)  
> >
> >That seems like a minor nit, but why not just use
> >address_space_{read/write} to replay the MCFG accesses as memory
> >read/writes?
> 
> Well, this might work actually. Although the overall scenario will be
> overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it will look:
> 
> QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen new
> MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone is
> accessing this area -> Xen intercepts this MMIO access
> 
> But here's what happens next:
> 
> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back to
> the offset in emulated MMCONFIG range -> DM calls
> address_space_read/write to trigger MMIO emulation
> 
> I tnink some parts of this equation can be collapsed, isn't it?
> 
> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
> conf translation is a redundant step. Why not to allow specifying for DM
> whether it prefers to receive MMCONFIG accesses as native (MMIO ones)
> or as translated PCI conf ioreqs?

You are just adding an extra level of complexity to an interface
that's fairly simple. You register a PCI device using
XEN_DMOP_IO_RANGE_PCI and you get IOREQ_TYPE_PCI_CONFIG ioreqs.
Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI config
space access is misleading.

In both cases Xen would have to do the MCFG access decoding in order
to figure out which IOREQ server will handle the request. At which
point the only step that you avoid is the reconstruction of the memory
access from the IOREQ_TYPE_PCI_CONFIG which is trivial.

> We can still route either ioreq
> type to multiple device emulators accordingly.

It's exactly the same that's done for IO space PCI config space
addresses. QEMU gets an IOREQ_TYPE_PCI_CONFIG and it replays the IO
space access using do_outp and cpu_ioreq_pio.

If you think using IOREQ_TYPE_COPY for MCFG accesses is such a benefit
for QEMU, why not just translate the IOREQ_TYPE_PCI_CONFIG into
IOREQ_TYPE_COPY in handle_ioreq and dispatch it using
cpu_ioreq_move?
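
(A rough sketch of that suggestion, under the assumption that QEMU keeps
track of the emulated MMCONFIG base -- it emulates PCIEXBAR, so it can;
names and details are illustrative only.)

    /* Sketch: in handle_ioreq(), rewrite the config access as MMIO */
    case IOREQ_TYPE_PCI_CONFIG: {
        uint32_t sbdf = req->addr >> 32;

        if (mmcfg_base) {                   /* assumption: tracked by QEMU */
            req->type = IOREQ_TYPE_COPY;
            req->addr = mmcfg_base + ((sbdf & 0xffff) << 12)
                        + (req->addr & 0xfff);
            cpu_ioreq_move(req);
            break;
        }
        /* otherwise: the existing CF8h/CFCh replay via do_outp() */
    }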

Thanks, Roger.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Alexey G
On Wed, 21 Mar 2018 14:54:16 +
Paul Durrant  wrote:

>> -Original Message-
>> From: Alexey G [mailto:x19...@gmail.com]
>> Sent: 21 March 2018 14:26
>> To: Roger Pau Monne 
>> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
>> ; Ian Jackson ;
>> Jan Beulich ; Wei Liu ; Paul
>> Durrant ; Anthony Perard
>> ; Stefano Stabellini
>>  Subject: Re: [Xen-devel] [RFC PATCH 07/12]
>> hvmloader: allocate MMCONFIG area in the MMIO hole + minor code
>> refactoring
>> 
>> On Wed, 21 Mar 2018 09:09:11 +
>> Roger Pau Monné  wrote:
>>   
>> >On Wed, Mar 21, 2018 at 10:58:40AM +1000, Alexey G wrote:  
>> [...]  
>> >> According to public slides for the feature, both PCI conf and MMIO
>> >> accesses can be routed to the designated device model. It looks
>> >> like for this particular setup it doesn't really matter which
>> >> particular ioreq type must be used for MMCONFIG accesses -- either
>> >> IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY (MMIO accesses) should  
>> be  
>> >> acceptable.  
>> >
>> >Isn't that going to be quite messy? How is the IOREQ server supposed
>> >to decode a MCFG access received as IOREQ_TYPE_COPY?  
>> 
>> This code is already available and in sync with QEMU legacy PCI conf
>> emulation infrastructure.
>>   
>> >I don't think the IOREQ server needs to know the start of the MCFG
>> >region, in which case it won't be able to detect and decode the
>> >access if it's of type IOREQ_TYPE_COPY.  
>> 
>> How do you think Xen will be able to know if arbitrary MMIO
>> access targets MMCONFIG area and to which BDF the offset in this area
>> belongs, without knowing where MMCONFIG is located and what PCI bus
>> layout is? It's QEMU who emulate PCIEXBAR and can tell Xen where
>> MMCONFIG is expected to be.
>>   
>> >MCFG accesses need to be sent to the IOREQ server as
>> >IOREQ_TYPE_PCI_CONFIG, or else you are forcing each IOREQ server to
>> >know the position of the MCFG area in order to do the decoding. In
>> >your case this would work because QEMU controls the position of the
>> >MCFG region, but there's no need for other IOREQ servers to know the
>> >position of the MCFG area.
>> >  
>> >> The only thing which matters is ioreq routing itself --
>> >> making decisions to which device model the PCI conf/MMIO ioreq
>> >> should be sent.  
>> >
>> >Hm, see above, but I'm fairly sure you need to forward those MCFG
>> >accesses as IOREQ_TYPE_PCI_CONFIG to the IOREQ server.  
>> 
>> (a detailed answer below)
>>   
>> >> >Traditional PCI config space accesses are not IO port space
>> >> >accesses.  
>> >>
>> >> (assuming 'not' mistyped here)  
>> >
>> >Not really, this should instead be:
>> >
>> >"Traditional PCI config space accesses are not forwarded to the
>> >IOREQ server as IO port space accesses (IOREQ_TYPE_PIO) but rather
>> >as PCI config space accesses (IOREQ_TYPE_PCI_CONFIG)."
>> >
>> >Sorry for the confusion.
>> >  
>> >> >The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and
>> >> >IOREQ servers can register devices they would like to receive
>> >> >configuration space accesses for. QEMU is already making use of
>> >> >this, see for  
>> >>
>> >> That's one of the reasons why current IOREQ_TYPE_PCI_CONFIG
>> >> implementation is a bit inconvenient for MMCONFIG MMIO accesses --
>> >> it's too much CF8h/CFCh-centric in its implementation, might be
>> >> painful to change something in the code which was intended for
>> >> CF8h/CFCh handling (and not for MMIO processing).  
>> >
>> >I'm not sure I follow. Do you mean that changes should be made to
>> >the ioreq struct in order to forward MCFG accesses using
>> >IOREQ_TYPE_PCI_CONFIG as it's type?  
>> 
>> No changes for ioreq structures needed for now.
>>   
>> >> It will be handled by IOREQ too, just using a different IOREQ type
>> >> (MMIO one). The basic question is why do we have to stick to PCI
>> >> conf space ioreqs for emulating MMIO accesses to MMCONFIG.  
>> >
>> >Because other IOREQ servers don't need to know about the
>> >position/size of the MCFG area, and cannot register MMIO ranges
>> >that cover the

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Alexey G
On Wed, 21 Mar 2018 17:15:04 +
Roger Pau Monné  wrote:
[...]
>> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
>> conf translation is a redundant step. Why not to allow specifying
>> for DM whether it prefers to receive MMCONFIG accesses as native
>> (MMIO ones) or as translated PCI conf ioreqs?  
>
>You are just adding an extra level of complexity to an interface
>that's fairly simple. You register a PCI device using
>XEN_DMOP_IO_RANGE_PCI and you get IOREQ_TYPE_PCI_CONFIG ioreqs.

Yes, and it is still needed, as we have two distinct (and not equal)
interfaces to PCI conf space. Apart from the overlapping 0..FFh range,
they can be considered very different interfaces. And whether it is a
real system or an emulated one, we can use either of these two
interfaces or both.

For QEMU, zero changes are needed to support MMCONFIG MMIO accesses if
they come as MMIO ioreqs -- that is just what its MMCONFIG emulation code
expects.
Anyway, for (kind of vague) users of the multiple ioreq servers
capability we can enable MMIO translation to PCI conf ioreqs. Note that
this is actually an extra step, rather than forwarding trapped MMCONFIG
MMIO accesses to the selected device model as is.

>Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI config
>space access is misleading.

These are very different accesses, both in transport and capabilities.

>In both cases Xen would have to do the MCFG access decoding in order
>to figure out which IOREQ server will handle the request. At which
>point the only step that you avoid is the reconstruction of the memory
>access from the IOREQ_TYPE_PCI_CONFIG which is trivial.

The "reconstruction of the memory access" you mentioned won't be easy
actually. The thing is, address_space_read/write is not all what we
need.

In order to translate PCI conf ioreqs back to emulated MMIO ops, we
need to be an involved party, mainly to know where MMCONFIG area is
located so we can construct the address within its range from BDF.
This piece of information is destroyed in the process of MMIO ioreq
translation to PCI conf type.

The code which parse PCI conf ioreqs in xen-hvm.c doesn't know anything
about the current emulated MMCONFIG state. The correct way to have this
info is to participate in its emulation. As we don't participate, we
have no other way than trying to gain backdoor access to PCIHost fields
via things like object_resolve_*(). This solution is cumbersome and
ugly but will work... and may break anytime due to changes in QEMU. 

QEMU maintainers will grin while looking at all this, I'm afraid --
trapped MMIO accesses which are translated to PCI conf accesses which are
in turn translated back to emulated MMIO accesses upon receipt, along
with tedious attempts to gain access to MMCONFIG-related info as we're
not invited to the MMCONFIG emulation party.

The more I think about it, the more I like the existing
map_io_range_to_ioreq_server() approach. :( It works without doing
anything: no hacks, no new interfaces, and both MMCONFIG and CF8/CFC work
as expected. There is a problem making it compatible with the specific
multiple ioreq servers feature, but providing a new dmop/hypercall (which
you suggest is a must-have to trap MMCONFIG MMIO, giving QEMU only the
freedom to tell where it is located) makes it possible to solve this
problem in any way, either MMIO -> PCI conf translation or anything
else.

>> We can still route either ioreq
>> type to multiple device emulators accordingly.  
>
>It's exactly the same that's done for IO space PCI config space
>addresses. QEMU gets an IOREQ_TYPE_PCI_CONFIG and it replays the IO
>space access using do_outp and cpu_ioreq_pio.

...And it is completely limited to basic PCI conf space. I don't know
the context of this line in xen-hvm.c:

val = (1u << 31) | ((req->addr & 0x0f00) << 16) | ((sbdf & 0xffff) << 8)
      | (req->addr & 0xfc);

but it seems like current QEMU versions do not expect anything similar to
AMD ECS-style accesses for 0CF8h. It is limited to basic PCI conf only.

>If you think using IOREQ_TYPE_COPY for MCFG accesses is such a benefit
>for QEMU, why not just translate the IOREQ_TYPE_PCI_CONFIG into
>IOREQ_TYPE_COPY in handle_ioreq and dispatch it using
>cpu_ioreq_move?

Answered above, we need to somehow have access to the info which don't
belong to us for this step.

>Thanks, Roger.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-21 Thread Alexey G
On Wed, 21 Mar 2018 17:06:28 +
Paul Durrant  wrote:
[...]
>> Well, this might work actually. Although the overall scenario will be
>> overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it will
>> look:
>> 
>> QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen new
>> MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone
>> is
>> accessing this area -> Xen intercepts this MMIO access
>> 
>> But here's what happens next:
>> 
>> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
>> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back to
>> the offset in emulated MMCONFIG range -> DM calls
>> address_space_read/write to trigger MMIO emulation
>>   
>
>That would only be true of a dm that cannot handle PCI config ioreqs
>directly.

It's just a bit problematic for xen-hvm.c (Xen ioreq processor in QEMU).

It receives these PCI conf ioreqs without any context. To work around
this, existing code issues I/O to the emulated CF8h/CFCh ports in order to
allow QEMU to find their target. But we can't use the same method for
MMCONFIG accesses -- this works for basic PCI conf space only.

We need to either locate the PCIBus/PCIDevice manually via object lookups
and then proceed to something like pci_host_config_read_common(), or
convert the PCI conf access into the emulated MMIO access... again, a
required piece of information is missing -- we need to somehow learn the
current MMCONFIG address to recreate the memory address to be emulated.

Let's put it simply -- the goal of making PCI conf ioreqs reach their
MMCONFIG targets in xen-hvm.c is easily achievable, but it will look
like a hack. MMIO ioreqs are preferable for MMCONFIG -- no extra logic
is needed for them, we can pass them directly for emulation in a way
somewhat reminiscent of the CF8h/CFCh replay, except for memory.

Ideally there would be PCI conf ioreq translation for supplemental
device emulators while skipping this translation for QEMU, which expects
PCI config ioreqs only for CF8/CFC accesses. I assume it's DEMU/VGPU
which are of primary concern here, not experimental users like XenGT.

>  Paul
>
>> I think some parts of this equation can be collapsed, can't they?
>> 
>> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
>> conf translation is a redundant step. Why not to allow specifying
>> for DM whether it prefers to receive MMCONFIG accesses as native
>> (MMIO ones) or as translated PCI conf ioreqs? We can still route
>> either ioreq type to multiple device emulators accordingly.
>> 
>> This will be the most universal and consistent approach -- either
>> _COPY or _PCI_CONFIG-type ioreqs can be sent to DM, whatever it
>> likes more. 
>> >> 9. Existing MMCONFIG-handling code in QEMU will be unused in this
>> >> scenario  
>> >
>> >If you replay the read/write I don't think so. In any case this is
>> >irrelevant. QEMU CPU emulation code is also unused when running
>> >under Xen.
>> >  
>> >> 10. All this needed primarily to make the specific "Multiple
>> >> device emulators" feature to work (XenGT was mentioned as its
>> >> user) on Q35 with MMCONFIG.
>> >>
>> >> Anything wrong/missing here?  
>> >
>> >I think that's correct.
>> >
>> >Thanks, Roger.  
>



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Jan Beulich
>>> On 22.03.18 at 01:31,  wrote:
> On Wed, 21 Mar 2018 17:06:28 +
> Paul Durrant  wrote:
> [...]
>>> Well, this might work actually. Although the overall scenario will be
>>> overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it will
>>> look:
>>> 
>>> QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen new
>>> MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone
>>> is
>>> accessing this area -> Xen intercepts this MMIO access
>>> 
>>> But here's what happens next:
>>> 
>>> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
>>> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back to
>>> the offset in emulated MMCONFIG range -> DM calls
>>> address_space_read/write to trigger MMIO emulation
>>>   
>>
>>That would only be true of a dm that cannot handle PCI config ioreqs
>>directly.
> 
> It's just a bit problematic for xen-hvm.c (Xen ioreq processor in QEMU).
> 
> It receives these PCI conf ioreqs out of any context. To workaround
> this, existing code issues I/O to emulated CF8h/CFCh ports in order to
> allow QEMU to find their target. But we can't use the same method for
> MMCONFIG accesses -- this works for basic PCI conf space only.

I think you want to view this the other way around: No physical
device would ever get to see MMCFG accesses (or CF8/CFC port
ones). This same layering is what we should have in the
virtualized case.

Jan



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Paul Durrant
> -Original Message-
> From: Alexey G [mailto:x19...@gmail.com]
> Sent: 21 March 2018 22:50
> To: Roger Pau Monne 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Ian Jackson ; Jan
> Beulich ; Wei Liu ; Paul Durrant
> ; Anthony Perard ;
> Stefano Stabellini 
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Wed, 21 Mar 2018 17:15:04 +
> Roger Pau Monné  wrote:
> [...]
> >> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
> >> conf translation is a redundant step. Why not to allow specifying
> >> for DM whether it prefers to receive MMCONFIG accesses as native
> >> (MMIO ones) or as translated PCI conf ioreqs?
> >
> >You are just adding an extra level of complexity to an interface
> >that's fairly simple. You register a PCI device using
> >XEN_DMOP_IO_RANGE_PCI and you get IOREQ_TYPE_PCI_CONFIG ioreqs.
> 
> Yes, and it is still needed as we have two distinct (and not equal)
> interfaces to PCI conf space. Apart from 0..FFh range overlapping they
> can be considered very different interfaces. And whether it is a real
> system or emulated -- we can use either one of these two interfaces or
> both.
> 
> For QEMU zero changes are needed to support MMCONFIG MMIO accesses
> if
> they come as MMIO ioreqs. It's just what its MMCONFIG emulation code
> expects.
> Anyway, for (kind of vague) users of the multiple ioreq servers
> capability we can enable MMIO translation to PCI conf ioreqs. Note that
> actually this is an extra step, not forwarding trapped MMCONFIG MMIO
> accesses to the selected device model as is.
> 
> >Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI
> config
> >space access is misleading.
> 
> These are very different accesses, both in transport and capabilities.
> 
> >In both cases Xen would have to do the MCFG access decoding in order
> >to figure out which IOREQ server will handle the request. At which
> >point the only step that you avoid is the reconstruction of the memory
> >access from the IOREQ_TYPE_PCI_CONFIG which is trivial.
> 
> The "reconstruction of the memory access" you mentioned won't be easy
> actually. The thing is, address_space_read/write is not all what we
> need.
> 
> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
> need to be an involved party, mainly to know where MMCONFIG area is
> located so we can construct the address within its range from BDF.
> This piece of information is destroyed in the process of MMIO ioreq
> translation to PCI conf type.
> 
> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know anything
> about the current emulated MMCONFIG state. The correct way to have this
> info is to participate in its emulation. As we don't participate, we
> have no other way than trying to gain backdoor access to PCIHost fields
> via things like object_resolve_*(). This solution is cumbersome and
> ugly but will work... and may break anytime due to changes in QEMU.
> 
> QEMU maintainers will grin while looking at all this I'm afraid --
> trapped MMIO accesses which are translated to PCI conf accesses which
> in turn translated back to emulated MMIO accesses upon receiving, along
> with tedious attempts to gain access to MMCONFIG-related info as we're
> not invited to the MMCONFIG emulation party.
> 
> The more I think about it, the more I like the existing
> map_io_range_to_ioreq_server() approach. :( It works without doing
> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> working as expected. There is a problem to make it compatible with
> the specific multiple ioreq servers feature, but providing a new
> dmop/hypercall (which you suggest is a must have thing to trap MMCONFIG
> MMIO to give QEMU only the freedom to tell where it is located) allows
> to solve this problem in any possible way, either MMIO -> PCI conf
> translation or anything else.
> 

I don't think we even want QEMU to have the freedom to say where the MMCONFIG 
areas are located, do we? QEMU is not in charge of the guest memory map and it 
is not responsible for building the MCFG table, Xen is. So it should be Xen 
that decides where the MMCONFIG area goes for each registered PCI device and it 
should be Xen that adds that to the MCFG table. It should be Xen that handles 
the MMCONFIG MMIO accesses and these should be forwarded to QEMU as PCI config 
IOREQs.
Now, it may be that we need to introduce a Xen specific mechanism into QEMU to 
then route those config space transactions to the device models, but that would 
be an improvement over the current cf8/cfc hackery anyway.

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 03:04:16 -0600
"Jan Beulich"  wrote:

 On 22.03.18 at 01:31,  wrote:  
>> On Wed, 21 Mar 2018 17:06:28 +
>> Paul Durrant  wrote:
>> [...]  
 Well, this might work actually. Although the overall scenario will
 be overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it
 will look:
 
 QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen new
 MMCONFIG address/size -> Xen (re)maps MMIO trapping area -> someone
 is
 accessing this area -> Xen intercepts this MMIO access
 
 But here's what happens next:
 
 Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
 DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back
 to the offset in emulated MMCONFIG range -> DM calls
 address_space_read/write to trigger MMIO emulation
 
>>>
>>>That would only be true of a dm that cannot handle PCI config ioreqs
>>>directly.  
>> 
>> It's just a bit problematic for xen-hvm.c (Xen ioreq processor in
>> QEMU).
>> 
>> It receives these PCI conf ioreqs out of any context. To workaround
>> this, existing code issues I/O to emulated CF8h/CFCh ports in order
>> to allow QEMU to find their target. But we can't use the same method
>> for MMCONFIG accesses -- this works for basic PCI conf space only.  
>
>I think you want to view this the other way around: No physical
>device would ever get to see MMCFG accesses (or CF8/CFC port
>ones). This same layering is what we should have in the
>virtualized case.

We have a purely virtual layout of the PCI bus along with a virtual,
emulated MMCONFIG which is completely unrelated to the host's -- so what
is exposed? This emulated MMCONFIG is simply a supplement to the virtual
PCI bus, and its layout corresponds to the virtual PCI bus the guest and
QEMU see.

It's QEMU that controls the chipset-specific PCIEXBAR emulation and
knows the MMCONFIG position and size. QEMU informs Xen about where it
is, in order to receive events about R/W accesses to this emulated area
-- so why should it receive these events in the form of PCI conf BDF/reg
and not simply as an MMCONFIG offset directly, if it is basically the
same thing?


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Roger Pau Monné
On Thu, Mar 22, 2018 at 08:49:58AM +1000, Alexey G wrote:
> On Wed, 21 Mar 2018 17:15:04 +
> Roger Pau Monné  wrote:
> [...]
> >> Above scenario makes it obvious that at least for QEMU the MMIO->PCI
> >> conf translation is a redundant step. Why not to allow specifying
> >> for DM whether it prefers to receive MMCONFIG accesses as native
> >> (MMIO ones) or as translated PCI conf ioreqs?  
> >
> >You are just adding an extra level of complexity to an interface
> >that's fairly simple. You register a PCI device using
> >XEN_DMOP_IO_RANGE_PCI and you get IOREQ_TYPE_PCI_CONFIG ioreqs.
> 
> Yes, and it is still needed as we have two distinct (and not equal)
> interfaces to PCI conf space. Apart from 0..FFh range overlapping they
> can be considered very different interfaces. And whether it is a real
> system or emulated -- we can use either one of these two interfaces or
> both.

The legacy PCI config space accesses and the MCFG config space access
are just different methods of accessing the PCI configuration space,
but the data _must_ be exactly the same. I don't see how a device
would care about where the access to the config space originated.

> For QEMU zero changes are needed to support MMCONFIG MMIO accesses if
> they come as MMIO ioreqs. It's just what its MMCONFIG emulation code
> expects.

As I said many times in this thread, you seem to be focused around
what's best for QEMU only, and this is wrong. The IOREQ interface is
used by QEMU, but it's also used by other device emulators.

I get the feeling that you assume that the correct solution is the one
that involves less changes to Xen and QEMU. This is simply not true.

> Anyway, for (kind of vague) users of the multiple ioreq servers
> capability we can enable MMIO translation to PCI conf ioreqs. Note that
> actually this is an extra step, not forwarding trapped MMCONFIG MMIO
> accesses to the selected device model as is.
>
> >Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI config
> >space access is misleading.
> 
> These are very different accesses, both in transport and capabilities.
> 
> >In both cases Xen would have to do the MCFG access decoding in order
> >to figure out which IOREQ server will handle the request. At which
> >point the only step that you avoid is the reconstruction of the memory
> >access from the IOREQ_TYPE_PCI_CONFIG which is trivial.
> 
> The "reconstruction of the memory access" you mentioned won't be easy
> actually. The thing is, address_space_read/write is not all what we
> need.
> 
> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
> need to be an involved party, mainly to know where MMCONFIG area is
> located so we can construct the address within its range from BDF.
> This piece of information is destroyed in the process of MMIO ioreq
> translation to PCI conf type.

QEMU certainly knows the position of the MCFG area (because it's the
one that tells Xen about it), so I don't understand your concerns
above.

> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know anything
> about the current emulated MMCONFIG state. The correct way to have this
> info is to participate in its emulation. As we don't participate, we
> have no other way than trying to gain backdoor access to PCIHost fields
> via things like object_resolve_*(). This solution is cumbersome and
> ugly but will work... and may break anytime due to changes in QEMU. 

OK, so you don't want to reconstruct the access, fine.

Then just inject it using pcie_mmcfg_data_{read/write} or some similar
wrapper. My suggestion was just to try to use the easier way to get
this injected into QEMU.

> QEMU maintainers will grin while looking at all this I'm afraid --
> trapped MMIO accesses which are translated to PCI conf accesses which
> in turn translated back to emulated MMIO accesses upon receiving, along
> with tedious attempts to gain access to MMCONFIG-related info as we're
> not invited to the MMCONFIG emulation party.
>
> The more I think about it, the more I like the existing
> map_io_range_to_ioreq_server() approach. :( It works without doing
> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> working as expected. There is a problem to make it compatible with
> the specific multiple ioreq servers feature, but providing a new
> dmop/hypercall (which you suggest is a must have thing to trap MMCONFIG
> MMIO to give QEMU only the freedom to tell where it is located) allows
> to solve this problem in any possible way, either MMIO -> PCI conf
> translation or anything else.

I'm sorry, but I'm getting lost.

You complain that using IOREQ_TYPE_PCI_CONFIG is not a good approach
because QEMU needs to know the position of the MCFG area if we want to
reconstruct and forward the MMIO access. And then you are proposing to
use IOREQ_TYPE_COPY which _requires_ QEMU to know the position of the
MCFG area in order to do the decoding of the PCI config space access.

> >> We can still route either ioreq
> >> type to multiple device emulators accordingly.

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Paul Durrant
> -Original Message-
> From: Alexey G [mailto:x19...@gmail.com]
> Sent: 22 March 2018 09:55
> To: Jan Beulich 
> Cc: Andrew Cooper ; Anthony Perard
> ; Ian Jackson ; Paul
> Durrant ; Roger Pau Monne
> ; Wei Liu ; Stefano Stabellini
> ; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Thu, 22 Mar 2018 03:04:16 -0600
> "Jan Beulich"  wrote:
> 
> >>>> On 22.03.18 at 01:31,  wrote:
> >> On Wed, 21 Mar 2018 17:06:28 +
> >> Paul Durrant  wrote:
> >> [...]
> >>>> Well, this might work actually. Although the overall scenario will
> >>>> be overcomplicated a bit for _PCI_CONFIG ioreqs. Here is how it
> >>>> will look:
> >>>>
> >>>> QEMU receives PCIEXBAR update -> calls the new dmop to tell Xen
> new
> >>>> MMCONFIG address/size -> Xen (re)maps MMIO trapping area ->
> someone
> >>>> is
> >>>> accessing this area -> Xen intercepts this MMIO access
> >>>>
> >>>> But here's what happens next:
> >>>>
> >>>> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
> >>>> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info back
> >>>> to the offset in emulated MMCONFIG range -> DM calls
> >>>> address_space_read/write to trigger MMIO emulation
> >>>>
> >>>
> >>>That would only be true of a dm that cannot handle PCI config ioreqs
> >>>directly.
> >>
> >> It's just a bit problematic for xen-hvm.c (Xen ioreq processor in
> >> QEMU).
> >>
> >> It receives these PCI conf ioreqs out of any context. To workaround
> >> this, existing code issues I/O to emulated CF8h/CFCh ports in order
> >> to allow QEMU to find their target. But we can't use the same method
> >> for MMCONFIG accesses -- this works for basic PCI conf space only.
> >
> >I think you want to view this the other way around: No physical
> >device would ever get to see MMCFG accesses (or CF8/CFC port
> >ones). This same layering is what we should have in the
> >virtualized case.
> 
> We have purely virtual layout of the PCI bus along with virtual,
> emulated and completely unrelated to host's MMCONFIG -- so what's
> exposed? This emulated MMCONFIG simply a supplement to virtual PCI bus
> and its layout correspond to the virtual PCI bus guest/QEMU see.
> 
> It's QEMU who controls chipset-specific PCIEXBAR emulation and knows
> about MMCONFIG position and size.

...and I think that is the wrong solution for Xen. We only use QEMU as an 
emulator for peripheral devices; we should not be using it for this kind of 
emulation... that should be brought into the hypervisor.

> QEMU informs Xen about where it is,

No. Xen should not care where QEMU wants to put it because the MMIO emulations 
should not even reach QEMU.

   Paul

> in order to receive events about R/W accesses to this emulated area --
> so, why he should receive these events in a form of PCI conf BDF/reg and
> not simply as MMCONFIG offset directly if it is basically the same
> thing?


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Roger Pau Monné
On Thu, Mar 22, 2018 at 09:29:44AM +, Paul Durrant wrote:
> > The more I think about it, the more I like the existing
> > map_io_range_to_ioreq_server() approach. :( It works without doing
> > anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> > working as expected. There is a problem to make it compatible with
> > the specific multiple ioreq servers feature, but providing a new
> > dmop/hypercall (which you suggest is a must have thing to trap MMCONFIG
> > MMIO to give QEMU only the freedom to tell where it is located) allows
> > to solve this problem in any possible way, either MMIO -> PCI conf
> > translation or anything else.
> > 
> 
> I don't think we even want QEMU to have the freedom to say where the
> MMCONFIG areas are located, do we?

Sadly this is how the chipset works. The PCIEXBAR register contains the
position of the MCFG area. And this is emulated by QEMU.

> QEMU is not in charge of the
> guest memory map and it is not responsible for the building the MCFG
> table, Xen is.

Well, the one that builds the MCFG table is hvmloader actually, which
is the one that initially sets the value of PCIEXBAR and thus the
initial position of the MCFG.

> So it should be Xen that decides where the MMCONFIG
> area goes for each registered PCI device and it should be Xen that
> adds that to the MCFG table. It should be Xen that handles the
> MMCONFIG MMIO accesses and these should be forwarded to QEMU as PCI
> config IOREQs.  Now, it may be that we need to introduce a Xen
> specific mechanism into QEMU to then route those config space
> transactions to the device models but that would be an improvement
> over the current cf8/cfc hackery anyway.

I think we need a way for QEMU to tell Xen the position of the MCFG
area, and any changes to it.

I don't think we want to emulate the PCIEXBAR register inside of Xen,
if we do that then we would likely have to emulate the full Express
Chipset inside of Xen.

Thanks, Roger.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Paul Durrant
> -Original Message-
> From: Roger Pau Monne
> Sent: 22 March 2018 10:06
> To: Paul Durrant 
> Cc: 'Alexey G' ; xen-devel@lists.xenproject.org;
> Andrew Cooper ; Ian Jackson
> ; Jan Beulich ; Wei Liu
> ; Anthony Perard ;
> Stefano Stabellini 
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Thu, Mar 22, 2018 at 09:29:44AM +, Paul Durrant wrote:
> > > The more I think about it, the more I like the existing
> > > map_io_range_to_ioreq_server() approach. :( It works without doing
> > > anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
> > > working as expected. There is a problem to make it compatible with
> > > the specific multiple ioreq servers feature, but providing a new
> > > dmop/hypercall (which you suggest is a must have thing to trap
> MMCONFIG
> > > MMIO to give QEMU only the freedom to tell where it is located) allows
> > > to solve this problem in any possible way, either MMIO -> PCI conf
> > > translation or anything else.
> > >
> >
> > I don't think we even want QEMU to have the freedom to say where the
> > MMCONFIG areas are located, do we?
> 
> Sadly this is how the chipset works. The PCIEXBAR register contains the
> position of the MCFG area. And this is emulated by QEMU.

So we should be emulating that in Xen, not handing it off to QEMU. Our 
integration with QEMU is already terrible and using QEMU to emulate the PCIe 
chipset will only make it worse.

> 
> > QEMU is not in charge of the
> > guest memory map and it is not responsible for the building the MCFG
> > table, Xen is.
> 
> Well, the one that builds the MCFG table is hvmloader actually, which
> is the one that initially sets the value of PCIEXBAR and thus the
> initial position of the MCFG.
> 
> > So it should be Xen that decides where the MMCONFIG
> > area goes for each registered PCI device and it should be Xen that
> > adds that to the MCFG table. It should be Xen that handles the
> > MMCONFIG MMIO accesses and these should be forwarded to QEMU as
> PCI
> > config IOREQs.  Now, it may be that we need to introduce a Xen
> > specific mechanism into QEMU to then route those config space
> > transactions to the device models but that would be an improvement
> > over the current cf8/cfc hackery anyway.
> 
> I think we need a way for QEMU to tell Xen the position of the MCFG
> area, and any changes to it.
> 
> I don't think we want to emulate the PCIEXBAR register inside of Xen,
> if we do that then we would likely have to emulate the full Express
> Chipset inside of Xen.
> 

No, that's *exactly* what we should be doing. We should only be using QEMU for 
emulation of discrete peripheral devices.

  Paul

> Thanks, Roger.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 09:29:44 +
Paul Durrant  wrote:

>> -Original Message-
[...]
>> >In both cases Xen would have to do the MCFG access decoding in order
>> >to figure out which IOREQ server will handle the request. At which
>> >point the only step that you avoid is the reconstruction of the
>> >memory access from the IOREQ_TYPE_PCI_CONFIG which is trivial.  
>> 
>> The "reconstruction of the memory access" you mentioned won't be easy
>> actually. The thing is, address_space_read/write is not all what we
>> need.
>> 
>> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
>> need to be an involved party, mainly to know where MMCONFIG area is
>> located so we can construct the address within its range from BDF.
>> This piece of information is destroyed in the process of MMIO ioreq
>> translation to PCI conf type.
>> 
>> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know
>> anything about the current emulated MMCONFIG state. The correct way
>> to have this info is to participate in its emulation. As we don't
>> participate, we have no other way than trying to gain backdoor
>> access to PCIHost fields via things like object_resolve_*(). This
>> solution is cumbersome and ugly but will work... and may break
>> anytime due to changes in QEMU.
>> 
>> QEMU maintainers will grin while looking at all this I'm afraid --
>> trapped MMIO accesses which are translated to PCI conf accesses which
>> in turn translated back to emulated MMIO accesses upon receiving,
>> along with tedious attempts to gain access to MMCONFIG-related info
>> as we're not invited to the MMCONFIG emulation party.
>> 
>> The more I think about it, the more I like the existing
>> map_io_range_to_ioreq_server() approach. :( It works without doing
>> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
>> working as expected. There is a problem to make it compatible with
>> the specific multiple ioreq servers feature, but providing a new
>> dmop/hypercall (which you suggest is a must have thing to trap
>> MMCONFIG MMIO to give QEMU only the freedom to tell where it is
>> located) allows to solve this problem in any possible way, either
>> MMIO -> PCI conf translation or anything else.
>>   
>
>I don't think we even want QEMU to have the freedom to say where the
>MMCONFIG areas are located, do we? QEMU is not in charge of the guest
>memory map and it is not responsible for the building the MCFG table,
>Xen is. So it should be Xen that decides where the MMCONFIG area goes
>for each registered PCI device and it should be Xen that adds that to
>the MCFG table. It should be Xen that handles the MMCONFIG MMIO
>accesses and these should be forwarded to QEMU as PCI config IOREQs.
>Now, it may be that we need to introduce a Xen specific mechanism into
>QEMU to then route those config space transactions to the device
>models but that would be an improvement over the current cf8/cfc
>hackery anyway.

Well, MMCONFIG is a chipset-specific thing. We probably can't simply
abstract its usage by merely providing an ACPI MCFG table for it.

Its layout must correspond to the emulated PCI conf space, where the
majority of devices belong to QEMU. Although we could track all of
QEMU's emulated/PT PCI devices and build this layout ourselves, this
design may introduce multiple issues. For QEMU, handling such PCI conf
ioreqs without knowing anything about MMCONFIG becomes worse --
previously it at least knew that those accesses belong to the MMCONFIG
range it emulates, but with PCI conf ioreqs the situation gets a bit
more complicated -- either the CF8/CFC workaround or a manual lookup of
the target device from the rather isolated xen-hvm.c. Feasible, yes, but
it will look like a dirty hack -- doing part of QEMU's internal job.

These are merely inconveniences; the main problem here at the moment is
OVMF. OVMF does MMCONFIG relocation by writing to the PCIEXBAR it knows
about on Q35, and then uses MMCONFIG at the address it expects. This is
something I want to address in subsequent patches.

>  Paul
>
>> >> We can still route either ioreq
>> >> type to multiple device emulators accordingly.  
>> >
>> >It's exactly the same that's done for IO space PCI config space
>> >addresses. QEMU gets an IOREQ_TYPE_PCI_CONFIG and it replays the IO
>> >space access using do_outp and cpu_ioreq_pio.  
>> 
>> ...And it is completely limited to basic PCI conf space. I don't know
>> the context of this line in xen-hvm.c:
>> 
>> val = (1u << 31) | ((req->addr & 0x0f00) << 16) | ((sbdf & 0xffff)
>> << 8) | (req->addr & 0xfc);
>> 
>> but seems like current QEMU versions do not expect anything similar
>> to AMD ECS-style accesses for 0CF8h. It is limited to basic PCI conf
>> only. 
>> >If you think using IOREQ_TYPE_COPY for MCFG accesses is such a
>> >benefit for QEMU, why not just translate the IOREQ_TYPE_PCI_CONFIG
>> >into IOREQ_TYPE_COPY in handle_ioreq and dispatch it using
>> >cpu_ioreq_move?  
>> 
>> Answered above, we need to somehow have access to the info which
>> doesn't belong to us for this step.

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 10:09:16 +
Paul Durrant  wrote:
[...]
>> > I don't think we even want QEMU to have the freedom to say where
>> > the MMCONFIG areas are located, do we?
>> 
>> Sadly this is how the chipset works. The PCIEXBAR register contains the
>> position of the MCFG area. And this is emulated by QEMU.

>So we should be emulating that in Xen, not handing it off to QEMU. Our
>integration with QEMU is already terrible and using QEMU to emulate
>the PCIe chipset will only make it worse.  

I guess the QEMU guys will say that it will actually improve. :)
One of the very first observations I made while learning Xen/QEMU was
that Xen and QEMU behave sort of like stepmother and stepdaughter --
they dislike each other but have to live together in one house for now.
I think better interaction will benefit both.

There are some architectural issues (MMIO hole control for passthrough
needs is one of them) which can be solved by actually improving
coordination with QEMU, while not sacrificing the security in any way.

>> > QEMU is not in charge of the
>> > guest memory map and it is not responsible for the building the
>> > MCFG table, Xen is.
>> 
>> Well, the one that builds the MCFG table is hvmloader actually, which
>> is the one that initially sets the value of PCIEXBAR and thus the
>> initial position of the MCFG.
>> 
>> > So it should be Xen that decides where the MMCONFIG
>> > area goes for each registered PCI device and it should be Xen that
>> > adds that to the MCFG table. It should be Xen that handles the
>> > MMCONFIG MMIO accesses and these should be forwarded to QEMU as
>> PCI
>> > config IOREQs.  Now, it may be that we need to introduce a Xen
>> > specific mechanism into QEMU to then route those config space
>> > transactions to the device models but that would be an improvement
>> > over the current cf8/cfc hackery anyway.
>> 
>> I think we need a way for QEMU to tell Xen the position of the MCFG
>> area, and any changes to it.
>> 
>> I don't think we want to emulate the PCIEXBAR register inside of Xen,
>> if we do that then we would likely have to emulate the full Express
>> Chipset inside of Xen.
>> 
>No, that's *exactly* what we should be doing. We should only be using
>QEMU for emulation of discrete peripheral devices.  

Can an emulated PCIe switch (basically a PCI-PCI bridge) be considered
a discrete peripheral device which can function on its own?

If we are to emulate the whole PCIe bus, where will the dividing line
be between chipset emulation and PCIe hierarchy emulation?


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 10:06:09 +
Paul Durrant  wrote:

>> -Original Message-
>> From: Alexey G [mailto:x19...@gmail.com]
>> Sent: 22 March 2018 09:55
>> To: Jan Beulich 
>> Cc: Andrew Cooper ; Anthony Perard
>> ; Ian Jackson ;
>> Paul Durrant ; Roger Pau Monne
>> ; Wei Liu ; Stefano
>> Stabellini ; xen-devel@lists.xenproject.org
>> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate
>> MMCONFIG area in the MMIO hole + minor code refactoring
>> 
>> On Thu, 22 Mar 2018 03:04:16 -0600
>> "Jan Beulich"  wrote:
>>   
>> >>>> On 22.03.18 at 01:31,  wrote:  
>> >> On Wed, 21 Mar 2018 17:06:28 +
>> >> Paul Durrant  wrote:
>> >> [...]  
>> >>>> Well, this might work actually. Although the overall scenario
>> >>>> will be overcomplicated a bit for _PCI_CONFIG ioreqs. Here is
>> >>>> how it will look:
>> >>>>
>> >>>> QEMU receives PCIEXBAR update -> calls the new dmop to tell
>> >>>> Xen  
>> new  
>> >>>> MMCONFIG address/size -> Xen (re)maps MMIO trapping area ->  
>> someone  
>> >>>> is
>> >>>> accessing this area -> Xen intercepts this MMIO access
>> >>>>
>> >>>> But here's what happens next:
>> >>>>
>> >>>> Xen translates MMIO access into PCI_CONFIG and sends it to DM ->
>> >>>> DM receives _PCI_CONFIG ioreq -> DM translates BDF/addr info
>> >>>> back to the offset in emulated MMCONFIG range -> DM calls
>> >>>> address_space_read/write to trigger MMIO emulation
>> >>>>  
>> >>>
>> >>>That would only be true of a dm that cannot handle PCI config
>> >>>ioreqs directly.  
>> >>
>> >> It's just a bit problematic for xen-hvm.c (Xen ioreq processor in
>> >> QEMU).
>> >>
>> >> It receives these PCI conf ioreqs out of any context. To
>> >> workaround this, existing code issues I/O to emulated CF8h/CFCh
>> >> ports in order to allow QEMU to find their target. But we can't
>> >> use the same method for MMCONFIG accesses -- this works for basic
>> >> PCI conf space only.  
>> >
>> >I think you want to view this the other way around: No physical
>> >device would ever get to see MMCFG accesses (or CF8/CFC port
>> >ones). This same layering is what we should have in the
>> >virtualized case.  
>> 
>> We have purely virtual layout of the PCI bus along with virtual,
>> emulated and completely unrelated to host's MMCONFIG -- so what's
>> exposed? This emulated MMCONFIG simply a supplement to virtual PCI
>> bus and its layout correspond to the virtual PCI bus guest/QEMU see.
>> 
>> It's QEMU who controls chipset-specific PCIEXBAR emulation and knows
>> about MMCONFIG position and size.  
>
>...and I think that it the wrong solution for Xen. We only use QEMU as
>an emulator for peripheral devices; we should not be using it for this
>kind of emulation... that should be brought into the hypervisor.
>
>> QEMU informs Xen about where it is,  
>
>No. Xen should not care where QEMU wants to put it because the MMIO
>emulations should not even reach QEMU.

QEMU does a lot of MMIO emulation, so what's so special about the
emulated MMCONFIG? It has absolutely nothing to do with the host's
MMCONFIG, neither in address/size nor in internal layout. None of the
host MMCONFIG-related facilities are touched in any way. It is a purely
virtual thing.

I really don't understand why some people have that fear of emulated
MMCONFIG -- it's really the same thing as any other MMIO range QEMU
already emulates via map_io_range_to_ioreq_server(). No sensitive
information is exposed. It is related only to the emulated PCI conf
space which QEMU already knows about and uses, providing emulated PCI
devices for it.
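To illustrate what "just another MMIO range" means here: registering the
emulated MMCONFIG window for an ioreq server is conceptually nothing
more than the sketch below (using the libxendevicemodel call directly;
the actual wrappers QEMU goes through differ, and mmcfg_base/mmcfg_size
are assumed to come from the PCIEXBAR emulation):

#include <xendevicemodel.h>

/* Rough sketch: map the emulated MMCONFIG window as a plain MMIO
 * range of an ioreq server, exactly like any other BAR-sized range. */
static void register_mmcfg_range(xendevicemodel_handle *dmod,
                                 domid_t domid, ioservid_t ioservid,
                                 uint64_t mmcfg_base, uint64_t mmcfg_size)
{
    xendevicemodel_map_io_range_to_ioreq_server(dmod, domid, ioservid,
                                                1 /* is_mmio */,
                                                mmcfg_base,
                                                mmcfg_base + mmcfg_size - 1);
}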

>   Paul
>
>> in order to receive events about R/W accesses to this emulated area
>> -- so, why he should receive these events in a form of PCI conf
>> BDF/reg and not simply as MMCONFIG offset directly if it is
>> basically the same thing?  



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Jan Beulich
>>> On 22.03.18 at 12:56,  wrote:
> I really don't understand why some people have that fear of emulated
> MMCONFIG -- it's really the same thing as any other MMIO range QEMU
> already emulates via map_io_range_to_ioreq_server(). No sensitive
> information exposed. It is related only to emulated PCI conf space which
> QEMU already knows about and use, providing emulated PCI devices for it.

You continue to ignore the routing requirement multiple ioreq
servers impose.

Jan



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 09:57:16 +
Roger Pau Monné  wrote:
[...]
>> Yes, and it is still needed as we have two distinct (and not equal)
>> interfaces to PCI conf space. Apart from 0..FFh range overlapping
>> they can be considered very different interfaces. And whether it is
>> a real system or emulated -- we can use either one of these two
>> interfaces or both.  
>
>The legacy PCI config space accesses and the MCFG config space access
>are just different methods of accessing the PCI configuration space,
>but the data _must_ be exactly the same. I don't see how a device
>would care about where the access to the config space originated.

If they were different methods of accessing the same thing, they
could've been used interchangeably. When we've got a PCI conf ioreq
which has offset>100h we know we cannot just pass it to emulated
CF8/CFC but have to emulate this specifically.

>> For QEMU zero changes are needed to support MMCONFIG MMIO accesses if
>> they come as MMIO ioreqs. It's just what its MMCONFIG emulation code
>> expects.  
>
>As I said many times in this thread, you seem to be focused around
>what's best for QEMU only, and this is wrong. The IOREQ interface is
>used by QEMU, but it's also used by other device emulators.
>
>I get the feeling that you assume that the correct solution is the one
>that involves less changes to Xen and QEMU. This is simply not true.
>
>> Anyway, for (kind of vague) users of the multiple ioreq servers
>> capability we can enable MMIO translation to PCI conf ioreqs. Note
>> that actually this is an extra step, not forwarding trapped MMCONFIG
>> MMIO accesses to the selected device model as is.
>>  
>> >Getting both IOREQ_TYPE_PCI_CONFIG and IOREQ_TYPE_COPY for PCI
>> >config space access is misleading.  
>> 
>> These are very different accesses, both in transport and
>> capabilities. 
>> >In both cases Xen would have to do the MCFG access decoding in order
>> >to figure out which IOREQ server will handle the request. At which
>> >point the only step that you avoid is the reconstruction of the
>> >memory access from the IOREQ_TYPE_PCI_CONFIG which is trivial.  
>> 
>> The "reconstruction of the memory access" you mentioned won't be easy
>> actually. The thing is, address_space_read/write is not all what we
>> need.
>> 
>> In order to translate PCI conf ioreqs back to emulated MMIO ops, we
>> need to be an involved party, mainly to know where MMCONFIG area is
>> located so we can construct the address within its range from BDF.
>> This piece of information is destroyed in the process of MMIO ioreq
>> translation to PCI conf type.  
>
>QEMU certainly knows the position of the MCFG area (because it's the
>one that tells Xen about it), so I don't understand your concerns
>above.
>> The code which parse PCI conf ioreqs in xen-hvm.c doesn't know
>> anything about the current emulated MMCONFIG state. The correct way
>> to have this info is to participate in its emulation. As we don't
>> participate, we have no other way than trying to gain backdoor
>> access to PCIHost fields via things like object_resolve_*(). This
>> solution is cumbersome and ugly but will work... and may break
>> anytime due to changes in QEMU.   
>
>OK, so you don't want to reconstruct the access, fine.
>
>Then just inject it using pcie_mmcfg_data_{read/write} or some similar
>wrapper. My suggestion was just to try to use the easier way to get
>this injected into QEMU.

QEMU knows its position; the problem is that xen-hvm.c (the ioreq
processor) is rather isolated from the MMCONFIG emulation.

If you check the pcie_mmcfg_data_read/write MMCONFIG handlers in QEMU,
you can see this:

static uint64_t pcie_mmcfg_data_read(void *opaque, <...>
{
PCIExpressHost *e = opaque;
...

We know this 'opaque' when we do MMIO-style MMCONFIG handling as
pcie_mmcfg_data_read/write are actual handlers.

But xen-hvm.c needs to gain access to that PCIExpressHost out of
nowhere, which is possible but considered a hack by QEMU. We can also
insert some code into the MMCONFIG emulation which will store the info
we need in global variables, to be used across wildly different and
unrelated modules. It will work, but anyone who sees it will have bad
thoughts on their mind.
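Something along these lines -- purely illustrative, with the QOM path
and field access being my assumptions about QEMU internals rather than
anything that exists in xen-hvm.c:

#include "qom/object.h"
#include "hw/pci/pcie_host.h"

/* Sketch of the backdoor lookup: find the (single) PCIe host bridge
 * and peek at the MMCONFIG base the MMCONFIG emulation owns. */
static hwaddr xen_guess_mmcfg_base(void)
{
    bool ambiguous;
    Object *o = object_resolve_path_type("", TYPE_PCIE_HOST_BRIDGE,
                                         &ambiguous);

    if (!o || ambiguous) {
        return 0;   /* no single PCIe host bridge found */
    }
    return PCIE_HOST_BRIDGE(o)->base_addr;
}

It would work, but it is exactly the kind of cross-module poking that
gets frowned upon.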

>> QEMU maintainers will grin while looking at all this I'm afraid --
>> trapped MMIO accesses which are translated to PCI conf accesses which
>> in turn translated back to emulated MMIO accesses upon receiving,
>> along with tedious attempts to gain access to MMCONFIG-related info
>> as we're not invited to the MMCONFIG emulation party.
>>
>> The more I think about it, the more I like the existing
>> map_io_range_to_ioreq_server() approach. :( It works without doing
>> anything, no hacks, no new interfaces, both MMCONFIG and CF8/CFC are
>> working as expected. There is a problem to make it compatible with
>> the specific multiple ioreq servers feature, but providing a new
>> dmop/hypercall (which you suggest is a must have thing to trap
>> MMCONFIG MMIO to give QEMU only the freedom to tell where it is
>> located) allows to solve this problem in any possible way, either
>> MMIO -> PCI conf translation or anything else.

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Roger Pau Monné
On Thu, Mar 22, 2018 at 10:29:22PM +1000, Alexey G wrote:
> On Thu, 22 Mar 2018 09:57:16 +
> Roger Pau Monné  wrote:
> [...]
> >> Yes, and it is still needed as we have two distinct (and not equal)
> >> interfaces to PCI conf space. Apart from 0..FFh range overlapping
> >> they can be considered very different interfaces. And whether it is
> >> a real system or emulated -- we can use either one of these two
> >> interfaces or both.  
> >
> >The legacy PCI config space accesses and the MCFG config space access
> >are just different methods of accessing the PCI configuration space,
> >but the data _must_ be exactly the same. I don't see how a device
> >would care about where the access to the config space originated.
> 
> If they were different methods of accessing the same thing, they
> could've been used interchangeably. When we've got a PCI conf ioreq
> which has offset>100h we know we cannot just pass it to emulated
> CF8/CFC but have to emulate this specifically.

This is already not the best approach to dispatch PCI config space
access in QEMU. I think the interface in QEMU should be:

pci_conf_space_{read/write}(sbdf, register, size, data)

And this would go directly into the device. But I assume this involves
a non-trivial amount of work to be implemented. Hence xen-hvm.c usage
of the IO port access replay.
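Something like the following, purely as a sketch of the interface shape
(the dispatcher itself is made up, nothing like it exists in QEMU
today):

/* Hypothetical dispatcher: route a config space access straight to the
 * device, regardless of whether it arrived via CF8/CFC or via MCFG. */
void pci_conf_space_write(PCIBus *bus, uint32_t sbdf, uint32_t reg,
                          unsigned size, uint64_t data)
{
    PCIDevice *dev = pci_find_device(bus, (sbdf >> 8) & 0xff,  /* bus   */
                                     sbdf & 0xff);             /* devfn */

    if (dev) {
        pci_host_config_write_common(dev, reg, pci_config_size(dev),
                                     data, size);
    }
}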

> >OK, so you don't want to reconstruct the access, fine.
> >
> >Then just inject it using pcie_mmcfg_data_{read/write} or some similar
> >wrapper. My suggestion was just to try to use the easier way to get
> >this injected into QEMU.
> 
> QEMU knows its position, the problem it that xen-hvm.c (ioreq
> processor) is rather isolated from MMCONFIG emulation.
> 
> If you check the pcie_mmcfg_data_read/write MMCONFIG handlers in QEMU,
> you can see this:
> 
> static uint64_t pcie_mmcfg_data_read(void *opaque, <...>
> {
> PCIExpressHost *e = opaque;
> ...
> 
> We know this 'opaque' when we do MMIO-style MMCONFIG handling as
> pcie_mmcfg_data_read/write are actual handlers.
> 
> But xen-hvm.c needs to gain access to PCIExpressHost out of nowhere,
> which is possible but considered a hack by QEMU. We can also insert
> some code to MMCONFIG emulation which will store info we need to some
> global variables to be used across wildly different and unrelated
> modules. It will work, but anyone who see it will have bad thoughts on
> his mind.

Since you need to notify Xen the MCFG area address, why not just store
the MCFG address while doing this operation? You could do this with a
helper in xen-hvm.c, and keep the variable locally to that file.
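I.e. something as small as this (a hypothetical sketch of what such a
helper in xen-hvm.c could look like, not existing code):

#include "exec/hwaddr.h"

/* xen-hvm.c: remember the guest MCFG base at the point where we notify
 * Xen about it, so later ioreq handling can reuse it. */
static hwaddr xen_mmcfg_base;       /* 0 while unmapped */

static void xen_set_mmcfg_base(hwaddr base)
{
    xen_mmcfg_base = base;
}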

In any case, this is a QEMU implementation detail. IMO the IOREQ
interface is clear and should not be bended like this just because
'this is easier to implement in QEMU'.

Roger.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 06:09:44 -0600
"Jan Beulich"  wrote:

 On 22.03.18 at 12:56,  wrote:  
>> I really don't understand why some people have that fear of emulated
>> MMCONFIG -- it's really the same thing as any other MMIO range QEMU
>> already emulates via map_io_range_to_ioreq_server(). No sensitive
>> information exposed. It is related only to emulated PCI conf space
>> which QEMU already knows about and use, providing emulated PCI
>> devices for it.  
>
>You continue to ignore the routing requirement multiple ioreq
>servers impose.

If the emulated MMCONFIG approach is modified to become fully
compatible with multiple ioreq servers (whatever they are used for), I
assume there will be no remaining objections to using emulated
MMCONFIG?
I just want to clarify this point -- why do people think that a
completely emulated MMIO range, not related in any way to the host's
MMCONFIG, may compromise something?


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Jan Beulich
>>> On 22.03.18 at 14:05,  wrote:
> On Thu, 22 Mar 2018 06:09:44 -0600
> "Jan Beulich"  wrote:
> 
> On 22.03.18 at 12:56,  wrote:  
>>> I really don't understand why some people have that fear of emulated
>>> MMCONFIG -- it's really the same thing as any other MMIO range QEMU
>>> already emulates via map_io_range_to_ioreq_server(). No sensitive
>>> information exposed. It is related only to emulated PCI conf space
>>> which QEMU already knows about and use, providing emulated PCI
>>> devices for it.  
>>
>>You continue to ignore the routing requirement multiple ioreq
>>servers impose.
> 
> If the emulated MMCONFIG approach will be modified to become
> fully compatible with multiple ioreq servers (whatever they used for), I
> assume there will be no objections that emulated MMCONFIG can't be
> used?
> I just want to clarify this moment -- why people think that
> a completely emulated MMIO range, not related in any
> way to host's MMCONFIG may compromise something.

Compromise? All that was said so far - afair - was that this is the
wrong way round design wise.

Jan



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 07:20:00 -0600
"Jan Beulich"  wrote:

 On 22.03.18 at 14:05,  wrote:  
>> On Thu, 22 Mar 2018 06:09:44 -0600
>> "Jan Beulich"  wrote:
>>   
>> On 22.03.18 at 12:56,  wrote:
 I really don't understand why some people have that fear of
 emulated MMCONFIG -- it's really the same thing as any other MMIO
 range QEMU already emulates via map_io_range_to_ioreq_server(). No
 sensitive information exposed. It is related only to emulated PCI
 conf space which QEMU already knows about and use, providing
 emulated PCI devices for it.
>>>
>>>You continue to ignore the routing requirement multiple ioreq
>>>servers impose.  
>> 
>> If the emulated MMCONFIG approach will be modified to become
>> fully compatible with multiple ioreq servers (whatever they used
>> for), I assume there will be no objections that emulated MMCONFIG
>> can't be used?
>> I just want to clarify this moment -- why people think that
>> a completely emulated MMIO range, not related in any
>> way to host's MMCONFIG may compromise something.  
>
>Compromise? All that was said so far - afair - was that this is the
>wrong way round design wise.

I assume HVM is all about emulating some real system; for other goals
PV/PVH are available. What do you think is the proper, design-wise way
to emulate the MMIO-based MMCONFIG range Q35 provides?

Here is what I've heard so far in this thread:

1. Add a completely new dmop/hypercall so that QEMU can tell Xen where
the emulated MMCONFIG MMIO area is located and at the same time map it
for MMIO trapping to intercept accesses. The latter action is the same
as what map_io_range_to_ioreq_server() does, but let's ignore that for
now because the opinion was that we need a distinct hypercall (a
hypothetical sketch of such an interface follows after this list).

2. Upon trapping accesses to this emulated range, Xen will pretend that
QEMU didn't just tell it about the MMCONFIG location and size, and will
instead convert the MMIO access into a PCI conf one and send the ioreq
to QEMU or some other DM.

3. If there is a PCIEXBAR relocation (OVMF currently does this for
MMCONFIG usage, but we must later teach it non-QEMU manners), QEMU must
immediately inform Xen about any changes in MMCONFIG location/status.

4. QEMU receives a PCI conf access while expecting an MMIO address, so
xen-hvm.c has to deal with it somehow -- either obtaining the MMCONFIG
base and recreating the emulated MMIO access from BDF/reg, or doing the
dirty work of finding the target PCIBus/PCIDevice itself, as it cannot
use the emulated CF8/CFC ports due to the legacy PCI conf size
limitation.

Please confirm that this is the preferable solution, or point out
anything missing.
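Just to make point 1 concrete, the new interface could look roughly
like this from QEMU's side -- an entirely made-up signature, only to
illustrate the shape being discussed, not an existing or proposed ABI:

/* Hypothetical dmop wrapper: tell Xen where the guest MMCONFIG lives
 * (and implicitly have Xen trap that range), called from the PCIEXBAR
 * write handler whenever the location or enable bit changes. */
int xendevicemodel_map_mmcfg_to_ioreq_server(xendevicemodel_handle *dmod,
                                             domid_t domid, ioservid_t id,
                                             uint64_t mmcfg_base,
                                             uint8_t start_bus,
                                             uint8_t end_bus);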


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Jan Beulich
>>> On 22.03.18 at 15:34,  wrote:
> On Thu, 22 Mar 2018 07:20:00 -0600
> "Jan Beulich"  wrote:
> 
> On 22.03.18 at 14:05,  wrote:  
>>> On Thu, 22 Mar 2018 06:09:44 -0600
>>> "Jan Beulich"  wrote:
>>>   
>>> On 22.03.18 at 12:56,  wrote:
> I really don't understand why some people have that fear of
> emulated MMCONFIG -- it's really the same thing as any other MMIO
> range QEMU already emulates via map_io_range_to_ioreq_server(). No
> sensitive information exposed. It is related only to emulated PCI
> conf space which QEMU already knows about and use, providing
> emulated PCI devices for it.

You continue to ignore the routing requirement multiple ioreq
servers impose.  
>>> 
>>> If the emulated MMCONFIG approach will be modified to become
>>> fully compatible with multiple ioreq servers (whatever they used
>>> for), I assume there will be no objections that emulated MMCONFIG
>>> can't be used?
>>> I just want to clarify this moment -- why people think that
>>> a completely emulated MMIO range, not related in any
>>> way to host's MMCONFIG may compromise something.  
>>
>>Compromise? All that was said so far - afair - was that this is the
>>wrong way round design wise.
> 
> I assume it's all about emulating some real system for HVM, for other
> goals PV/PVH are available. What is a proper, design-wise way to
> emulate the MMIO-based MMCONFIG range Q35 provides you think of?
> 
> Here is what I've heard so far in this thread:
> 
> 1. Add a completely new dmop/hypercall so that QEMU can tell Xen where
> emulated MMCONFIG MMIO area is located and in the same time map it for
> MMIO trapping to intercept accesses. Latter action is the same what
> map_io_range_to_ioreq_server() does, but let's ignore it for now
> because there was opinion that we need to stick to a distinct hypercall.
> 
> 2. Upon trapping accesses to this emulated range, Xen will pretend that
> QEMU didn't just told him about MMCONFIG location and size and instead
> convert MMIO access into PCI conf one and send the ioreq to QEMU or
> some other DM.
> 
> 3. If there will be a PCIEXBAR relocation (OVMF does it currently for
> MMCONFIG usage, but we must later teach him non-QEMU manners), QEMU must
> immediately inform Xen about any changes in MMCONFIG location/status.
> 
> 4. QEMU receives PCI conf access while expecting the MMIO address, so
> xen-hvm.c has to deal with it somehow, either obtaining MMCONFIG base
> and recreating emulated MMIO access from BDF/reg or doing the dirty work
> of finding PCIBus/PCIDevice target itself as it cannot use emulated
> CF8/CFC ports due to legacy PCI conf size limitation.
> 
> Please confirm that it is a preferable solution or if something missing.

I'm afraid this is only part of the picture, as you've been told by
others before. We first of all need to settle on who emulates
the core chipset registers. Depending on that will be how Xen
would learn about the MCFG location inside the guest.

Jan



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 08:42:09 -0600
"Jan Beulich"  wrote:

 On 22.03.18 at 15:34,  wrote:  
>> On Thu, 22 Mar 2018 07:20:00 -0600
>> "Jan Beulich"  wrote:
>>   
>> On 22.03.18 at 14:05,  wrote:
 On Thu, 22 Mar 2018 06:09:44 -0600
 "Jan Beulich"  wrote:
 
 On 22.03.18 at 12:56,  wrote:  
>> I really don't understand why some people have that fear of
>> emulated MMCONFIG -- it's really the same thing as any other MMIO
>> range QEMU already emulates via map_io_range_to_ioreq_server().
>> No sensitive information exposed. It is related only to emulated
>> PCI conf space which QEMU already knows about and use, providing
>> emulated PCI devices for it.  
>
>You continue to ignore the routing requirement multiple ioreq
>servers impose.
 
 If the emulated MMCONFIG approach will be modified to become
 fully compatible with multiple ioreq servers (whatever they used
 for), I assume there will be no objections that emulated MMCONFIG
 can't be used?
 I just want to clarify this moment -- why people think that
 a completely emulated MMIO range, not related in any
 way to host's MMCONFIG may compromise something.
>>>
>>>Compromise? All that was said so far - afair - was that this is the
>>>wrong way round design wise.  
>> 
>> I assume it's all about emulating some real system for HVM, for other
>> goals PV/PVH are available. What is a proper, design-wise way to
>> emulate the MMIO-based MMCONFIG range Q35 provides you think of?
>> 
>> Here is what I've heard so far in this thread:
>> 
>> 1. Add a completely new dmop/hypercall so that QEMU can tell Xen
>> where emulated MMCONFIG MMIO area is located and in the same time
>> map it for MMIO trapping to intercept accesses. Latter action is the
>> same what map_io_range_to_ioreq_server() does, but let's ignore it
>> for now because there was opinion that we need to stick to a
>> distinct hypercall.
>> 
>> 2. Upon trapping accesses to this emulated range, Xen will pretend
>> that QEMU didn't just told him about MMCONFIG location and size and
>> instead convert MMIO access into PCI conf one and send the ioreq to
>> QEMU or some other DM.
>> 
>> 3. If there will be a PCIEXBAR relocation (OVMF does it currently for
>> MMCONFIG usage, but we must later teach him non-QEMU manners), QEMU
>> must immediately inform Xen about any changes in MMCONFIG
>> location/status.
>> 
>> 4. QEMU receives PCI conf access while expecting the MMIO address, so
>> xen-hvm.c has to deal with it somehow, either obtaining MMCONFIG base
>> and recreating emulated MMIO access from BDF/reg or doing the dirty
>> work of finding PCIBus/PCIDevice target itself as it cannot use
>> emulated CF8/CFC ports due to legacy PCI conf size limitation.
>> 
>> Please confirm that it is a preferable solution or if something
>> missing.  
>
>I'm afraid this is only part of the picture, as you've been told by
>others before. We first of all need to settle on who emulates
>the core chipset registers. Depending on that will be how Xen
>would learn about the MCFG location inside the guest.

A few related thoughts:

1. The MMCONFIG address is chipset-specific. On Q35 it's PCIEXBAR, on
other x86 systems it may be HECBASE or something else. So we can assume
it is bound to the emulated machine.

2. We rely on QEMU to emulate different machines for us.

3. There are users which touch the chipset-specific PCIEXBAR directly
if they see a Q35 system (OVMF so far).

It seems we're pretty limited in freedom of choice under these
conditions, I'm afraid.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-22 Thread Alexey G
On Thu, 22 Mar 2018 12:44:02 +
Roger Pau Monné  wrote:

>On Thu, Mar 22, 2018 at 10:29:22PM +1000, Alexey G wrote:
>> On Thu, 22 Mar 2018 09:57:16 +
>> Roger Pau Monné  wrote:
>> [...]  
>> >> Yes, and it is still needed as we have two distinct (and not
>> >> equal) interfaces to PCI conf space. Apart from 0..FFh range
>> >> overlapping they can be considered very different interfaces. And
>> >> whether it is a real system or emulated -- we can use either one
>> >> of these two interfaces or both.
>> >
>> >The legacy PCI config space accesses and the MCFG config space
>> >access are just different methods of accessing the PCI
>> >configuration space, but the data _must_ be exactly the same. I
>> >don't see how a device would care about where the access to the
>> >config space originated.  
>> 
>> If they were different methods of accessing the same thing, they
>> could've been used interchangeably. When we've got a PCI conf ioreq
>> which has offset>100h we know we cannot just pass it to emulated
>> CF8/CFC but have to emulate this specifically.  
>
>This is already not the best approach to dispatch PCI config space
>access in QEMU. I think the interface in QEMU should be:
>
>pci_conf_space_{read/write}(sbdf, register, size , data)
>
>And this would go directly into the device. But I assume this involves
>a non-trivial amount of work to be implemented. Hence xen-hvm.c usage
>of the IO port access replay.

Yes, it's a helpful shortcut. The only bad thing is that we can't use
it for PCI extended config accesses; a memory address within the emulated
MMCONFIG is much more preferable in the current architecture.
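
To illustrate the limitation (a rough sketch, not code from QEMU or Xen --
just the standard address encodings): the CF8 port format only has 8 bits
for the register offset, while an ECAM/MMCONFIG offset carries 12 bits, so
offsets above FFh simply cannot be expressed via CF8/CFC.

/* Legacy mechanism: only offsets 0x00..0xFF are expressible. */
static inline uint32_t cf8_addr(uint8_t bus, uint8_t dev, uint8_t fn,
                                uint8_t reg)
{
    return (1u << 31) | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xfc);
}

/* ECAM/MMCONFIG: the register offset field is 12 bits wide (0x000..0xFFF). */
static inline uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn,
                                   uint16_t reg)
{
    return ((uint64_t)bus << 20) | (dev << 15) | (fn << 12) | (reg & 0xfff);
}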

>> >OK, so you don't want to reconstruct the access, fine.
>> >
>> >Then just inject it using pcie_mmcfg_data_{read/write} or some
>> >similar wrapper. My suggestion was just to try to use the easier
>> >way to get this injected into QEMU.  
>> 
>> QEMU knows its position, the problem it that xen-hvm.c (ioreq
>> processor) is rather isolated from MMCONFIG emulation.
>> 
>> If you check the pcie_mmcfg_data_read/write MMCONFIG handlers in
>> QEMU, you can see this:
>> 
>> static uint64_t pcie_mmcfg_data_read(void *opaque, <...>
>> {
>> PCIExpressHost *e = opaque;
>> ...
>> 
>> We know this 'opaque' when we do MMIO-style MMCONFIG handling as
>> pcie_mmcfg_data_read/write are actual handlers.
>> 
>> But xen-hvm.c needs to gain access to PCIExpressHost out of nowhere,
>> which is possible but considered a hack by QEMU. We can also insert
>> some code to MMCONFIG emulation which will store info we need to some
>> global variables to be used across wildly different and unrelated
>> modules. It will work, but anyone who see it will have bad thoughts
>> on his mind.  
>
>Since you need to notify Xen the MCFG area address, why not just store
>the MCFG address while doing this operation? You could do this with a
>helper in xen-hvm.c, and keep the variable locally to that file.
>
>In any case, this is a QEMU implementation detail. IMO the IOREQ
>interface is clear and should not be bended like this just because
>'this is easier to implement in QEMU'.

A bit of a hack too, but it might work. Anyway, it's extra work we can
avoid if we simply skip PCI conf translation for MMCONFIG MMIO ioreqs
targeting QEMU. I completely agree that we need to translate these
accesses into PCI conf ioreqs for device DMs, but for QEMU it is an
unwanted and redundant step.

AFAIK (Paul might correct me here) the multiple device emulators
feature already makes use of the primary (aka default) DM and
device-specific DM distinction, so in theory it should be possible to
provide that translation only for device-specific DMs (which function
apart from the emulated machine and cannot use its facilities).


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-23 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Alexey G
> Sent: 22 March 2018 15:31
> To: Roger Pau Monne 
> Cc: Stefano Stabellini ; Wei Liu
> ; Andrew Cooper ; Ian
> Jackson ; Paul Durrant ;
> Jan Beulich ; Anthony Perard
> ; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Thu, 22 Mar 2018 12:44:02 +
> Roger Pau Monné  wrote:
> 
> >On Thu, Mar 22, 2018 at 10:29:22PM +1000, Alexey G wrote:
> >> On Thu, 22 Mar 2018 09:57:16 +
> >> Roger Pau Monné  wrote:
> >> [...]
> >> >> Yes, and it is still needed as we have two distinct (and not
> >> >> equal) interfaces to PCI conf space. Apart from 0..FFh range
> >> >> overlapping they can be considered very different interfaces. And
> >> >> whether it is a real system or emulated -- we can use either one
> >> >> of these two interfaces or both.
> >> >
> >> >The legacy PCI config space accesses and the MCFG config space
> >> >access are just different methods of accessing the PCI
> >> >configuration space, but the data _must_ be exactly the same. I
> >> >don't see how a device would care about where the access to the
> >> >config space originated.
> >>
> >> If they were different methods of accessing the same thing, they
> >> could've been used interchangeably. When we've got a PCI conf ioreq
> >> which has offset>100h we know we cannot just pass it to emulated
> >> CF8/CFC but have to emulate this specifically.
> >
> >This is already not the best approach to dispatch PCI config space
> >access in QEMU. I think the interface in QEMU should be:
> >
> >pci_conf_space_{read/write}(sbdf, register, size , data)
> >
> >And this would go directly into the device. But I assume this involves
> >a non-trivial amount of work to be implemented. Hence xen-hvm.c usage
> >of the IO port access replay.
> 
> Yes, it's a helpful shortcut. The only bad thing that we can't use
> it for PCI extended config accesses, a memory address within emulated
> MMCONFIG much more preferable in current architecture.
> 
> >> >OK, so you don't want to reconstruct the access, fine.
> >> >
> >> >Then just inject it using pcie_mmcfg_data_{read/write} or some
> >> >similar wrapper. My suggestion was just to try to use the easier
> >> >way to get this injected into QEMU.
> >>
> >> QEMU knows its position, the problem it that xen-hvm.c (ioreq
> >> processor) is rather isolated from MMCONFIG emulation.
> >>
> >> If you check the pcie_mmcfg_data_read/write MMCONFIG handlers in
> >> QEMU, you can see this:
> >>
> >> static uint64_t pcie_mmcfg_data_read(void *opaque, <...>
> >> {
> >> PCIExpressHost *e = opaque;
> >> ...
> >>
> >> We know this 'opaque' when we do MMIO-style MMCONFIG handling as
> >> pcie_mmcfg_data_read/write are actual handlers.
> >>
> >> But xen-hvm.c needs to gain access to PCIExpressHost out of nowhere,
> >> which is possible but considered a hack by QEMU. We can also insert
> >> some code to MMCONFIG emulation which will store info we need to
> some
> >> global variables to be used across wildly different and unrelated
> >> modules. It will work, but anyone who see it will have bad thoughts
> >> on his mind.
> >
> >Since you need to notify Xen the MCFG area address, why not just store
> >the MCFG address while doing this operation? You could do this with a
> >helper in xen-hvm.c, and keep the variable locally to that file.
> >
> >In any case, this is a QEMU implementation detail. IMO the IOREQ
> >interface is clear and should not be bended like this just because
> >'this is easier to implement in QEMU'.
> 
> A bit of hack too, but might work. Anyway, it's an extra work we can
> avoid if we simply skip PCI conf translation for MMCONFIG MMIO ioreqs
> targeting QEMU. I completely agree that we need to translate these
> accesses into PCI conf ioreqs for device DMs, but for QEMU it is an
> unwanted and redundant step.
> 
> AFAIK (Paul might correct me here) the multiple device emulators
> feature already makes use of the primary (aka default) DM and
> device-specific DM distinction, so in theory it should be possible to
> provide that translation only for device-specific DMs (which f

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-23 Thread Jan Beulich
>>> On 23.03.18 at 11:29,  wrote:
> No, that's not quite right. Only qemu-trad (and stubdom) are 'default' ioreq 
> servers. Upstream QEMU has registered individual PCI devices with Xen for 
> some time now, and hence gets proper PCI config IOREQs. Also we really really 
> want default ioreq servers as their interface to Xen is fragile and has only 
> just narrowly avoided being a security issue.

Did you miss some "don't" or "to go away"?

Jan



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-23 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Jan Beulich
> Sent: 23 March 2018 11:39
> To: Paul Durrant 
> Cc: Stefano Stabellini ; Wei Liu
> ; Andrew Cooper ;
> 'Alexey G' ; xen-devel@lists.xenproject.org; Anthony
> Perard ; Ian Jackson ;
> Roger Pau Monne 
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> >>> On 23.03.18 at 11:29,  wrote:
> > No, that's not quite right. Only qemu-trad (and stubdom) are 'default' ioreq
> > servers. Upstream QEMU has registered individual PCI devices with Xen for
> > some time now, and hence gets proper PCI config IOREQs. Also we really
> really
> > want default ioreq servers as their interface to Xen is fragile and has only
> > just narrowly avoided being a security issue.
> 
> Did you miss some "don't" or "to go away"?
> 

Oops, yes! "to go away" definitely.

  Paul

> Jan
> 
> 

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-23 Thread Paul Durrant
> -Original Message-
> From: Alexey G [mailto:x19...@gmail.com]
> Sent: 22 March 2018 15:09
> To: Jan Beulich 
> Cc: Andrew Cooper ; Anthony Perard
> ; Ian Jackson ; Paul
> Durrant ; Roger Pau Monne
> ; Wei Liu ; StefanoStabellini
> ; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Thu, 22 Mar 2018 08:42:09 -0600
> "Jan Beulich"  wrote:
> 
> >>>> On 22.03.18 at 15:34,  wrote:
> >> On Thu, 22 Mar 2018 07:20:00 -0600
> >> "Jan Beulich"  wrote:
> >>
> >>>>>> On 22.03.18 at 14:05,  wrote:
> >>>> On Thu, 22 Mar 2018 06:09:44 -0600
> >>>> "Jan Beulich"  wrote:
> >>>>
> >>>>>>>> On 22.03.18 at 12:56,  wrote:
> >>>>>> I really don't understand why some people have that fear of
> >>>>>> emulated MMCONFIG -- it's really the same thing as any other
> MMIO
> >>>>>> range QEMU already emulates via
> map_io_range_to_ioreq_server().
> >>>>>> No sensitive information exposed. It is related only to emulated
> >>>>>> PCI conf space which QEMU already knows about and use, providing
> >>>>>> emulated PCI devices for it.
> >>>>>
> >>>>>You continue to ignore the routing requirement multiple ioreq
> >>>>>servers impose.
> >>>>
> >>>> If the emulated MMCONFIG approach will be modified to become
> >>>> fully compatible with multiple ioreq servers (whatever they used
> >>>> for), I assume there will be no objections that emulated MMCONFIG
> >>>> can't be used?
> >>>> I just want to clarify this moment -- why people think that
> >>>> a completely emulated MMIO range, not related in any
> >>>> way to host's MMCONFIG may compromise something.
> >>>
> >>>Compromise? All that was said so far - afair - was that this is the
> >>>wrong way round design wise.
> >>
> >> I assume it's all about emulating some real system for HVM, for other
> >> goals PV/PVH are available. What is a proper, design-wise way to
> >> emulate the MMIO-based MMCONFIG range Q35 provides you think of?
> >>
> >> Here is what I've heard so far in this thread:
> >>
> >> 1. Add a completely new dmop/hypercall so that QEMU can tell Xen
> >> where emulated MMCONFIG MMIO area is located and in the same time
> >> map it for MMIO trapping to intercept accesses. Latter action is the
> >> same what map_io_range_to_ioreq_server() does, but let's ignore it
> >> for now because there was opinion that we need to stick to a
> >> distinct hypercall.
> >>
> >> 2. Upon trapping accesses to this emulated range, Xen will pretend
> >> that QEMU didn't just told him about MMCONFIG location and size and
> >> instead convert MMIO access into PCI conf one and send the ioreq to
> >> QEMU or some other DM.
> >>
> >> 3. If there will be a PCIEXBAR relocation (OVMF does it currently for
> >> MMCONFIG usage, but we must later teach him non-QEMU manners),
> QEMU
> >> must immediately inform Xen about any changes in MMCONFIG
> >> location/status.
> >>
> >> 4. QEMU receives PCI conf access while expecting the MMIO address, so
> >> xen-hvm.c has to deal with it somehow, either obtaining MMCONFIG
> base
> >> and recreating emulated MMIO access from BDF/reg or doing the dirty
> >> work of finding PCIBus/PCIDevice target itself as it cannot use
> >> emulated CF8/CFC ports due to legacy PCI conf size limitation.
> >>
> >> Please confirm that it is a preferable solution or if something
> >> missing.
> >
> >I'm afraid this is only part of the picture, as you've been told by
> >others before. We first of all need to settle on who emulates
> >the core chipset registers. Depending on that will be how Xen
> >would learn about the MCFG location inside the guest.
> 
> Few related thoughts:
> 
> 1. MMCONFIG address is chipset-specific. On Q35 it's a PCIEXBAR, on
> other x86 systems it may be HECBASE or else. So we can assume it is
> bound to the emulated machine
> 

Xen emulates the machine so it should be emulating PCIEXBAR. 

> 2. We rely on QEMU to emulate different machines for us.
> 

We should not be. It's a historical artefact that we rely on QEMU for any part 
of machine emulation.

> 3. There are users which touch chipset-specific PCIEXBAR directly if
> they see a Q35 system (OVMF so far)
> 

And we should squash such accesses. The toolstack should be in sole control of the 
guest memory map. It should be the only one building the MCFG, so it should decide where 
the MMCONFIG regions go, not the firmware running in guest context.

> Seems like we're pretty limited in freedom of choice in this
> conditions, I'm afraid.

I don't think so. We're only limited if we use QEMU's Q35 emulation, and what 
I'm saying is that we should not be doing that (nor should we be using it to 
emulate any part of the PIIX today).

  Paul


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-23 Thread Alexey G
On Fri, 23 Mar 2018 13:57:11 +
Paul Durrant  wrote:
[...]
>> Few related thoughts:
>> 
>> 1. MMCONFIG address is chipset-specific. On Q35 it's a PCIEXBAR, on
>> other x86 systems it may be HECBASE or else. So we can assume it is
>> bound to the emulated machine
>
>Xen emulates the machine so it should be emulating PCIEXBAR. 

Actually, Xen currently emulates only a few devices. Others are
provided by QEMU, that's the problem.

>> 2. We rely on QEMU to emulate different machines for us.
>We should not be. It's a historical artefact that we rely on QEMU for
>any part of machine emulation.

HVM guests need to see something more or less close to real hardware to
run. Even if we later install PV drivers for network/storage/etc usage,
we still need to support system firmware (SeaBIOS/OVMF) and be able to
install (ideally) any OS which expects to be installed only on real
x86 hw. We also need to be ready to fall back to the emulated hw if
e.g. the user boots the OS in safe mode.

It all depends on what you mean by not relying on QEMU for any part
of machine emulation.

There are a number of mandatory devices which should be provided for a
typical x86 system. Xen emulates some of them, but there are a number
which it doesn't. Apart from "classic" devices like RTC, PIT, KBC, etc.
we need to provide at least storage and network interfaces.

The Windows installer won't be happy to boot from a PV storage device; it
prefers to encounter something like AHCI (Windows 7+), ATA (for older
OSes) or ATAPI if it is an ISO CD.
Providing emulation for the AHCI+ATA+ATAPI trio alone is a non-trivial
task. QEMU itself provides only a partial implementation of these; many
features are unsupported. Another very useful thing to emulate is USB.
Depending on the controller version and device classes required, this
may be far more complex to emulate than AHCI+ATA+ATAPI combined.

So, if you suggest dropping QEMU completely, it means that all this
functionality must be replaced by our own. Not that hard, but still a lot
of effort.


OTOH, if you mean stripping QEMU of general PCI bus control and
replacing its emulated NB/SB with Xen-owned ones -- well, in theory it
should be possible, with patches on the QEMU side.

In fact, the emulated chipset (NB+SB combo without supplemental devices)
itself is a small part of the required emulation. It's relatively easy to
provide our own analogs of e.g. the 'mch' and 'ICH9-LPC' QEMU PCIDevices,
the problem is to glue all the remaining parts together.

I assume the final goal in this case is to have only a set of necessary
QEMU PCIDevices for which we will be providing I/O, MMIO and PCI conf
trapping facilities. Only devices such as rtl8139, ich9-ahci and a few
others.

Basically, this means a new, chipset-less QEMU machine type.
Well, in theory it is possible with a bit of effort, I think. The main
question is where the NB/SB/PCI-bus emulating part will reside in
this case. As this part must still have some privileges, it's basically
the same decision problem as with QEMU's dwelling place -- stubdomain,
Dom0 or else.

>> 3. There are users which touch chipset-specific PCIEXBAR directly if
>> they see a Q35 system (OVMF so far)
>
>And we should squash such accesses.
>

Yes, we have that privilege (i.e. allocating all IO/MMIO bases) for
hvmloader. OVMF should not differ from SeaBIOS in this regard.

>The toolstack should be sole
>control of the guest memory map. It should be the only building MCFG
>so it should decide where the MMCONFIG regions go, not the firmware
>running in guest context.

The HVM memory layout is another problem which needs a solution, BTW. I had
to implement one for my PT goals, but it's very radical I'm afraid.

Right now there are wicked issues present in handling the memory layout
between hvmloader and QEMU. They may see different memory maps, even
with overlaps in some (depending on MMIO hole size and content) cases --
like an attempt to place an MMIO BAR over memory which is used for vram
backing storage by QEMU, causing a variety of issues like emulated I/O
errors (with a storage device) during a guest boot attempt.

Regarding control of the guest memory map in the toolstack only... The
problem is, only firmware can see the final memory map at the moment.
And only the device model knows about invisible "service" ranges for
emulated devices, like the LFB content (aka "VRAM") when it is not
mapped to a guest.

In order to calculate the final memory/MMIO hole split, we need to know:

1) all PCI devices on the PCI bus. At the moment Xen contributes only
devices like PT to the final PCI bus (via QMP device_add). Others are
QEMU ones. Even the Xen platform PCI device relies on QEMU emulation.
Non-QEMU device emulators are another source of virtual PCI devices I
guess.

2) all chipset-specific emulated MMIO ranges. MMCONFIG is one of them
and the largest (up to 256MB for a segment). There are a few other smaller
ranges, e.g. Root Complex registers. All these ranges depend on the
emulated chipset.

3) all reserved memory ranges (this one what

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-26 Thread Roger Pau Monné
On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:
> On Fri, 23 Mar 2018 13:57:11 +
> Paul Durrant  wrote:
> [...]
> >> Few related thoughts:
> >> 
> >> 1. MMCONFIG address is chipset-specific. On Q35 it's a PCIEXBAR, on
> >> other x86 systems it may be HECBASE or else. So we can assume it is
> >> bound to the emulated machine
> >
> >Xen emulates the machine so it should be emulating PCIEXBAR. 
> 
> Actually, Xen currently emulates only few devices. Others are
> provided by QEMU, that's the problem.
> 
> >> 2. We rely on QEMU to emulate different machines for us.
> >We should not be. It's a historical artefact that we rely on QEMU for
> >any part of machine emulation.
> 
> HVM guests need to see something more or less close to real hardware to
> run. Even if we later install PV drivers for network/storage/etc usage,
> we still need to support system firmware (SeaBIOS/OVMF) and be able to
> install any (ideally) OS which expects to be installed only on some
> real x86 hw. We also need to be ready to fallback to the emulated hw if
> eg. user will boot OS in the safe mode.

I think Paul means that Xen should be emulating the platform devices
and part of the southbridge/northbridge functionality, but not all the
emulated devices provided to a guest.

> 
> It all depends on what you mean by not relying on QEMU for any part
> of machine emulation.
> 
> There is a number of mandatory devices which should be provided for a
> typical x86 system. Xen emulates some of them, but there is a number
> which he doesn't. Apart from "classic" devices like RTC, PIT, KBC, etc
> we need to provide at least storage and network interfaces.
> 
> Windows installer won't be happy to boot from the PV storage device, he
> prefers to encounter something like AHCI (Windows 7+), ATA (for older
> OSes) or ATAPI if it is an iso cd.
> Providing emulation for the AHCI+ATA+ATAPI trio alone is a non-trivial
> task. QEMU itself provides only partial implementation of these, many
> features are unsupported. Another very useful thing to emulate is USB.
> Depending on the controller version and device classes required, this
> may be far more complex to emulate than AHCI+ATA+ATAPI combined.
> 
> So, if you suggest to drop QEMU completely, it means that all this
> functionality must be replaced by own. Not that hard, but still a lot
> of effort.
> 
> 
> OTOH, if you mean stripping QEMU of general PCI bus control and
> replacing his emulated NB/SB with Xen-owned -- well, it theory it
> should be possible, with patches on QEMU side.
> 
> In fact, the emulated chipset (NB+SB combo without supplemental devices)
> itself is a small part of required emulation. It's relatively easy to
> provide own analogs of for eg. 'mch' and 'ICH9-LPC' QEMU PCIDevice's,
> the problem is to glue all remaining parts together.
> 
> I assume the final goal in this case is to have only a set of necessary
> QEMU PCIDevice's for which we will be providing I/O, MMIO and PCI conf
> trapping facilities. Only devices such as rtl8139, ich9-ahci and few
> others.
> 
> Basically, this means a new, chipset-less QEMU machine type.
> Well, in theory it is possible with a bit of effort I think. The main
> question is where will be the NB/SB/PCIbus emulating part reside in
> this case.

Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part of
the southbridge will be emulated by a device model (ie: QEMU).

As you mention above, I also took a look and it seems like the amount
of registers that we should emulate for Q35 DRAM controller (D0:F0) is
fairly minimal based on current QEMU implementation. We could even
possibly get away by just emulating PCIEXBAR.
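
For what it's worth, a rough sketch of what PCIEXBAR decoding would involve
(field layout as per the Q35 spec and QEMU's q35 model; the helper name and
exact base masking below are illustrative only):

/* PCIEXBAR lives at D0:F0 config offset 0x60 (64-bit register). */
static bool pciexbar_decode(uint64_t pciexbar, uint64_t *base, uint64_t *size)
{
    if ( !(pciexbar & 1) )              /* bit 0: PCIEXBAREN (enable) */
        return false;

    switch ( (pciexbar >> 1) & 3 )      /* bits 2:1: LENGTH */
    {
    case 0: *size = 256UL << 20; break; /* 256MB, buses 0-255 */
    case 1: *size = 128UL << 20; break; /* 128MB, buses 0-127 */
    case 2: *size =  64UL << 20; break; /*  64MB, buses 0-63  */
    default: return false;              /* reserved encoding  */
    }

    *base = pciexbar & ((1ULL << 36) - 1) & ~(*size - 1);
    return true;
}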

> As this part must still have some priveleges, it's basically
> the same decision problem as with QEMU's dwelling place -- stubdomain,
> Dom0 or else.
> 
> >> 3. There are users which touch chipset-specific PCIEXBAR directly if
> >> they see a Q35 system (OVMF so far)
> >
> >And we should squash such accesses.
> >
> 
> Yes, we have that privilege (i.e. allocating all IO/MMIO bases) for
> hvmloader. OVMF should not differ in this subject to SeaBIOS.
> 
> >The toolstack should be sole
> >control of the guest memory map. It should be the only building MCFG
> >so it should decide where the MMCONFIG regions go, not the firmware
> >running in guest context.
> 
> HVM memory layout is another problem which needs solution BTW. I had to
> implement one for my PT goals, but it's very radical I'm afraid.
> 
> Right now there are wicked issues present in handling memory layout
> between hvmloader and QEMU. They may see a different memory map, even
> with overlaps in some (depending on MMIO hole size and content) cases --
> like an attempt to place MMIO BAR over memory which is used for vram
> backing storage by QEMU, causing variety of issues like emulated I/O
> errors (with a storage device) during guest boot attempt.
> 
> Regarding control of the guest memory map in the toolstack only... The
> problem is, only firmware can s

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-26 Thread Alexey G
On Mon, 26 Mar 2018 10:24:38 +0100
Roger Pau Monné  wrote:

>On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:
[...]
>> In fact, the emulated chipset (NB+SB combo without supplemental
>> devices) itself is a small part of required emulation. It's
>> relatively easy to provide own analogs of for eg. 'mch' and
>> 'ICH9-LPC' QEMU PCIDevice's, the problem is to glue all remaining
>> parts together.
>> 
>> I assume the final goal in this case is to have only a set of
>> necessary QEMU PCIDevice's for which we will be providing I/O, MMIO
>> and PCI conf trapping facilities. Only devices such as rtl8139,
>> ich9-ahci and few others.
>> 
>> Basically, this means a new, chipset-less QEMU machine type.
>> Well, in theory it is possible with a bit of effort I think. The main
>> question is where will be the NB/SB/PCIbus emulating part reside in
>> this case.  
>
>Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part of
>the southbrigde will be emulated by a device model (ie: QEMU).
>
>As you mention above, I also took a look and it seems like the amount
>of registers that we should emulate for Q35 DRAM controller (D0:F0) is
>fairly minimal based on current QEMU implementation. We could even
>possibly get away by just emulating PCIEXBAR.

MCH emulation alone might not be an option. Besides, some
southbridge-specific features like emulating ACPI PM facilities for
domain power management (basically, anything at PMBASE) will be
preferable to implement on the Xen side, especially considering the fact
that ACPI tables are already provided by Xen's libacpi/hvmloader, not
the device model.
I think the feature may require covering at least the NB+SB
combination, at least Q35 MCH + ICH9 for a start, ideally 82441FX+PIIX4
as well. Also, Xen should control emulated/PT PCI device placement.

Before going this way, it would be good to measure all risks.
Looks like there are two main directions currently:

I. (conservative) Let the main device model (QEMU) inform Xen about
the current chipset-specific MMCONFIG location, to allow Xen to know
that some MMIO accesses to this area must be forwarded to other ioreq
servers (device emulators) in the form of PCI config read/write ioreqs,
if the BDF corresponding to a MMCONFIG offset points to a PCI device
owned by a device emulator.
In the case of device emulators the conversion of MMIO accesses to PCI
config ones is a mandatory step, while the owner of the MMCONFIG MMIO
range may receive MMIO accesses in native form without conversion
(a strongly preferable option for QEMU).

This approach assumes introducing a new dmop/hypercall (something
like XEN_DMOP_mmcfg_location_change) to pass basic MMCONFIG
information to Xen -- address, enabled/disabled status (or simply
address=0 instead) and the size of the MMCONFIG area, e.g. as a number
of buses. This information is enough to select a proper ioreq server in
Xen and allow multiple device emulators to function properly.
For future compatibility we can also provide the segment and
start/end bus range as arguments.
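
Roughly, the payload of such a hypothetical dmop could look like the
following (names, layout and semantics are purely an illustration of the
proposal -- no such interface exists yet):

struct xen_dm_op_mmcfg_location_change {
    uint64_t address;     /* base of the emulated MMCONFIG area, 0 = disabled */
    uint16_t segment;     /* PCI segment, 0 for now */
    uint8_t  start_bus;   /* first bus covered by the area */
    uint8_t  end_bus;     /* last bus covered; together with start_bus
                             this defines the area size */
};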

What this approach will require:


- new notification-style dmop/hypercall to tell Xen about the current
  emulated MMCONFIG location

- trivial changes in QEMU to use this dmop in Q35 PCIEXBAR handling code

- relatively simple Xen changes in ioreq.c to use the provided range
  for ioreq server selection. Also, to provide MMIO -> PCI config ioreq
  translation for supplemental ioreq servers which don't know anything
  about the emulated system (a sketch of that translation is below)
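
A minimal sketch of that MMIO -> PCI config translation, assuming
mmcfg_base/mmcfg_size are whatever the device model reported via the
hypothetical dmop above (not actual ioreq.c code):

static bool mmcfg_decode(uint64_t addr, uint64_t mmcfg_base,
                         uint64_t mmcfg_size, uint16_t *bdf, uint16_t *reg)
{
    uint64_t offset;

    if ( addr < mmcfg_base || addr - mmcfg_base >= mmcfg_size )
        return false;

    offset = addr - mmcfg_base;
    *bdf = (offset >> 12) & 0xffff;   /* bus[27:20] dev[19:15] fn[14:12] */
    *reg = offset & 0xfff;            /* extended config space offset    */
    return true;
}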

Risks:
--

Risk to break anything is minimal in this case.

If QEMU does not provide this information (eg. due to an outdated
version installed), only basic PCI config space accesses via CF8/CFC
will be forwarded to a distinct ioreq server. This means the extended
PCI config space accesses won't be forwarded to specific device
emulators. Other than these device emulators, anything else will
continue to work properly in this case. There will be no difference for
guest OSes without PCIe ECAM support in either case.

In general, no breakthrough improvements, no negative side-effects.
Just PCIe ECAM working as expected, and compatibility with multiple
ioreq servers is retained.


II. (a new feature) Move chipset emulation to Xen directly.

In this case no separate notification is necessary as Xen will be
emulating the chosen chipset itself. The MMCONFIG location will be known
from its own PCIEXBAR emulation.

QEMU will be used only to emulate a minimal set of unrelated devices
(eg. storage/network/vga). Less dependency on QEMU overall.

More freedom to implement some specific features in the future like
SMRAM support for EFI firmware needs. Chipset remapping (aka reclaim)
functionality for memory relocation may be implemented under complete
Xen control, avoiding usage of unsafe add_to_physmap hypercalls.

In the future this will allow moving the passthrough-supporting code from
QEMU (hw/xen/xen-pt*.c) to Xen, merging it with Roger's vpci series.
This will improve eg. th

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-27 Thread Roger Pau Monné
On Tue, Mar 27, 2018 at 05:42:11AM +1000, Alexey G wrote:
> On Mon, 26 Mar 2018 10:24:38 +0100
> Roger Pau Monné  wrote:
> 
> >On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:
> [...]
> >> In fact, the emulated chipset (NB+SB combo without supplemental
> >> devices) itself is a small part of required emulation. It's
> >> relatively easy to provide own analogs of for eg. 'mch' and
> >> 'ICH9-LPC' QEMU PCIDevice's, the problem is to glue all remaining
> >> parts together.
> >> 
> >> I assume the final goal in this case is to have only a set of
> >> necessary QEMU PCIDevice's for which we will be providing I/O, MMIO
> >> and PCI conf trapping facilities. Only devices such as rtl8139,
> >> ich9-ahci and few others.
> >> 
> >> Basically, this means a new, chipset-less QEMU machine type.
> >> Well, in theory it is possible with a bit of effort I think. The main
> >> question is where will be the NB/SB/PCIbus emulating part reside in
> >> this case.  
> >
> >Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part of
> >the southbrigde will be emulated by a device model (ie: QEMU).
> >
> >As you mention above, I also took a look and it seems like the amount
> >of registers that we should emulate for Q35 DRAM controller (D0:F0) is
> >fairly minimal based on current QEMU implementation. We could even
> >possibly get away by just emulating PCIEXBAR.
> 
> MCH emulation alone might be not an option. Besides, some
> southbridge-specific features like emulating ACPI PM facilities for
> domain power management (basically, anything at PMBASE) will be
> preferable to implement on Xen side, especially considering the fact
> that ACPI tables are already provided by Xen's libacpi/hvmloader, not
> the device model.

Likely, but AFAICT this is kind of already broken, because PM1a and
TMR are already emulated by Xen at hardcoded values. See
xen/arch/x86/hvm/pmtimer.c.

> I think the feature may require to cover at least the NB+SB
> combination, at least Q35 MCH + ICH9 for start, ideally 82441FX+PIIX4
> as well. Also, Xen should control emulated/PT PCI device placement.

Q35 MCH (D0:F0) is required in order to trap accesses to PCIEXBAR.

Could you be more specific about ICH9?

The ICH9 spec contains multiple devices, for example it includes an
ethernet controller and a SATA controller, which we should not emulate
inside of Xen.

> II. (a new feature) Move chipset emulation to Xen directly.
> 
> In this case no separate notification necessary as Xen will be
> emulating the chosen chipset itself. MMCONFIG location will be known
> from own PCIEXBAR emulation.
> 
> QEMU will be used only to emulate a minimal set of unrelated devices
> (eg. storage/network/vga). Less dependency on QEMU overall.
> 
> More freedom to implement some specific features in the future like
> smram support for EFI firmware needs. Chipset remapping (aka reclaim)
> functionality for memory relocation may be implemented under complete
> Xen control, avoiding usage of unsafe add_to_physmap hypercalls.
> 
> In future this will allow to move passthrough-supporting code from QEMU
> (hw/xen/xen-pt*.c) to Xen, merging it with Roger's vpci series.
> This will improve eg. the PT + stubdomain situation a lot -- PCI config
> space accesses for PT devices will be handled in a uniform way without
> Dom0 interaction.
> This particular feature can be implemented for the previous approach as
> well, still it is easier to do when Xen controls the emulated machine
> 
> In general, this is a good long-term direction.
> 
> What this approach will require:
> 
> 
> - Changes in QEMU code to support a new chipset-less machine(s). In
>   theory might be possible to implement on top of the "null" machine
>   concept

Not all parts of the chipset should go inside of Xen, ATM I only
foresee Q35 MCH being implemented inside of Xen. So I'm not sure
calling this a chipset-less machine is correct from QEMU PoV.

> - Major changes in Xen code to implement the actual chipset emulation
>   there
> 
> - Changes on the toolstack side as the emulated machine will be
>   selected and used differently
> 
> - Moving passthrough support from QEMU to Xen will likely require to
>   re-divide areas of responsibility for PCI device passthrough between
>   xen-pciback and the hypervisor. It might be more convenient to
>   perform some tasks of xen-pciback in Xen directly

Moving pci-passthrough from QEMU to Xen is IMO a separate project, and
from the text you provide I'm not sure how that is related to the Q35
chipset implementation.

> - strong dependency between Xen/libxl/QEMU/etc versions -- any outdated
>   component will be a major problem. Can be resolved by providing some
>   compatibility code

Well, you would only be able to use the Q35 feature with the right
version of the components.

> - longer implementation time
> 
> Risks:
> --
> 
> - A major architecture change with possible issues encountered during
>   the implementation
> 
> - Moving the e

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-27 Thread Alexey G
On Tue, 27 Mar 2018 09:45:30 +0100
Roger Pau Monné  wrote:

>On Tue, Mar 27, 2018 at 05:42:11AM +1000, Alexey G wrote:
>> On Mon, 26 Mar 2018 10:24:38 +0100
>> Roger Pau Monné  wrote:
>>   
>> >On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:  
>> [...]  
>> >> In fact, the emulated chipset (NB+SB combo without supplemental
>> >> devices) itself is a small part of required emulation. It's
>> >> relatively easy to provide own analogs of for eg. 'mch' and
>> >> 'ICH9-LPC' QEMU PCIDevice's, the problem is to glue all remaining
>> >> parts together.
>> >> 
>> >> I assume the final goal in this case is to have only a set of
>> >> necessary QEMU PCIDevice's for which we will be providing I/O,
>> >> MMIO and PCI conf trapping facilities. Only devices such as
>> >> rtl8139, ich9-ahci and few others.
>> >> 
>> >> Basically, this means a new, chipset-less QEMU machine type.
>> >> Well, in theory it is possible with a bit of effort I think. The
>> >> main question is where will be the NB/SB/PCIbus emulating part
>> >> reside in this case.
>> >
>> >Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part of
>> >the southbrigde will be emulated by a device model (ie: QEMU).
>> >
>> >As you mention above, I also took a look and it seems like the
>> >amount of registers that we should emulate for Q35 DRAM controller
>> >(D0:F0) is fairly minimal based on current QEMU implementation. We
>> >could even possibly get away by just emulating PCIEXBAR.  
>> 
>> MCH emulation alone might be not an option. Besides, some
>> southbridge-specific features like emulating ACPI PM facilities for
>> domain power management (basically, anything at PMBASE) will be
>> preferable to implement on Xen side, especially considering the fact
>> that ACPI tables are already provided by Xen's libacpi/hvmloader, not
>> the device model.  
>
>Likely, but AFAICT this is kind of already broken, because PM1a and
>TMR is already emulated by Xen at hardcoded values. See
>xen/arch/x86/hvm/pmtimer.c.

Yes, that should be an argument for trying to implement PMBASE emulation in
Xen too, although this needs to be checked against dependencies in
QEMU first, especially the ACPI-related code.

This way we can have better flexibility to use an arbitrary PMBASE
value, not just having to hardcode it to ACPI_PM1A_EVT_BLK_ADDRESS_V1
in all related components.

>> I think the feature may require to cover at least the NB+SB
>> combination, at least Q35 MCH + ICH9 for start, ideally 82441FX+PIIX4
>> as well. Also, Xen should control emulated/PT PCI device placement.  
>
>Q35 MCH (D0:F0) it's required in order to trap access to PCIEXBAR.

Absolutely.


BTW, another somewhat related problem at the moment is that Xen knows
nothing about the chipset-specific MMIO hole(s). Due to this, it is
possible for a guest to map PT BARs outside the MMIO hole, leading to
errors like this:

(XEN) memory_map:remove: dom4 gfn=c8000 mfn=c8000 nr=2000
(XEN) memory_map:add: dom4 gfn=c8000 mfn=c8000 nr=2000
(XEN) p2m.c:1121:d0v5 p2m_set_entry: 0xc8000:9 -> -22 (0xc8000)
(XEN) memory_map:fail: dom4 gfn=c8000 mfn=c8000 nr=2000 ret:-22
(XEN) memory_map:remove: dom4 gfn=c8000 mfn=c8000 nr=2000
(XEN) p2m.c:1228:d0v5 gfn_to_mfn failed! gfn=c8000 type:4
(XEN) memory_map: error -22 removing dom4 access to [c8000,c9fff]
(XEN) memory_map:remove: dom4 gfn=c8000 mfn=c8000 nr=2000
(XEN) p2m.c:1228:d0v5 gfn_to_mfn failed! gfn=c8000 type:4
(XEN) memory_map: error -22 removing dom4 access to [c8000,c9fff]
(XEN) memory_map:add: dom4 gfn=c8000 mfn=c8000 nr=2000

Note that it was merely a lame BAR sizing attempt from the guest-side SW
(a PCI config space viewing tool) -- writing F's to the high part of the
MMIO BAR first.

If we know the guest's MMIO hole bounds, we can adapt to this
behavior, avoiding erroneous mapping attempts to a wrong address
outside the MMIO hole. Only the designated MMIO hole range can be used
to map PT device BARs.

So, if we are actually going to emulate the MCH's MMIO hole related
registers in Xen as well -- we can use them as scratchpad registers
(write-once of course) to pass this kind of information between Xen and
other involved parties as an alternative to eg. a dedicated hypercall.
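
A minimal sketch of the write-once idea (illustrative only, nothing like
this exists in Xen today): the first write from guest firmware latches the
value, later writes are silently dropped so the published layout stays
immutable.

struct scratchpad_reg {
    uint32_t value;
    bool     locked;
};

static void scratchpad_write(struct scratchpad_reg *r, uint32_t val)
{
    if ( r->locked )
        return;             /* write-once: ignore any further writes */
    r->value = val;
    r->locked = true;
}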

>Could you be more concise about ICH9?
>
>The ICH9 spec contains multiple devices, for example it includes an
>ethernet controller and a SATA controller, which we should not emulate
>inside of Xen.

ICH built-in devices from our PoV can be considered as distinct PCI
devices (as long as they're actually distinct devices in PCI config
space).
That's QEMU's approach to them -- these devices can be added to a q35
machine optionally. Only a minimal set of devices is provided initially,
like MCH/LPC/AHCI. The SMBus controller (0:1F.3) is added by default too,
but it's not of much use at the moment.

So mostly we can consider the LPC bridge (0:1F.0) for emulation of
all devices provided by a real ICH SB.

>> II. (a new feature) Move chipset emu

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-28 Thread Roger Pau Monné
On Wed, Mar 28, 2018 at 01:37:29AM +1000, Alexey G wrote:
> On Tue, 27 Mar 2018 09:45:30 +0100
> Roger Pau Monné  wrote:
> 
> >On Tue, Mar 27, 2018 at 05:42:11AM +1000, Alexey G wrote:
> >> On Mon, 26 Mar 2018 10:24:38 +0100
> >> Roger Pau Monné  wrote:
> >>   
> >> >On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:  

> BTW, another somewhat related problem at the moment is that Xen knows
> nothing about a chipset-specific MMIO hole(s). Due to this, it is
> possible for a guest to map PT BARs outside the MMIO hole, leading to
> errors like this:
> 
> (XEN) memory_map:remove: dom4 gfn=c8000 mfn=c8000 nr=2000
> (XEN) memory_map:add: dom4 gfn=c8000 mfn=c8000 nr=2000
> (XEN) p2m.c:1121:d0v5 p2m_set_entry: 0xc8000:9 -> -22 (0xc8000)
> (XEN) memory_map:fail: dom4 gfn=c8000 mfn=c8000 nr=2000 ret:-22
> (XEN) memory_map:remove: dom4 gfn=c8000 mfn=c8000 nr=2000
> (XEN) p2m.c:1228:d0v5 gfn_to_mfn failed! gfn=c8000 type:4
> (XEN) memory_map: error -22 removing dom4 access to [c8000,c9fff]
> (XEN) memory_map:remove: dom4 gfn=c8000 mfn=c8000 nr=2000
> (XEN) p2m.c:1228:d0v5 gfn_to_mfn failed! gfn=c8000 type:4
> (XEN) memory_map: error -22 removing dom4 access to [c8000,c9fff]
> (XEN) memory_map:add: dom4 gfn=c8000 mfn=c8000 nr=2000
> 
> Note that it was merely a lame BAR sizing attempt from the guest-side SW
> (a PCI config space viewing tool) -- writing F's to the high part of the
> MMIO BAR first.

You should disable memory decoding before attempting to size a BAR.
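
For reference, the usual sizing sequence looks roughly like this (a sketch
of what a well-behaved guest is expected to do, written with
hvmloader-style config space accessors; not the code of the tool in
question, and only the plain 32-bit memory BAR case is shown):

static uint64_t size_mem_bar32(unsigned int devfn, unsigned int bar_reg)
{
    uint16_t cmd = pci_readw(devfn, PCI_COMMAND);
    uint32_t orig, sz;

    /* Disable memory decoding so a half-written BAR is never decoded. */
    pci_writew(devfn, PCI_COMMAND, cmd & ~PCI_COMMAND_MEMORY);

    orig = pci_readl(devfn, bar_reg);
    pci_writel(devfn, bar_reg, 0xffffffff);
    sz = pci_readl(devfn, bar_reg);
    pci_writel(devfn, bar_reg, orig);        /* restore the original value */

    pci_writew(devfn, PCI_COMMAND, cmd);     /* re-enable decoding */

    return sz ? (uint64_t)(~(sz & ~0xfu)) + 1 : 0;
}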

This error has nothing to do with trying to move a BAR outside of the
MMIO hole, this error is caused by the gfn being bigger than the guest
physical address width AFAICT.

> If we will know the guest's MMIO hole bounds, we can adapt to this
> behavior, avoiding erroneous mapping attempts to a wrong address
> outside the MMIO hole. Only the MMIO hole designated range can be used
> to map PT device BARs.
> 
> So, if we will be actually emulating MCH's MMIO hole related registers
> in Xen as well -- we can use them as scratchpad registers (write-once
> of course) to pass this kind of information between Xen and other
> involved parties as an alternative to eg. a dedicated hypercall.

I'm not sure where this information is stored in MCH, guest OSes tend
to fetch this from the ACPI _CRS method of the host-pci bridge device.

I also don't see QEMU emulating such registers, but yes, I won't be
opposed to storing/reporting this in some registers if that's indeed
supported. Note that I don't think this should be mandatory for adding
Q35 support though.

> >> What this approach will require:
> >> 
> >> 
> >> - Changes in QEMU code to support a new chipset-less machine(s). In
> >>   theory might be possible to implement on top of the "null" machine
> >>   concept  
> >
> >Not all parts of the chipset should go inside of Xen, ATM I only
> >foresee Q35 MCH being implemented inside of Xen. So I'm not sure
> >calling this a chipset-less machine is correct from QEMU PoV.
> 
> Emulating only MCH in Xen will still require lot of changes but 
> overall benefit will become unclear -- basically, we just move
> PCIEXBAR emulation to Xen from QEMU.

At least it would make Xen the one controlling the MCFG area, which is
important. It would also be the first step into moving other chipset
functionality into Xen.

Not doing it just perpetuates the bad precedent that we already have
with the previous chipset.

> >What are specifically the registers that you mention?
> 
> Write-once emulation of TOLUD/TOUUD/REMAPBASE/REMAPLIMIT registers for
> hvmloader to use. That's the approach I'm actually using to make
> 'hvmloader/allow-memory-relocate=1' to work. Memory relocation without
> relying on add_to_physmap hypercall for hvmloader (which it does
> currently) while having MMIO/memory layout synchronized between all
> parties. There are multiple benefits (mostly for PT needs), including
> the MMIO hole auto-sizing support but this approach won't be accepted
> well with "toolstack should do everything" attitude I'm afraid.

You seem to be trying to fix several issues at the same time, which
just makes this much more complex than needed. The initial aim of this
series was to allow HVM guests to use the Q35 chipset. I think that's
what we should focus on.

As you have listed above (and in other emails) there are many
limitations with the current HVM approach, which I would be more than
happy for you to solve. But IMO not all of them must be solved in
order to add Q35 support.

Since this series and email thread has already gone quite far, would
you mind writing a design document with the approach that we
discussed?

Thanks, Roger.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-28 Thread Paul Durrant
> -Original Message-
> >IMO a single entity should be in control of the memory layout, and
> >that's the toolstack.
> >
> >Ideally we should not allow the firmware to change the layout at all.
> 
> This approach is terribly wrong, I don't know why opinions like this
> so common at Citrix. The toolstack is a least informed side. If
> MMIO/memory layout should be immutable, it must be calculated
> considering all factors, like chipset-specific MMIO ranges or ranges
> which cannot be used for the MMIO hole.
> 

Why is this approach wrong? Code running in the guest is non-privileged and we 
really don't want it messing around with memory layout. We really want to be in 
a position to e.g. build ACPI tables in the toolstack and we cannot do this 
until the layout becomes immutable.

> We need to know all resource requirements of device-model's and PT
> PCI devices, all chipset-specific MMIO ranges (which belong to a device
> model), all RMRRs (host's property) and all device-model invisible
> ranges like VRAM backing store (another device model's property).

Yes, indeed we do.

> And we need to know in which manner hvmloader will be allocating BARs
> to the MMIO hole -- eg. either in a forward direction starting from some
> base or moving backwards from the end of 4Gb (minus hardcoded ranges).

Eventually we want to get rid of hvmloader. Why do we need to know anything 
about its enumeration of BARs? After all they could be completely re-enumerated 
by the guest OS during or after boot (and indeed Windows does precisely that).

> Basically this means that we have to depend on hvmloader code/version
> too in the toolstack, which is wrong on its own -- we should have a
> freedom to modify the BAR allocation algo in hvmloader at any time.
> 

It should be irrelevant. The toolstack should decide on the sizes and locations 
of the MMIO holes and they should remain fixed, and be enforced by Xen. This 
avoids issues that we currently have such as guests populating RAM inside MMIO 
holes.

  Paul


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-28 Thread Alexey G
On Wed, 28 Mar 2018 10:30:32 +0100
Roger Pau Monné  wrote:

>On Wed, Mar 28, 2018 at 01:37:29AM +1000, Alexey G wrote:
>> On Tue, 27 Mar 2018 09:45:30 +0100
>> Roger Pau Monné  wrote:
>>   
>> >On Tue, Mar 27, 2018 at 05:42:11AM +1000, Alexey G wrote:  
>> >> On Mon, 26 Mar 2018 10:24:38 +0100
>> >> Roger Pau Monné  wrote:
>> >> 
>> >> >On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:
>
>> BTW, another somewhat related problem at the moment is that Xen knows
>> nothing about a chipset-specific MMIO hole(s). Due to this, it is
>> possible for a guest to map PT BARs outside the MMIO hole, leading to
>> errors like this:
>> 
>> (XEN) memory_map:remove: dom4 gfn=c8000 mfn=c8000 nr=2000
>> (XEN) memory_map:add: dom4 gfn=c8000 mfn=c8000 nr=2000
>> (XEN) p2m.c:1121:d0v5 p2m_set_entry: 0xc8000:9 -> -22
>> (0xc8000) (XEN) memory_map:fail: dom4 gfn=c8000 mfn=c8000
>> nr=2000 ret:-22 (XEN) memory_map:remove: dom4 gfn=c8000
>> mfn=c8000 nr=2000 (XEN) p2m.c:1228:d0v5 gfn_to_mfn failed!
>> gfn=c8000 type:4 (XEN) memory_map: error -22 removing dom4
>> access to [c8000,c9fff] (XEN) memory_map:remove: dom4
>> gfn=c8000 mfn=c8000 nr=2000 (XEN) p2m.c:1228:d0v5 gfn_to_mfn
>> failed! gfn=c8000 type:4 (XEN) memory_map: error -22
>> removing dom4 access to [c8000,c9fff] (XEN) memory_map:add: dom4
>> gfn=c8000 mfn=c8000 nr=2000
>> 
>> Note that it was merely a lame BAR sizing attempt from the
>> guest-side SW (a PCI config space viewing tool) -- writing F's to
>> the high part of the MMIO BAR first.  
>
>You should disable memory decoding before attempting to size a BAR.

The problem is, that PCI config space viewer is not mine. :)
It should normally disable the decoding first, yes, but it doesn't. Yet
there are no problems on a real system, while these errors appear when
it is run in a VM. IIRC power-cycling the guest and triggering these errors
multiple times even had a negative impact on the host's stability, so it's
a good test case.

>This error has nothing to do with trying to move a BAR outside of the
>MMIO hole, this error is caused by the gfn being bigger than the guest
>physical address width AFAICT.

In fact, it's the essence of the error -- an attempt to map a range
where mapping shouldn't be attempted at all.
p2m_set_entry is too deep to encounter this error; it should be avoided
much earlier. If we knew the limits where we can (and cannot) map the
PT device BARs, we could check whether we really need to proceed with the
mapping. This way we can handle that "mid-sizing/mid-change" condition
when only half of the 64-bit mem BAR has been written.
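
Something as simple as the following check before attempting the p2m
update would be enough to reject such transient values (sketch only;
hole_start/hole_end stand for whatever bounds Xen has been told about the
guest's MMIO hole):

/* Return true if [start, start + len) lies entirely inside the MMIO hole. */
static bool range_in_mmio_hole(uint64_t start, uint64_t len,
                               uint64_t hole_start, uint64_t hole_end)
{
    return len != 0 &&
           start >= hole_start && start < hole_end &&
           len <= hole_end - start;
}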

>> If we will know the guest's MMIO hole bounds, we can adapt to this
>> behavior, avoiding erroneous mapping attempts to a wrong address
>> outside the MMIO hole. Only the MMIO hole designated range can be
>> used to map PT device BARs.
>> 
>> So, if we will be actually emulating MCH's MMIO hole related
>> registers in Xen as well -- we can use them as scratchpad registers
>> (write-once of course) to pass this kind of information between Xen
>> and other involved parties as an alternative to eg. a dedicated
>> hypercall.  
>
>I'm not sure where this information is stored in MCH, guest OSes tend
>to fetch this from the ACPI _CRS method of the host-pci bridge device.
>
>I also don't see QEMU emulating such registers, but yes, I won't be
>opposed to storing/reporting this in some registers if that's indeed
>supported. Note that I don't think this should be mandatory for adding
>Q35 support though.

This info is needed by Xen, not guest OSes -- in order to avoid errors
like the ones described above. If we emulate MCH in Xen internally, we
can emulate these registers as well. It will be simpler than introducing
a new hypercall to inform Xen about the established MMIO hole range.

Anyway, you're right, it's a side issue. Just an example of what else
the built-in MCH emulation may be useful for.

>> >> What this approach will require:
>> >> 
>> >> 
>> >> - Changes in QEMU code to support a new chipset-less machine(s).
>> >> In theory might be possible to implement on top of the "null"
>> >> machine concept
>> >
>> >Not all parts of the chipset should go inside of Xen, ATM I only
>> >foresee Q35 MCH being implemented inside of Xen. So I'm not sure
>> >calling this a chipset-less machine is correct from QEMU PoV.  
>> 
>> Emulating only MCH in Xen will still require lot of changes but 
>> overall benefit will become unclear -- basically, we just move
>> PCIEXBAR emulation to Xen from QEMU.  
>
>At least it would make Xen the one controlling the MCFG area, which is
>important. It would also be the first step into moving other chipset
>functionality into Xen.
>
>Not doing it just perpetuates the bad precedent that we already have
>with the previous chipset.

I think it will be kinda ugly if we emulate just MCH in Xen and
ICH9 (+ all the rest) in QEMU at the same time. It looks more like
so

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-28 Thread Paul Durrant
> -Original Message-
> 
> I think we must all agree which approach to implement next. Basically,
> whether we need to completely discard the option #1 for this series and
> move on with #2. That lengthy requirements/risks email was an attempt to
> provide some ground for comparison.
> 
> Leaving only required devices like vga/usb/network/storage to QEMU while
> emulating everything else in Xen is a good milestone, but, as I
> understood we currently targeting less ambitious goals for option #2 --
> emulating only MCH in Xen while emulating ICH9 etc in QEMU.

Option #2 is the right direction architecturally; the trick is figuring out how to 
get there in stages.

I think the fact that Xen emulation obscures QEMU emulation means that we can 
start to do this without needing too much, if any, change in QEMU. It looks 
like handling MMCONFIG inside Xen would be a reasonable first stage. Stage two 
could be expanding Roger's vpci work to handle PCI pass-through to guests. Not 
sure what would come next.

  Paul



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-28 Thread Alexey G
On Wed, 28 Mar 2018 10:03:29 +
Paul Durrant  wrote:
>> >IMO a single entity should be in control of the memory layout, and
>> >that's the toolstack.
>> >
>> >Ideally we should not allow the firmware to change the layout at
>> >all.  
>> 
>> This approach is terribly wrong, I don't know why opinions like this
>> so common at Citrix. The toolstack is a least informed side. If
>> MMIO/memory layout should be immutable, it must be calculated
>> considering all factors, like chipset-specific MMIO ranges or ranges
>> which cannot be used for the MMIO hole.
>>   
>
>Why is this approach wrong? Code running in the guest is
>non-privileged and we really don't want it messing around with memory
>layout. We really want to be in a position to e.g. build ACPI tables
>in the toolstack and we cannot do this until the layout becomes
>immutable.

Only firmware code in the guest can correctly determine the guest's MMIO
hole requirements (the BIOS typically, but hvmloader in our case).

It is impossible to do in the toolstack at the moment, because it
doesn't know at least
- MMIO BAR sizes of the device-model's PCI devices
- chipset-specific MMIO ranges the DM emulates for a chosen machine
- the way these ranges are allocated to the MMIO hole by guest firmware

Even providing some interface to query all related information from a
device model won't cover the problem of how firmware will allocate
these ranges to the MMIO hole. Any code (or version) change can make
the toolstack's expectations wrong ->

>> We need to know all resource requirements of device-model's and PT
>> PCI devices, all chipset-specific MMIO ranges (which belong to a
>> device model), all RMRRs (host's property) and all device-model
>> invisible ranges like VRAM backing store (another device model's
>> property).  
>
>Yes, indeed we do.
>
>> And we need to know in which manner hvmloader will be allocating BARs
>> to the MMIO hole -- eg. either in a forward direction starting from
>> some base or moving backwards from the end of 4Gb (minus hardcoded
>> ranges).  
>
>Eventually we want to get rid of hvmloader.

...especially if BAR allocation is eventually delegated from hvmloader
to other firmware such as SeaBIOS/OVMF.

> Why do we need to know
>anything about its enumeration of BARs? After all they could be
>completely re-enumerated by the guest OS during or after boot (and
>indeed Windows does precisely that).

You are probably confusing BAR assignment with the enumeration of BAR values.

Windows reallocates PCI BARs only under specific conditions; they call
this feature 'PCI resource rebalancing'. Normally it sticks to the PCI
BAR allocation set up by the firmware (hvmloader in our case). BAR
reallocation doesn't really matter as long as we have a correct MMIO
hole size.

The very last thing a user should have to do is guess the correct value
of the mmio_hole_size parameter -- one which is OK for all his PT
devices while at the same time not so large that it eats into the RAM
available to 32-bit guests.

Those 32-bit guests are the most problematic case for MMIO hole sizing.
We should try to keep the MMIO hole as small as possible to reduce RAM
loss, while at the same time we are not permitted to allocate any BARs
in the high MMIO hole -- moving 64-bit BARs above 4GB will automatically
make such devices non-functional for 32-bit guests.

This means we need to calculate the precise MMIO hole size, according
to all factors. And only firmware code in the guest can do it right.

>> Basically this means that we have to depend on hvmloader code/version
>> too in the toolstack, which is wrong on its own -- we should have a
>> freedom to modify the BAR allocation algo in hvmloader at any time.
>>   
>
>It should be irrelevant. The toolstack should decide on the sizes and
>locations of the MMIO holes and they should remain fixed, and be
>enforced by Xen. This avoids issues that we currently have such as
>guests populating RAM inside MMIO holes.

The toolstack can't do it. There should be some one-time way to
communicate the MMIO hole setup between Xen and hvmloader. And after
that we can make it immutable.

A write-once interface via emulated platform registers (either ones
designated for this purpose or arbitrarily chosen ones) is a safe
approach. We have full control over what can be provided or allowed to
the guest firmware via this interface. A dedicated hypercall would be OK
too, but it's a bit of an overkill.
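
For illustration only, such a write-once latch could be as simple as the
sketch below -- the port number, names and exact semantics here are all
hypothetical, purely to show the idea:

    /* hypothetical write-once platform register backing the MMIO hole base */
    #define PLAT_MMIO_HOLE_BASE_REG  0x0510   /* invented I/O port number */

    static uint32_t mmio_hole_base;
    static bool mmio_hole_base_locked;

    static void plat_reg_write(uint32_t val)
    {
        if ( mmio_hole_base_locked )
            return;                 /* write-once: later writes are ignored */
        mmio_hole_base = val;
        mmio_hole_base_locked = true;
    }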

What is definitely not safe is to allow hvmloader to use the
add_to_physmap hypercall.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-05-29 Thread Jan Beulich
>>> On 12.03.18 at 19:33,  wrote:
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>  #define PCI_ISA_DEVFN   0x08/* dev 1, fn 0 */
>  #define PCI_ISA_IRQ_MASK0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>  #define PCI_ICH9_LPC_DEVFN  0xf8/* dev 31, fn 0 */
> +#define PCI_MCH_DEVFN   0   /* bus 0, dev 0, func 0 */

Just MCH is liable to become ambiguous in the future. Perhaps PCI_Q35_MCH_DEVFN?

> @@ -172,10 +173,14 @@ void pci_setup(void)
>  
>  /* Create a list of device BARs in descending order of size. */
>  struct bars {
> -uint32_t is_64bar;
>  uint32_t devfn;
>  uint32_t bar_reg;
>  uint64_t bar_sz;
> +uint64_t addr_mask; /* which bits of the base address can be written 
> */
> +uint32_t bar_data;  /* initial value - BAR flags here */

Why 32 bits? You only use the low few ones afaics. Also please avoid fixed width
types unless you really need them.

> @@ -259,13 +264,21 @@ void pci_setup(void)
>  bar_reg = PCI_ROM_ADDRESS;
>  
>  bar_data = pci_readl(devfn, bar_reg);
> +
> +is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> +   PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> +   (bar_reg == PCI_ROM_ADDRESS));

Once you make is_mem properly bool, !! won't be needed anymore.

Jan




Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-05-29 Thread Alexey G
On Tue, 29 May 2018 08:23:51 -0600
"Jan Beulich"  wrote:

 On 12.03.18 at 19:33,  wrote:  
>> --- a/tools/firmware/hvmloader/config.h
>> +++ b/tools/firmware/hvmloader/config.h
>> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>>  #define PCI_ISA_DEVFN   0x08/* dev 1, fn 0 */
>>  #define PCI_ISA_IRQ_MASK0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>>  #define PCI_ICH9_LPC_DEVFN  0xf8/* dev 31, fn 0 */
>> +#define PCI_MCH_DEVFN   0   /* bus 0, dev 0, func 0 */  
>
>Just MCH is liable to become ambiguous in the future. Perhaps 
>PCI_Q35_MCH_DEVFN?

Agree, PCI_Q35_MCH_DEVFN is more explicit.

>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>  
>>  /* Create a list of device BARs in descending order of size. */
>>  struct bars {
>> -uint32_t is_64bar;
>>  uint32_t devfn;
>>  uint32_t bar_reg;
>>  uint64_t bar_sz;
>> +uint64_t addr_mask; /* which bits of the base address can be 
>> written */
>> +uint32_t bar_data;  /* initial value - BAR flags here */  
>
>Why 32 bits? You only use the low few ones afaics. Also please avoid fixed 
>width
>types unless you really need them.

bar_data is supposed to hold only the BAR's kludge bits, like 'enable'
bit values or the MMCONFIG width bits. All of them occupy the low dword
only, while the BAR's high dword is just a part of the address which
will be replaced by the allocated one (for mem64 BARs), thus there is no
need to keep the high half.

So this is a sort of minor optimization -- avoiding a 64-bit operand
size when 32 bits are enough.

>> @@ -259,13 +264,21 @@ void pci_setup(void)
>>  bar_reg = PCI_ROM_ADDRESS;
>>  
>>  bar_data = pci_readl(devfn, bar_reg);
>> +
>> +is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>> +   PCI_BASE_ADDRESS_SPACE_MEMORY) ||
>> +   (bar_reg == PCI_ROM_ADDRESS));  
>
>Once you make is_mem properly bool, !! won't be needed anymore.

OK, will switch to bool.
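
For reference, the bool variant of that hunk would look roughly like
this (a sketch only, not the final patch):

    bool is_mem;
    ...
    bar_data = pci_readl(devfn, bar_reg);

    /* memory BAR or expansion ROM BAR */
    is_mem = (bar_reg == PCI_ROM_ADDRESS) ||
             ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
              PCI_BASE_ADDRESS_SPACE_MEMORY);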

>Jan
>
>



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-05-29 Thread Alexey G
On Wed, 30 May 2018 03:56:07 +1000
Alexey G  wrote:

>On Tue, 29 May 2018 08:23:51 -0600
>"Jan Beulich"  wrote:
>
> On 12.03.18 at 19:33,  wrote:
>>> --- a/tools/firmware/hvmloader/config.h
>>> +++ b/tools/firmware/hvmloader/config.h
>>> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>>>  #define PCI_ISA_DEVFN   0x08/* dev 1, fn 0 */
>>>  #define PCI_ISA_IRQ_MASK0x0c20U /* ISA IRQs 5,10,11 are PCI connected 
>>> */
>>>  #define PCI_ICH9_LPC_DEVFN  0xf8/* dev 31, fn 0 */
>>> +#define PCI_MCH_DEVFN   0   /* bus 0, dev 0, func 0 */
>>
>>Just MCH is liable to become ambiguous in the future. Perhaps 
>>PCI_Q35_MCH_DEVFN?  
>
>Agree, PCI_Q35_MCH_DEVFN is more explicit.
>
>>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>>  
>>>  /* Create a list of device BARs in descending order of size. */
>>>  struct bars {
>>> -uint32_t is_64bar;
>>>  uint32_t devfn;
>>>  uint32_t bar_reg;
>>>  uint64_t bar_sz;
>>> +uint64_t addr_mask; /* which bits of the base address can be 
>>> written */
>>> +uint32_t bar_data;  /* initial value - BAR flags here */
>>
>>Why 32 bits? You only use the low few ones afaics. Also please avoid fixed 
>>width
>>types unless you really need them.  
>
>bar_data is supposed to hold only BAR's kludge bits like 'enabled' bit
>values or MMCONFIG width bits. All of them occupy the low dword only
>while BAR's high dword is just a part of the address which will be
>replaced by allocated one (for mem64 BARs), thus no need to keep the
>high half.
>
>So this is a sort of minor optimization -- avoiding using 64-bit operand
>size when 32 bit is enough.

Sorry, looks like I've misread the question. You were actually 
suggesting to make bar_data shorter. 8 bits is enough at the moment, so
bar_data can be changed to uint8_t, yes.

Regarding avoiding using bool here -- the only reason was adapting to
the existing code style. For some reason the existing hvmloader code
prefers to use uint-types for bool values.

>>> @@ -259,13 +264,21 @@ void pci_setup(void)
>>>  bar_reg = PCI_ROM_ADDRESS;
>>>  
>>>  bar_data = pci_readl(devfn, bar_reg);
>>> +
>>> +is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
>>> +   PCI_BASE_ADDRESS_SPACE_MEMORY) ||
>>> +   (bar_reg == PCI_ROM_ADDRESS));
>>
>>Once you make is_mem properly bool, !! won't be needed anymore.  
>
>OK, will switch to bool.
>
>>Jan
>>
>>  
>



Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-05-29 Thread Alexey G
>On Wed, 30 May 2018 03:56:07 +1000
>Alexey G  wrote:
>
>>On Tue, 29 May 2018 08:23:51 -0600
>>"Jan Beulich"  wrote:
>>  
>> On 12.03.18 at 19:33,  wrote:  
 --- a/tools/firmware/hvmloader/config.h
 +++ b/tools/firmware/hvmloader/config.h
 @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
  #define PCI_ISA_DEVFN   0x08/* dev 1, fn 0 */
  #define PCI_ISA_IRQ_MASK0x0c20U /* ISA IRQs 5,10,11 are PCI connected 
 */
  #define PCI_ICH9_LPC_DEVFN  0xf8/* dev 31, fn 0 */
 +#define PCI_MCH_DEVFN   0   /* bus 0, dev 0, func 0 */  
>>>
>>>Just MCH is liable to become ambiguous in the future. Perhaps 
>>>PCI_Q35_MCH_DEVFN?
>>
>>Agree, PCI_Q35_MCH_DEVFN is more explicit.

On second thought, we can reuse one MCH BDF #define for multiple
emulated chipsets -- not just for something completely distinct from
Q35, but even for those which mostly require merely changing the PCI
DIDs (like P35 etc.). In that case producing multiple #defines like
PCI_{Q|P|G}35_MCH_DEVFN for the same BDF 0:0.0 might be excessive.

PCI_ICH9_LPC_DEVFN can actually be reused too; its BDF location has
survived many chipset generations, so its #define can be shared as well
(though renamed to something like PCI_LPC_BRIDGE_DEVFN).
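
Something along these lines, for illustration (names are not final):

    #define PCI_MCH_DEVFN         0x00 /* bus 0, dev 0, fn 0; shared by Q35-like chipsets */
    #define PCI_LPC_BRIDGE_DEVFN  0xf8 /* bus 0, dev 31, fn 0; was PCI_ICH9_LPC_DEVFN */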

 @@ -172,10 +173,14 @@ void pci_setup(void)
  
  /* Create a list of device BARs in descending order of size. */
  struct bars {
 -uint32_t is_64bar;
  uint32_t devfn;
  uint32_t bar_reg;
  uint64_t bar_sz;
 +uint64_t addr_mask; /* which bits of the base address can be 
 written */
 +uint32_t bar_data;  /* initial value - BAR flags here */  
>>>


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-05-30 Thread Jan Beulich
>>> On 30.05.18 at 06:32,  wrote:
>> On Wed, 30 May 2018 03:56:07 +1000
>>Alexey G  wrote:
>>
>>>On Tue, 29 May 2018 08:23:51 -0600
>>>"Jan Beulich"  wrote:
>>>  
>>> On 12.03.18 at 19:33,  wrote:  
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>  #define PCI_ISA_DEVFN   0x08/* dev 1, fn 0 */
>  #define PCI_ISA_IRQ_MASK0x0c20U /* ISA IRQs 5,10,11 are PCI 
> connected */
>  #define PCI_ICH9_LPC_DEVFN  0xf8/* dev 31, fn 0 */
> +#define PCI_MCH_DEVFN   0   /* bus 0, dev 0, func 0 */  

Just MCH is liable to become ambiguous in the future. Perhaps 
PCI_Q35_MCH_DEVFN?
>>>
>>>Agree, PCI_Q35_MCH_DEVFN is more explicit.
> 
> On the other thought, we can reuse one MCH BDF #define for multiple
> emulated chipsets, not just for something completely distinct to Q35
> but even for those which mostly require merely changing PCI DIDs (like
> P35 etc.) So in this case producing multiple #defines like
> PCI_{Q|P|G}35_MCH_DEVFN for the same BDF 0:0.0 might be excessive.
> 
> PCI_ICH9_LPC_DEVFN can be actually reused too, its BDF location
> survived many chipset generations so its #define can be shared as well
> (though renamed to something like PCI_LPC_BRIDGE_DEVFN).

PCI_x35_MCH_DEVFN then, with a brief comment explaining the x?

Jan




Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-05-30 Thread Jan Beulich
>>> On 29.05.18 at 20:47,  wrote:
> On Wed, 30 May 2018 03:56:07 +1000
> Alexey G  wrote:
>>On Tue, 29 May 2018 08:23:51 -0600
>>"Jan Beulich"  wrote:
>> On 12.03.18 at 19:33,  wrote:
 @@ -172,10 +173,14 @@ void pci_setup(void)
  
  /* Create a list of device BARs in descending order of size. */
  struct bars {
 -uint32_t is_64bar;
  uint32_t devfn;
  uint32_t bar_reg;
  uint64_t bar_sz;
 +uint64_t addr_mask; /* which bits of the base address can be 
 written */
 +uint32_t bar_data;  /* initial value - BAR flags here */
>>>
>>>Why 32 bits? You only use the low few ones afaics. Also please avoid fixed 
>>>width
>>>types unless you really need them.  
>>
>>bar_data is supposed to hold only BAR's kludge bits like 'enabled' bit
>>values or MMCONFIG width bits. All of them occupy the low dword only
>>while BAR's high dword is just a part of the address which will be
>>replaced by allocated one (for mem64 BARs), thus no need to keep the
>>high half.
>>
>>So this is a sort of minor optimization -- avoiding using 64-bit operand
>>size when 32 bit is enough.
> 
> Sorry, looks like I've misread the question. You were actually 
> suggesting to make bar_data shorter. 8 bits is enough at the moment, so
> bar_data can be changed to uint8_t, yes.

Right.

> Regarding avoiding using bool here -- the only reason was adapting to
> the existing code style. For some reason the existing hvmloader code
> prefers to use uint-types for bool values.

And wrongly so. We're slowly moving over, and we'd prefer the issue to
not be widened by new code.

Jan




Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-05-30 Thread Alexey G
On Wed, 30 May 2018 02:13:30 -0600
"Jan Beulich"  wrote:

 On 30.05.18 at 06:32,  wrote:  
>>> On Wed, 30 May 2018 03:56:07 +1000
>>>Alexey G  wrote:
>>>  
On Tue, 29 May 2018 08:23:51 -0600
"Jan Beulich"  wrote:

 On 12.03.18 at 19:33,  wrote:
>> --- a/tools/firmware/hvmloader/config.h
>> +++ b/tools/firmware/hvmloader/config.h
>> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>>  #define PCI_ISA_DEVFN   0x08/* dev 1, fn 0 */
>>  #define PCI_ISA_IRQ_MASK0x0c20U /* ISA IRQs 5,10,11 are PCI 
>> connected */
>>  #define PCI_ICH9_LPC_DEVFN  0xf8/* dev 31, fn 0 */
>> +#define PCI_MCH_DEVFN   0   /* bus 0, dev 0, func 0 */
>
>Just MCH is liable to become ambiguous in the future. Perhaps 
>PCI_Q35_MCH_DEVFN?  

Agree, PCI_Q35_MCH_DEVFN is more explicit.  
>> 
>> On the other thought, we can reuse one MCH BDF #define for multiple
>> emulated chipsets, not just for something completely distinct to Q35
>> but even for those which mostly require merely changing PCI DIDs (like
>> P35 etc.) So in this case producing multiple #defines like
>> PCI_{Q|P|G}35_MCH_DEVFN for the same BDF 0:0.0 might be excessive.
>> 
>> PCI_ICH9_LPC_DEVFN can be actually reused too, its BDF location
>> survived many chipset generations so its #define can be shared as well
>> (though renamed to something like PCI_LPC_BRIDGE_DEVFN).  
>
>PCI_x35_MCH_DEVFN then, with a brief comment explaining the x?

Hmm, I'm afraid there are too many chipsets sharing similarities with
Q35, including the x31 and x33 series. Also, it might be confusing due
to the existence of X-series chipsets like the Intel X38.

I think it's better to rename this #define to PCI_Q35_MCH_DEVFN for now,
as you suggested, and leave the choice of unified names to whoever (if
anyone) actually adds P35/G35/etc. emulation on top of Q35's. So far
we're limited to Q35 after all.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-05-30 Thread Alexey G
On Wed, 30 May 2018 02:12:37 -0600
"Jan Beulich"  wrote:

 On 29.05.18 at 20:47,  wrote:  
>> On Wed, 30 May 2018 03:56:07 +1000
>> Alexey G  wrote:  
>>>On Tue, 29 May 2018 08:23:51 -0600
>>>"Jan Beulich"  wrote:  
>>> On 12.03.18 at 19:33,  wrote:  
> @@ -172,10 +173,14 @@ void pci_setup(void)
>  
>  /* Create a list of device BARs in descending order of size. */
>  struct bars {
> -uint32_t is_64bar;
>  uint32_t devfn;
>  uint32_t bar_reg;
>  uint64_t bar_sz;
> +uint64_t addr_mask; /* which bits of the base address can be 
> written */
> +uint32_t bar_data;  /* initial value - BAR flags here */  

Why 32 bits? You only use the low few ones afaics. Also please avoid fixed 
width
types unless you really need them.
>>>
>>>bar_data is supposed to hold only BAR's kludge bits like 'enabled' bit
>>>values or MMCONFIG width bits. All of them occupy the low dword only
>>>while BAR's high dword is just a part of the address which will be
>>>replaced by allocated one (for mem64 BARs), thus no need to keep the
>>>high half.
>>>
>>>So this is a sort of minor optimization -- avoiding using 64-bit operand
>>>size when 32 bit is enough.  
>> 
>> Sorry, looks like I've misread the question. You were actually 
>> suggesting to make bar_data shorter. 8 bits is enough at the moment, so
>> bar_data can be changed to uint8_t, yes.  
>
>Right.

OK, I'll switch to smaller types, though I'm afraid it won't make any
significant impact.

In particular, bar_data will typically be used in 32/64-bit arithmetic;
using a 32-bit data type means we avoid explicit zero extension for both
32- and 64-bit operations, while for a uint8_t field the compiler will
have to emit extra MOVZX instructions to embed an 8-bit operand into
32/64-bit expressions. The 32-bit bar_reg could be made 16-bit in the
same way, but any memory usage improvement would similarly be
counteracted by the need to use 66h-prefixed instructions for it.

Anyway, as the BAR allocation code is neither memory- nor time-critical,
I guess any option will be fine.

>> Regarding avoiding using bool here -- the only reason was adapting to
>> the existing code style. For some reason the existing hvmloader code
>> prefers to use uint-types for bool values.  
>
>And wrongly so. We're slowly moving over, and we'd prefer the issue to
>not be widened by new code.

BTW, there are other changes pending for hvmloader/pci.c which will
(hopefully :) ) replace its BAR allocation and RMRR handling code, so
this patch can be considered a sort of intermediate one -- I'm using a
heavily reworked version of hvmloader/pci.c which I'd like to upstream.


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-05-31 Thread Jan Beulich
>>> Alexey G  05/31/18 7:15 AM >>>
>On Wed, 30 May 2018 02:12:37 -0600 "Jan Beulich"  wrote:
> On 29.05.18 at 20:47,  wrote:  
>>> On Wed, 30 May 2018 03:56:07 +1000
>>> Alexey G  wrote:  
On Tue, 29 May 2018 08:23:51 -0600
"Jan Beulich"  wrote:  
 On 12.03.18 at 19:33,  wrote:  
>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>  
>>  /* Create a list of device BARs in descending order of size. */
>>  struct bars {
>> -uint32_t is_64bar;
>>  uint32_t devfn;
>>  uint32_t bar_reg;
>>  uint64_t bar_sz;
>> +uint64_t addr_mask; /* which bits of the base address can be 
>> written */
>> +uint32_t bar_data;  /* initial value - BAR flags here */  
>
>Why 32 bits? You only use the low few ones afaics. Also please avoid fixed 
>width
>types unless you really need them.

bar_data is supposed to hold only BAR's kludge bits like 'enabled' bit
values or MMCONFIG width bits. All of them occupy the low dword only
while BAR's high dword is just a part of the address which will be
replaced by allocated one (for mem64 BARs), thus no need to keep the
high half.

So this is a sort of minor optimization -- avoiding using 64-bit operand
size when 32 bit is enough.  
>>> 
>>> Sorry, looks like I've misread the question. You were actually 
>>> suggesting to make bar_data shorter. 8 bits is enough at the moment, so
>>> bar_data can be changed to uint8_t, yes.  
>>
>>Right.
>
>Ok, I'll switch to smaller types though not sure if it will make any
>significant impact I'm afraid. 
>
>In particular, bar_data will be typically used in 32/64-bit 
>arithmetics, using a 32-bit datatype means we avoiding explicit zero
>extension for both 32 and 64-bit operations while for an uint8_t field
>the compiler will have to provide extra MOVZX instructions to embed a
>8-bit operand into 32/64-bit expressions. 32-bit bar_reg can be made
>16-bit in the same way but any memory usage improvements will be
>similarly counteracted by a requirement to use 66h-prefixed
>instructions for it.

Hmm, yes, the space savings from using less wide types are probably
indeed not worth it. But then please switch to "unsigned int" instead of
uint_t whenever the exact size doesn't matter.

Jan




Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-06-01 Thread Alexey G
On Thu, 31 May 2018 23:30:35 -0600
"Jan Beulich"  wrote:

 Alexey G  05/31/18 7:15 AM >>>  
>>On Wed, 30 May 2018 02:12:37 -0600 "Jan Beulich"  wrote:  
>> On 29.05.18 at 20:47,  wrote:
 On Wed, 30 May 2018 03:56:07 +1000
 Alexey G  wrote:
>On Tue, 29 May 2018 08:23:51 -0600
>"Jan Beulich"  wrote:
> On 12.03.18 at 19:33,  wrote:
>>> @@ -172,10 +173,14 @@ void pci_setup(void)
>>>  
>>>  /* Create a list of device BARs in descending order of size. */
>>>  struct bars {
>>> -uint32_t is_64bar;
>>>  uint32_t devfn;
>>>  uint32_t bar_reg;
>>>  uint64_t bar_sz;
>>> +uint64_t addr_mask; /* which bits of the base address can be 
>>> written */
>>> +uint32_t bar_data;  /* initial value - BAR flags here */   
>>>  
>>
>>Why 32 bits? You only use the low few ones afaics. Also please avoid 
>>fixed width
>>types unless you really need them.  
>
>bar_data is supposed to hold only BAR's kludge bits like 'enabled' bit
>values or MMCONFIG width bits. All of them occupy the low dword only
>while BAR's high dword is just a part of the address which will be
>replaced by allocated one (for mem64 BARs), thus no need to keep the
>high half.
>
>So this is a sort of minor optimization -- avoiding using 64-bit operand
>size when 32 bit is enough.
 
 Sorry, looks like I've misread the question. You were actually 
 suggesting to make bar_data shorter. 8 bits is enough at the moment, so
 bar_data can be changed to uint8_t, yes.
>>>
>>>Right.  
>>
>>Ok, I'll switch to smaller types though not sure if it will make any
>>significant impact I'm afraid. 
>>
>>In particular, bar_data will be typically used in 32/64-bit 
>>arithmetics, using a 32-bit datatype means we avoiding explicit zero
>>extension for both 32 and 64-bit operations while for an uint8_t field
>>the compiler will have to provide extra MOVZX instructions to embed a
>>8-bit operand into 32/64-bit expressions. 32-bit bar_reg can be made
>>16-bit in the same way but any memory usage improvements will be
>>similarly counteracted by a requirement to use 66h-prefixed
>>instructions for it.  
>
>Hmm, yes, the space saving from using less wide types are probably indeed
>not worth it. But then please switch to "unsigned int" instead of uint_t
>whenever the exact size doesn't matter.

Ok, will do in v2.
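
For reference, the struct would then look roughly like this in v2 (a
sketch of the direction only, not the actual patch):

    /* Create a list of device BARs in descending order of size. */
    struct bars {
        unsigned int devfn;
        unsigned int bar_reg;
        uint64_t bar_sz;       /* exact width needed: sizes can exceed 4GB */
        uint64_t addr_mask;    /* which bits of the base address can be written */
        unsigned int bar_data; /* initial value - BAR flags only */
        bool is_64bar;
        bool is_mem;
    };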

>Jan
>
>


Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-19 Thread Roger Pau Monné
On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko wrote:
> Much like normal PCI BARs or other chipset-specific memory-mapped
> resources, MMCONFIG area needs space in MMIO hole, so we must allocate
> it manually.
> 
> The actual MMCONFIG size depends on a number of PCI buses available which
> should be covered by ECAM. Possible options are 64MB, 128MB and 256MB.
> As we are limited to the bus 0 currently, thus using lowest possible
> setting (64MB), #defined via PCI_MAX_MCFG_BUSES in hvmloader/config.h.
> When multiple PCI buses support for Xen will be implemented,
> PCI_MAX_MCFG_BUSES may be changed to calculation of the number of buses
> according to results of the PCI devices enumeration.
> 
> The way to allocate MMCONFIG range in MMIO hole is similar to how other
> PCI BARs are allocated. The patch extends 'bars' structure to make
> it universal for any arbitrary BAR type -- either IO, MMIO, ROM or
> a chipset-specific resource.

I'm not sure this is fully correct. The IOREQ interface can
differentiate PCI devices and forward config space accesses to
different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change you
will forward all MCFG accesses to QEMU, which will likely be wrong if
there are multiple PCI-device emulators for the same domain.

Ie: AFAICT Xen needs to know about the MCFG emulation and detect
accesses to it in order to forward them to the right emulators.

Adding Paul who knows more about all this.

> One important new field is addr_mask, which tells which bits of the base
> address can (should) be written. Different address types (ROM, MMIO BAR,
> PCIEXBAR) will have different addr_mask values.
> 
> For every assignable BAR range we store its size, PCI device BDF (devfn
> actually) to which it belongs, BAR type (mem/io/mem64) and corresponding
> register offset in device PCI conf space. This way we can insert MMCONFIG
> entry into bars array in the same manner like for any other BARs. In this
> case, the devfn field will point to MCH PCI device and bar_reg will
> contain PCIEXBAR register offset. It will be assigned a slot in MMIO hole
> later in a very same way like for plain PCI BARs, with respect to its size
> alignment.
> 
> Also, to reduce code complexity, all long mem/mem64 BAR flags checks are
> replaced by simple bars[i] field probing, eg.:
> -if ( (bar_reg == PCI_ROM_ADDRESS) ||
> - ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> -  PCI_BASE_ADDRESS_SPACE_MEMORY) )
> +if ( bars[i].is_mem )

This should be a separate change IMO.

> 
> Signed-off-by: Alexey Gerasimenko 
> ---
>  tools/firmware/hvmloader/config.h   |   4 ++
>  tools/firmware/hvmloader/pci.c  | 127 
> 
>  tools/firmware/hvmloader/pci_regs.h |   2 +
>  3 files changed, 106 insertions(+), 27 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/config.h 
> b/tools/firmware/hvmloader/config.h
> index 6fde6b7b60..5443ecd804 100644
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>  #define PCI_ISA_DEVFN   0x08/* dev 1, fn 0 */
>  #define PCI_ISA_IRQ_MASK0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>  #define PCI_ICH9_LPC_DEVFN  0xf8/* dev 31, fn 0 */
> +#define PCI_MCH_DEVFN   0   /* bus 0, dev 0, func 0 */
>  
>  /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
>  #define PCI_MEM_END 0xfc00
>  
> +/* possible values are: 64, 128, 256 */
> +#define PCI_MAX_MCFG_BUSES  64

What the reasoning for this value? Do we know which devices need ECAM
areas?

> +
>  #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
>  
>  extern unsigned long pci_mem_start, pci_mem_end;
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 033bd20992..6de124bbd5 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -158,9 +158,10 @@ static void class_specific_pci_device_setup(uint16_t 
> vendor_id,
>  
>  void pci_setup(void)
>  {
> -uint8_t is_64bar, using_64bar, bar64_relocate = 0;
> +uint8_t is_64bar, using_64bar, bar64_relocate = 0, is_mem;
>  uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
>  uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
> +uint64_t addr_mask;
>  uint16_t vendor_id, device_id;
>  unsigned int bar, pin, link, isa_irq;
>  int is_running_on_q35 = 0;
> @@ -172,10 +173,14 @@ void pci_setup(void)
>  
>  /* Create a list of device BARs in descending order of size. */
>  struct bars {
> -uint32_t is_64bar;
>  uint32_t devfn;
>  uint32_t bar_reg;
>  uint64_t bar_sz;
> +uint64_t addr_mask; /* which bits of the base address can be written 
> */
> +uint32_t bar_data;  /* initial value - BAR flags here */
> +uint8_t  is_64bar;
> +uint8_t  is_mem;
> +uint8_t  padding[2];

Why are you manually adding padding here? 

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-19 Thread Alexey G
On Mon, 19 Mar 2018 15:58:02 +
Roger Pau Monné  wrote:

>On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko wrote:
>> Much like normal PCI BARs or other chipset-specific memory-mapped
>> resources, MMCONFIG area needs space in MMIO hole, so we must
>> allocate it manually.
>> 
>> The actual MMCONFIG size depends on a number of PCI buses available
>> which should be covered by ECAM. Possible options are 64MB, 128MB
>> and 256MB. As we are limited to the bus 0 currently, thus using
>> lowest possible setting (64MB), #defined via PCI_MAX_MCFG_BUSES in
>> hvmloader/config.h. When multiple PCI buses support for Xen will be
>> implemented, PCI_MAX_MCFG_BUSES may be changed to calculation of the
>> number of buses according to results of the PCI devices enumeration.
>> 
>> The way to allocate MMCONFIG range in MMIO hole is similar to how
>> other PCI BARs are allocated. The patch extends 'bars' structure to
>> make it universal for any arbitrary BAR type -- either IO, MMIO, ROM
>> or a chipset-specific resource.  
>
>I'm not sure this is fully correct. The IOREQ interface can
>differentiate PCI devices and forward config space accesses to
>different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change you
>will forward all MCFG accesses to QEMU, which will likely be wrong if
>there are multiple PCI-device emulators for the same domain.
>
>Ie: AFAICT Xen needs to know about the MCFG emulation and detect
>accesses to it in order to forward them to the right emulators.
>
>Adding Paul who knows more about all this.

In which use cases are multiple PCI-device emulators used for a single
HVM domain? Is it a proprietary setup?

I assume it is somehow related to this code in xen-hvm.c:
/* Fake a write to port 0xCF8 so that
 * the config space access will target the
 * correct device model.
 */
val = (1u << 31) | ((req->addr & 0x0f00) <...>
do_outp(0xcf8, 4, val);
If yes, a similar thing can be done for IOREQ_TYPE_COPY accesses to
the emulated MMCONFIG if needed.

In the HVM+QEMU case we are not limited to merely passed-through
devices; most of the observable PCI config space devices belong to one
particular QEMU instance. This dictates the overall emulated MMCONFIG
layout for a domain, which should be in sync with what QEMU emulates via
CF8h/CFCh accesses... and between multiple device model instances (if
there are any; I'm still not sure what the multiple PCI-device emulators
you mentioned really are).

Basically, we have an emulated MMCONFIG area of 64/128/256MB size in
the MMIO hole of the guest HVM domain. (BTW, this area itself can be
considered a feature of the chipset the device model emulates.)
It can be relocated to some other place in the MMIO hole; this means
that QEMU will trap accesses to the chipset-specific PCIEXBAR register
and will issue the same MMIO unmap/map calls as for any normal emulated
MMIO range.
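
For reference, the window size is selected via the PCIEXBAR LENGTH field
(config offset 0x60 of the Q35 MCH); the bit layout below is written
from memory, so treat it as a sketch rather than a spec quote:

    /* PCIEXBAR, roughly:
     *   bit 0      - PCIEXBAREN (enable MMCONFIG decoding)
     *   bits 2:1   - LENGTH: 0 = 256MB, 1 = 128MB, 2 = 64MB, 3 = reserved
     *   upper bits - base address, aligned to the selected length
     */
    bool mmcfg_enabled = pciexbar & 1;
    unsigned int length = (pciexbar >> 1) & 3;
    uint64_t mmconfig_size = (256ULL << 20) >> length;
    /* control bits masked off; reserved high bits ignored in this sketch */
    uint64_t mmconfig_base = pciexbar & ~(mmconfig_size - 1);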

On the other hand, it won't be easy to provide emulated MMCONFIG
translation into IOREQ_TYPE_PCI_CONFIG from the Xen side. Xen needs to
know the current emulated MMCONFIG area position and size in order to
translate (or not) accesses to it into the corresponding BDF/reg pair
(plus whether that area is enabled for decoding or not). This will
likely require introducing new hypercall(s).
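
For reference, the translation itself is trivial once Xen knows the
area's base and whether it is enabled -- standard ECAM gives 4KB of
config space per function (a sketch):

    /* offset = guest physical address - mmconfig_base */
    unsigned int bus = (offset >> 20) & 0xff;
    unsigned int dev = (offset >> 15) & 0x1f;
    unsigned int fn  = (offset >> 12) & 0x7;
    unsigned int reg = offset & 0xfff;

The hard part is precisely the base/size/enable state, which is what the
new interface would have to convey.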

The question is whether there will be any difference or benefit at all.

It's basically the same emulated MMIO range after all, but in one case
we trap accesses to it in Xen and translate them into
IOREQ_TYPE_PCI_CONFIG requests.
We have to provide some infrastructure to let Xen know where the device
model/guest expects the MMCONFIG area to be (and its size). The device
model will need to use this infrastructure, informing Xen of any
changes. Also, due to the nature of MMCONFIG there may be pitfalls, such
as the need to send multiple IOREQ_TYPE_PCI_CONFIG ioreqs for a single
memory read/write operation.

In the other case, we still have an emulated MMIO range, but Xen sends
plain IOREQ_TYPE_COPY requests to QEMU, which handles them itself.
In that case, all the code to handle MMCONFIG accesses is available for
reuse right away (the mmcfg -> pci_* translation in QEMU); no new
functionality is required in either Xen or QEMU.

>> One important new field is addr_mask, which tells which bits of the
>> base address can (should) be written. Different address types (ROM,
>> MMIO BAR, PCIEXBAR) will have different addr_mask values.
>> 
>> For every assignable BAR range we store its size, PCI device BDF
>> (devfn actually) to which it belongs, BAR type (mem/io/mem64) and
>> corresponding register offset in device PCI conf space. This way we
>> can insert MMCONFIG entry into bars array in the same manner like
>> for any other BARs. In this case, the devfn field will point to MCH
>> PCI device and bar_reg will contain PCIEXBAR register offset. It
>> will be assigned a slot in MMIO hole later in a very same way like
>> for plain PCI BARs, with respect to

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-20 Thread Roger Pau Monné
On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
> On Mon, 19 Mar 2018 15:58:02 +
> Roger Pau Monné  wrote:
> 
> >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko wrote:
> >> Much like normal PCI BARs or other chipset-specific memory-mapped
> >> resources, MMCONFIG area needs space in MMIO hole, so we must
> >> allocate it manually.
> >> 
> >> The actual MMCONFIG size depends on a number of PCI buses available
> >> which should be covered by ECAM. Possible options are 64MB, 128MB
> >> and 256MB. As we are limited to the bus 0 currently, thus using
> >> lowest possible setting (64MB), #defined via PCI_MAX_MCFG_BUSES in
> >> hvmloader/config.h. When multiple PCI buses support for Xen will be
> >> implemented, PCI_MAX_MCFG_BUSES may be changed to calculation of the
> >> number of buses according to results of the PCI devices enumeration.
> >> 
> >> The way to allocate MMCONFIG range in MMIO hole is similar to how
> >> other PCI BARs are allocated. The patch extends 'bars' structure to
> >> make it universal for any arbitrary BAR type -- either IO, MMIO, ROM
> >> or a chipset-specific resource.  
> >
> >I'm not sure this is fully correct. The IOREQ interface can
> >differentiate PCI devices and forward config space accesses to
> >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change you
> >will forward all MCFG accesses to QEMU, which will likely be wrong if
> >there are multiple PCI-device emulators for the same domain.
> >
> >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
> >accesses to it in order to forward them to the right emulators.
> >
> >Adding Paul who knows more about all this.
> 
> In which use cases multiple PCI-device emulators are used for a single
> HVM domain? Is it a proprietary setup?

Likely. I think XenGT might be using it. It's a feature of the IOREQ
implementation in Xen.

Traditional PCI config space accesses are not IO port space accesses.
The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and IOREQ
servers can register devices they would like to receive configuration
space accesses for. QEMU is already making use of this, see for
example xen_map_pcidev in the QEMU code.
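
(For context, an IOREQ server claims a PCI device roughly as shown below
via libxendevicemodel; the exact signature is quoted from memory, so
double-check it against the headers:)

    /* sketch: route config space accesses for 0000:00:05.0 to this ioreq server */
    rc = xendevicemodel_map_pcidev_to_ioreq_server(dmod, domid, ioservid,
                                                   0 /* segment */,
                                                   0 /* bus */,
                                                   5 /* device */,
                                                   0 /* function */);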

By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI
layer, and thus a IOREQ server could register a PCI device and only
receive PCI configuration accesses from the IO port space, while MCFG
accesses would be forwarded somewhere else.

I think you need to make the IOREQ code aware of the MCFG area and
XEN_DMOP_IO_RANGE_PCI needs to forward both IO space and MCFG accesses
to the right IOREQ server.

> I assume it is somehow related to this code in xen-hvm.c:
> /* Fake a write to port 0xCF8 so that
>  * the config space access will target the
>  * correct device model.
>  */
> val = (1u << 31) | ((req->addr & 0x0f00) <...>
> do_outp(0xcf8, 4, val);
> if yes, similar thing can be made for IOREQ_TYPE_COPY accesses to
> the emulated MMCONFIG if needed.

I have to admit I don't know that much about QEMU, and I have no idea
what the chunk above is supposed to accomplish.

> 
> In HVM+QEMU case we are not limited to merely passed through devices,
> most of the observable PCI config space devices belong to one particular
> QEMU instance. This dictates the overall emulated MMCONFIG layout
> for a domain which should be in sync to what QEMU emulates via CF8h/CFCh
> accesses... and between multiple device model instances (if there are
> any, still not sure what multiple PCI-device emulators you mentioned
> really are).

In newer versions of Xen (>4.5 IIRC, Paul knows more), QEMU doesn't
directly trap accesses to the 0xcf8/0xcfc IO ports; it is Xen instead
that detects and decodes such accesses, and then forwards them to the
IOREQ server that has been registered to handle them.

You cannot simply forward all MCFG accesses to QEMU as MMIO accesses,
Xen needs to decode them and they need to be handled as
IOREQ_TYPE_PCI_CONFIG requests, not IOREQ_TYPE_COPY IMO.

> 
> Basically, we have an emulated MMCONFIG area of 64/128/256MB size in
> the MMIO hole of the guest HVM domain. (BTW, this area itself can be
> considered a feature of the chipset the device model emulates.)
> It can be relocated to some other place in MMIO hole, this means that
> QEMU will trap accesses to the specific to the emulated chipset
> PCIEXBAR register and will issue same MMIO unmap/map calls as for
> any normal emulated MMIO range.
> 
> On the other hand, it won't be easy to provide emulated MMCONFIG
> translation into IOREQ_TYPE_PCI_CONFIG from Xen side. Xen should know
> current emulated MMCONFIG area position and size in order to translate
> (or not) accesses to it into corresponding BDF/reg pair (+whether that
> area is enabled for decoding or not). This will likely require to
> introduce new hypercall(s).

Yes, you will have to introduce new hypercalls to tell Xen the
position/si

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring

2018-03-20 Thread Paul Durrant
> -Original Message-
> From: Roger Pau Monne
> Sent: 20 March 2018 08:51
> To: Alexey G 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Ian Jackson ; Jan
> Beulich ; Wei Liu ; Paul Durrant
> 
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
> > On Mon, 19 Mar 2018 15:58:02 +
> > Roger Pau Monné  wrote:
> >
> > >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko wrote:
> > >> Much like normal PCI BARs or other chipset-specific memory-mapped
> > >> resources, MMCONFIG area needs space in MMIO hole, so we must
> > >> allocate it manually.
> > >>
> > >> The actual MMCONFIG size depends on a number of PCI buses available
> > >> which should be covered by ECAM. Possible options are 64MB, 128MB
> > >> and 256MB. As we are limited to the bus 0 currently, thus using
> > >> lowest possible setting (64MB), #defined via PCI_MAX_MCFG_BUSES in
> > >> hvmloader/config.h. When multiple PCI buses support for Xen will be
> > >> implemented, PCI_MAX_MCFG_BUSES may be changed to calculation
> of the
> > >> number of buses according to results of the PCI devices enumeration.
> > >>
> > >> The way to allocate MMCONFIG range in MMIO hole is similar to how
> > >> other PCI BARs are allocated. The patch extends 'bars' structure to
> > >> make it universal for any arbitrary BAR type -- either IO, MMIO, ROM
> > >> or a chipset-specific resource.
> > >
> > >I'm not sure this is fully correct. The IOREQ interface can
> > >differentiate PCI devices and forward config space accesses to
> > >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change
> you
> > >will forward all MCFG accesses to QEMU, which will likely be wrong if
> > >there are multiple PCI-device emulators for the same domain.
> > >
> > >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
> > >accesses to it in order to forward them to the right emulators.
> > >
> > >Adding Paul who knows more about all this.
> >
> > In which use cases multiple PCI-device emulators are used for a single
> > HVM domain? Is it a proprietary setup?
> 
> Likely. I think XenGT might be using it. It's a feature of the IOREQ
> implementation in Xen.
> 

Multiple ioreq servers are a supported use-case for Xen, if only experimental 
at this point. And indeed xengt is one such use-case.

> Traditional PCI config space accesses are not IO port space accesses.
> The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and IOREQ
> servers can register devices they would like to receive configuration
> space accesses for. QEMU is already making use of this, see for
> example xen_map_pcidev in the QEMU code.
> 
> By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI
> layer, and thus a IOREQ server could register a PCI device and only
> receive PCI configuration accesses from the IO port space, while MCFG
> accesses would be forwarded somewhere else.
> 
> I think you need to make the IOREQ code aware of the MCFG area and
> XEN_DMOP_IO_RANGE_PCI needs to forward both IO space and MCFG
> accesses
> to the right IOREQ server.

Yes, Xen must intercept all accesses to PCI config space and route them 
accordingly.

> 
> > I assume it is somehow related to this code in xen-hvm.c:
> > /* Fake a write to port 0xCF8 so that
> >  * the config space access will target the
> >  * correct device model.
> >  */
> > val = (1u << 31) | ((req->addr & 0x0f00) <...>
> > do_outp(0xcf8, 4, val);
> > if yes, similar thing can be made for IOREQ_TYPE_COPY accesses to
> > the emulated MMCONFIG if needed.
> 
> I have to admit I don't know that much about QEMU, and I have no idea
> what the chunk above is supposed to accomplish.
> 

The easiest way to make QEMU behave appropriately when dealing with a config 
space ioreq was indeed to make it appear as a write to cf8 followed by a read 
or write to cfc.
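
(For reference, the type-1 config address written to cf8 has the
standard layout; the line below is a sketch of that encoding, not the
exact QEMU code:)

    /* bit 31: enable; bits 23:16: bus; 15:11: device; 10:8: function;
     * bits 7:2: dword-aligned register offset */
    uint32_t cf8 = (1u << 31) | (bus << 16) | (dev << 11) | (fn << 8) |
                   (reg & 0xfc);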

> >
> > In HVM+QEMU case we are not limited to merely passed through devices,
> > most of the observable PCI config space devices belong to one particular
> > QEMU instance. This dictates the overall emulated MMCONFIG layout
> > for a domain which should be in sync to what QEMU emulates via
> CF8h/CFCh
> > accesses... and between multiple device model instance