[Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
Hi all, I would like to discuss with the ARM64 and ACPI Linux maintainers the best way to complete ACPI support in Linux for Dom0 on ARM64.

As a reminder, Xen can only parse static ACPI tables; it doesn't have a bytecode interpreter. Xen maps all ACPI tables to Dom0, which parses them as it does on native. Device memory is mapped in stage-2 by Xen upon Dom0 request: a small driver under drivers/xen registers for BUS_NOTIFY_ADD_DEVICE events, then calls xen_map_device_mmio, which issues XENMEM_add_to_physmap_range hypercalls; Xen then creates the appropriate stage-2 mappings.

This approach works well, but it breaks in a few interesting cases. Specifically, anything that requires a device memory mapping but is not a device doesn't generate a BUS_NOTIFY_ADD_DEVICE event, thus no hypercalls to Xen are made. Examples are: ACPI OperationRegion (1), ECAM (2), and other memory regions described in static tables such as BERT (3).

What is the best way to map these regions in Dom0? I am going to detail a few options that have been proposed and evaluated so far.

(2) and (3), being described by static tables, could be parsed by Xen and mapped beforehand. However, this approach wouldn't work for (1). Additionally, Xen and Linux versions can mix and match, so it is possible, even likely, to run an old Xen and a new Dom0 on a new platform. Xen might not know about a new ACPI table, while Linux might. In this scenario, Xen wouldn't be able to map the region described in the new table beforehand, but Linux would still try to access it. I imagine that this problem could be worked around by blacklisting any unknown static tables in Xen, but it seems suboptimal. (By blacklisting, I mean removing them before starting Dom0.) For this reason, and to use the same approach for (1), (2) and (3), it looks like the best solution is for Dom0 to request the stage-2 mappings from Xen. If we go down this route, what is the best way to do it?
a) One option is to provide a Xen specific implementation of acpi_os_ioremap in Linux. I think this is the cleanest approach, but unfortunately it doesn't cover cases where ioremap is used directly. (2) is one such case, see arch/arm64/kernel/pci.c:pci_acpi_setup_ecam_mapping and drivers/pci/ecam.c:pci_ecam_create. (3) is another, see drivers/acpi/apei/bert.c:bert_init.

b) Otherwise, we could write an alternative implementation of ioremap on arm64. The Xen specific ioremap would request a stage-2 mapping first, then create the stage-1 mapping as usual. However, this means issuing a hypercall for every ioremap call.

c) Finally, a third option is to create the stage-2 mappings seamlessly in Xen upon Dom0 memory faults. Keeping in mind that SMMU and guest pagetables are shared in the Xen hypervisor, this approach does not work if one of the pages that needs a stage-2 mapping is used as a DMA target before Dom0 accesses it. No SMMU mapping would be available for the page yet, so the DMA transaction would fail. After Dom0 touches the page, the DMA transaction would succeed. I don't know how likely this scenario is, but it seems fragile to rely on it.

For these reasons, I think that the best option might be b). Do you agree? Did I miss anything? Do you have other suggestions?

Many thanks,
Stefano

References:
https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg01693.html
https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg02425.html
https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg02531.html

___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
>>> On 17.01.17 at 23:20, wrote:
> b) Otherwise, we could write an alternative implementation of ioremap
> on arm64. The Xen specific ioremap would request a stage-2 mapping
> first, then create the stage-1 mapping as usual. However, this means
> issuing an hypercall for every ioremap call.

+1 for this being the best choice among the ones listed. In fact this may also be a reasonable approach for PVHv2, unless facing resistance from the x86 maintainers.

Jan
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
On Tue, Jan 17, 2017 at 02:20:54PM -0800, Stefano Stabellini wrote: > a) One option is to provide a Xen specific implementation of > acpi_os_ioremap in Linux. I think this is the cleanest approach, but > unfortunately, it doesn't cover cases where ioremap is used directly. (2) > is one of such cases, see > arch/arm64/kernel/pci.c:pci_acpi_setup_ecam_mapping and > drivers/pci/ecam.c:pci_ecam_create. (3) is another one of these cases, > see drivers/acpi/apei/bert.c:bert_init. This is basically the same as b) from Xen's PoV, the only difference is where you would call the hypercall from Dom0 to establish stage-2 mappings. > b) Otherwise, we could write an alternative implementation of ioremap > on arm64. The Xen specific ioremap would request a stage-2 mapping > first, then create the stage-1 mapping as usual. However, this means > issuing an hypercall for every ioremap call. This seems fine to me, and at present is the only way to get something working. As you said not being able to discover OperationRegions from Xen means that there's a chance some MMIO might not be added to the stage-2 mappings. Then what's the initial memory map state when Dom0 is booted? There are no MMIO mappings at all, and Dom0 must request mappings for everything? What happens to ACPI tables crafted for Dom0 that reside in RAM? That would apply to the STAO and to the other tables that are crafted for Dom0 at build time. Should Dom0 also request stage-2 mappings for them, and Xen simply ignore those calls? > c) Finally, a third option is to create the stage-2 mappings seamlessly > in Xen upon Dom0 memory faults. Keeping in mind that SMMU and guest > pagetables are shared in the Xen hypervisor, this approach does not work > if one of the pages that need a stage-2 mapping is used as DMA target > before Dom0 accesses it. No SMMU mappings would be available for the > page yet, so the DMA transaction would fail. After Dom0 touches the > page, the DMA transaction would succeed. 
I don't know how likely is this > scenario to happen, but it seems fragile to rely on it.

Don't you get faults on SMMU failures? But I guess those are not synchronous, so there's no way you can add mappings on fault, like you can for processor accesses.

Roger.
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
On Wed, 18 Jan 2017, Jan Beulich wrote: > >>> On 17.01.17 at 23:20, wrote: > > b) Otherwise, we could write an alternative implementation of ioremap > > on arm64. The Xen specific ioremap would request a stage-2 mapping > > first, then create the stage-1 mapping as usual. However, this means > > issuing an hypercall for every ioremap call. > > +1 for this being the best choice among the ones listed. In fact > this may also be a reasonable approach for PVHv2, unless facing > resistance by the x86 maintainers.

Adding more context so that the Linux folks can follow. PVHv2 is a new Xen x86 virtual machine type with very similar requirements as Xen ARM virtual machines. In other words, PVHv2 has the same problems described here, except that it's x86, rather than arm64. With my ARM maintainer hat on, I described the issue for Xen on ARM, but it pretty much applies to x86 as well.
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
On Wed, 18 Jan 2017, Roger Pau Monné wrote: > On Tue, Jan 17, 2017 at 02:20:54PM -0800, Stefano Stabellini wrote: > > a) One option is to provide a Xen specific implementation of > > acpi_os_ioremap in Linux. I think this is the cleanest approach, but > > unfortunately, it doesn't cover cases where ioremap is used directly. (2) > > is one of such cases, see > > arch/arm64/kernel/pci.c:pci_acpi_setup_ecam_mapping and > > drivers/pci/ecam.c:pci_ecam_create. (3) is another one of these cases, > > see drivers/acpi/apei/bert.c:bert_init. > > This is basically the same as b) from Xen's PoV, the only difference is where > you would call the hypercall from Dom0 to establish stage-2 mappings. Right, but it is important from the Linux point of view, this is why I am asking the Linux maintainers. > > b) Otherwise, we could write an alternative implementation of ioremap > > on arm64. The Xen specific ioremap would request a stage-2 mapping > > first, then create the stage-1 mapping as usual. However, this means > > issuing an hypercall for every ioremap call. > > This seems fine to me, and at present is the only way to get something > working. > As you said not being able to discover OperationRegions from Xen means that > there's a chance some MMIO might not be added to the stage-2 mappings. > > Then what's the initial memory map state when Dom0 is booted? There are no > MMIO > mappings at all, and Dom0 must request mappings for everything? Yes > What happens to ACPI tables crafted for Dom0 that reside in RAM? That would > apply to the STAO and to the other tables that are crafted for Dom0 at build > time. Should Dom0 also request stage-2 mappings for them, and Xen simply > ignore > those calls? The ACPI (and UEFI) tables are mapped by Xen > > c) Finally, a third option is to create the stage-2 mappings seamlessly > > in Xen upon Dom0 memory faults. 
Keeping in mind that SMMU and guest > > pagetables are shared in the Xen hypervisor, this approach does not work > > if one of the pages that need a stage-2 mapping is used as DMA target > > before Dom0 accesses it. No SMMU mappings would be available for the > > page yet, so the DMA transaction would fail. After Dom0 touches the > > page, the DMA transaction would succeed. I don't know how likely is this > > scenario to happen, but it seems fragile to rely on it. > > Don't you get faults on SMMU failures? But I guess those are not synchronous, > so there's no way you can add mappings on fault, like you can for processor > accesses.

Right.
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
Hi, On 18/01/17 19:05, Stefano Stabellini wrote: On Wed, 18 Jan 2017, Roger Pau Monné wrote: On Tue, Jan 17, 2017 at 02:20:54PM -0800, Stefano Stabellini wrote: a) One option is to provide a Xen specific implementation of acpi_os_ioremap in Linux. I think this is the cleanest approach, but unfortunately, it doesn't cover cases where ioremap is used directly. (2) is one of such cases, see arch/arm64/kernel/pci.c:pci_acpi_setup_ecam_mapping and drivers/pci/ecam.c:pci_ecam_create. (3) is another one of these cases, see drivers/acpi/apei/bert.c:bert_init. This is basically the same as b) from Xen's PoV, the only difference is where you would call the hypercall from Dom0 to establish stage-2 mappings. Right, but it is important from the Linux point of view, this is why I am asking the Linux maintainers. b) Otherwise, we could write an alternative implementation of ioremap on arm64. The Xen specific ioremap would request a stage-2 mapping first, then create the stage-1 mapping as usual. However, this means issuing an hypercall for every ioremap call. This seems fine to me, and at present is the only way to get something working. As you said not being able to discover OperationRegions from Xen means that there's a chance some MMIO might not be added to the stage-2 mappings. Then what's the initial memory map state when Dom0 is booted? There are no MMIO mappings at all, and Dom0 must request mappings for everything? Yes To give more context here, the UEFI memory map does not report all the MMIO regions. So there is no possibility to map MMIO at boot. What happens to ACPI tables crafted for Dom0 that reside in RAM? That would apply to the STAO and to the other tables that are crafted for Dom0 at build time. Should Dom0 also request stage-2 mappings for them, and Xen simply ignore those calls? The ACPI (and UEFI) tables are mapped by Xen I think Royger's point is DOM0 cannot tell whether a region has been mapped by Xen or not. 
The function ioremap will be used to map anything (it is the leaf of all mapping functions), and will call Xen no matter the address passed. It could be a RAM region, a HW device region, or an emulated device region. For the RAM and emulated device regions we don't want Xen to modify the mapping. Note that the current hypercall does not report an error back (see [1]), but I think it is a different error. Also, I think we would need some work in Xen because, from my understanding of acpi_iomem_deny_access, DOM0 would be able to map RAM through this hypercall. So it would mess up the page tables even if it should not.

Cheers,

-- Julien Grall
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
On Wed, Jan 18, 2017 at 07:13:23PM +, Julien Grall wrote: > Hi, > > On 18/01/17 19:05, Stefano Stabellini wrote: > > On Wed, 18 Jan 2017, Roger Pau Monné wrote: > > > On Tue, Jan 17, 2017 at 02:20:54PM -0800, Stefano Stabellini wrote: > > > > a) One option is to provide a Xen specific implementation of > > > > acpi_os_ioremap in Linux. I think this is the cleanest approach, but > > > > unfortunately, it doesn't cover cases where ioremap is used directly. > > > > (2) > > > > is one of such cases, see > > > > arch/arm64/kernel/pci.c:pci_acpi_setup_ecam_mapping and > > > > drivers/pci/ecam.c:pci_ecam_create. (3) is another one of these cases, > > > > see drivers/acpi/apei/bert.c:bert_init. > > > > > > This is basically the same as b) from Xen's PoV, the only difference is > > > where > > > you would call the hypercall from Dom0 to establish stage-2 mappings. > > > > Right, but it is important from the Linux point of view, this is why I > > am asking the Linux maintainers. > > > > > > > > b) Otherwise, we could write an alternative implementation of ioremap > > > > on arm64. The Xen specific ioremap would request a stage-2 mapping > > > > first, then create the stage-1 mapping as usual. However, this means > > > > issuing an hypercall for every ioremap call. > > > > > > This seems fine to me, and at present is the only way to get something > > > working. > > > As you said not being able to discover OperationRegions from Xen means > > > that > > > there's a chance some MMIO might not be added to the stage-2 mappings. > > > > > > Then what's the initial memory map state when Dom0 is booted? There are > > > no MMIO > > > mappings at all, and Dom0 must request mappings for everything? > > > > Yes > > To give more context here, the UEFI memory map does not report all the MMIO > regions. So there is no possibility to map MMIO at boot. 
I've been able to get a Dom0 booting on x86 by mapping all the regions marked as ACPI in the memory map, plus the BARs of PCI devices and the MCFG areas. But this is not really optimal, since as Stefano says:

1. If there are new tables that contain memory regions that should be mapped to Dom0, Xen will need to be updated in order to work on those systems.
2. ATM it's not possible for Xen to know all the OperationRegions described in the ACPI DSDT/SSDT tables.

I'm not that worried about 1, since the user will also need a newish Dom0 kernel in order to access those devices, and it doesn't seem like new ACPI tables appear out of the blue every day. It does, however, put more pressure on Xen to aggressively track ACPI changes. In order to fix 2, an AML parser would need to be added to Xen.

> > > What happens to ACPI tables crafted for Dom0 that reside in RAM? That would
> > > apply to the STAO and to the other tables that are crafted for Dom0 at build
> > > time. Should Dom0 also request stage-2 mappings for them, and Xen simply
> > > ignore those calls?
> >
> > The ACPI (and UEFI) tables are mapped by Xen
>
> I think Royger's point is DOM0 cannot tell whether a region has been mapped
> by Xen or not.
>
> The function ioremap will be used to map anything (it is the leaf of all
> mapping functions), and will call Xen no matter the address passed. It could
> be a RAM region, HW device region, emulated device region.

Exactly, from a guest OS PoV it would request those mappings for all device memory regions. But from Xen's perspective, those requests might be made against at least 3 different types of p2m regions:

- Regions trapped by emulated devices inside of Xen: no direct MMIO mapping should be established in this case.
- RAM regions that belong to Xen-crafted ACPI tables (STAO, MADT...).
- Real MMIO regions that should be passed through.
Right now AFAIK Xen doesn't track any of this information, so we would need additional non-trivial logic in order to account for all this inside the hypervisor.

Roger.
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
On Thu, 19 Jan 2017, Roger Pau Monné wrote: > On Wed, Jan 18, 2017 at 07:13:23PM +, Julien Grall wrote: > > Hi, > > > > On 18/01/17 19:05, Stefano Stabellini wrote: > > > On Wed, 18 Jan 2017, Roger Pau Monné wrote: > > > > On Tue, Jan 17, 2017 at 02:20:54PM -0800, Stefano Stabellini wrote: > > > > > a) One option is to provide a Xen specific implementation of > > > > > acpi_os_ioremap in Linux. I think this is the cleanest approach, but > > > > > unfortunately, it doesn't cover cases where ioremap is used directly. > > > > > (2) > > > > > is one of such cases, see > > > > > arch/arm64/kernel/pci.c:pci_acpi_setup_ecam_mapping and > > > > > drivers/pci/ecam.c:pci_ecam_create. (3) is another one of these cases, > > > > > see drivers/acpi/apei/bert.c:bert_init. > > > > > > > > This is basically the same as b) from Xen's PoV, the only difference is > > > > where > > > > you would call the hypercall from Dom0 to establish stage-2 mappings. > > > > > > Right, but it is important from the Linux point of view, this is why I > > > am asking the Linux maintainers. > > > > > > > > > > > b) Otherwise, we could write an alternative implementation of ioremap > > > > > on arm64. The Xen specific ioremap would request a stage-2 mapping > > > > > first, then create the stage-1 mapping as usual. However, this means > > > > > issuing an hypercall for every ioremap call. > > > > > > > > This seems fine to me, and at present is the only way to get something > > > > working. > > > > As you said not being able to discover OperationRegions from Xen means > > > > that > > > > there's a chance some MMIO might not be added to the stage-2 mappings. > > > > > > > > Then what's the initial memory map state when Dom0 is booted? There are > > > > no MMIO > > > > mappings at all, and Dom0 must request mappings for everything? > > > > > > Yes > > > > To give more context here, the UEFI memory map does not report all the MMIO > > regions. So there is no possibility to map MMIO at boot. 
> > I've been able to get a Dom0 booting on x86 by mapping all the regions marked > as ACPI in the memory map, plus the BARs of PCI devices and the MCFG areas. > But > this is not really optimal, since as Stefano says: > > 1. If there are new tables that contain memory regions that should be mapped > to >Dom0, Xen will need to be updated in order to work on those systems. > 2. ATM it's not possible for Xen to know all the OperationRegions described > in >the ACPI DSDT/SSDT tables. > > I'm not that worried about 1, since the user will also need a newish Dom0 > kernel in order to access those devices, and it doesn't seem like new ACPI > tables appear out of the blue everyday. It however puts more pressure on Xen > in > order to aggressively track ACPI changes. > > In order to fix 2 an AML parser would need to be added to Xen. This brings me to another possible solution: we could map everything beforehand in Xen as you wrote, then use a), an alternative implementation of acpi_os_ioremap, to fix problem 2. In this scheme, 1. remains unfixed. I think this is suboptimal, but it is a possibility. > > > > What happens to ACPI tables crafted for Dom0 that reside in RAM? That > > > > would > > > > apply to the STAO and to the other tables that are crafted for Dom0 at > > > > build > > > > time. Should Dom0 also request stage-2 mappings for them, and Xen > > > > simply ignore > > > > those calls? > > > > > > The ACPI (and UEFI) tables are mapped by Xen > > > > I think Royger's point is DOM0 cannot tell whether a region has been mapped > > by Xen or not. > > > > The function ioremap will be used to map anything (it is the leaf of all > > mapping functions), and will call Xen no matter the address passed. It could > > be a RAM region, HW device region, emulated device region. > > Exactly, from a guest OS PoV it would request those mappings for all device > memory regions. 
But from Xen's perspective, those request might be made > against > at least 3 different types of p2m regions: > > - Regions trapped by emulated devices inside of Xen: no direct MMIO mapping > should be established in this case. > - RAM regions that belong to Xen-crafted ACPI tables (STAO, MADT...). > - Real MMIO regions that should be passed through. > > Right now AFAIK Xen doesn't track any of this information, so we would need > additional non-trivial logic in order to account for all this inside the > hypervisor.

I think this is something we'll have to do to guarantee that we have a good implementation of XENMEM_add_to_physmap_range in Xen, regardless of the rest of the discussion.
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
Hello, On 19/01/2017 19:22, Stefano Stabellini wrote: On Thu, 19 Jan 2017, Roger Pau Monné wrote: On Wed, Jan 18, 2017 at 07:13:23PM +, Julien Grall wrote: Hi, On 18/01/17 19:05, Stefano Stabellini wrote: On Wed, 18 Jan 2017, Roger Pau Monné wrote: On Tue, Jan 17, 2017 at 02:20:54PM -0800, Stefano Stabellini wrote: a) One option is to provide a Xen specific implementation of acpi_os_ioremap in Linux. I think this is the cleanest approach, but unfortunately, it doesn't cover cases where ioremap is used directly. (2) is one of such cases, see arch/arm64/kernel/pci.c:pci_acpi_setup_ecam_mapping and drivers/pci/ecam.c:pci_ecam_create. (3) is another one of these cases, see drivers/acpi/apei/bert.c:bert_init. This is basically the same as b) from Xen's PoV, the only difference is where you would call the hypercall from Dom0 to establish stage-2 mappings. Right, but it is important from the Linux point of view, this is why I am asking the Linux maintainers. b) Otherwise, we could write an alternative implementation of ioremap on arm64. The Xen specific ioremap would request a stage-2 mapping first, then create the stage-1 mapping as usual. However, this means issuing an hypercall for every ioremap call. This seems fine to me, and at present is the only way to get something working. As you said not being able to discover OperationRegions from Xen means that there's a chance some MMIO might not be added to the stage-2 mappings. Then what's the initial memory map state when Dom0 is booted? There are no MMIO mappings at all, and Dom0 must request mappings for everything? Yes To give more context here, the UEFI memory map does not report all the MMIO regions. So there is no possibility to map MMIO at boot. I've been able to get a Dom0 booting on x86 by mapping all the regions marked as ACPI in the memory map, plus the BARs of PCI devices and the MCFG areas. But how do you find the BAR? Is it by reading the BAR from the config space when a PCI is added? 
Also, you are assuming that the MCFG will describe the host controller. This is the case only when the host controller is available at boot. So you may miss some here. Furthermore, on ARM we have other static tables (such as GTDT) that contain MMIO regions to map. Lastly, not all devices are PCI and you may have platform devices. The platform devices will only be described in the ASL. Just in case, those regions are not necessarily described in the UEFI memory map. So you need DOM0 to tell Xen the list of regions. But this is not really optimal, since as Stefano says: 1. If there are new tables that contain memory regions that should be mapped to Dom0, Xen will need to be updated in order to work on those systems. 2. ATM it's not possible for Xen to know all the OperationRegions described in the ACPI DSDT/SSDT tables. I'm not that worried about 1, since the user will also need a newish Dom0 kernel in order to access those devices, and it doesn't seem like new ACPI tables appear out of the blue everyday. It however puts more pressure on Xen in order to aggressively track ACPI changes. In order to fix 2 an AML parser would need to be added to Xen. This brings me to another possible solution: we could map everything beforehand in Xen as you wrote, then use a), an alternative implementation of acpi_os_ioremap, to fix problem 2. In this scheme, 1. remains unfixed. This solution will not work on ARM (see why above). It is not the first time we have this discussion, so it is probably time to document in the tree how ACPI works with Xen on ARM/x86. This would save us from revisiting, in the future, why a hypercall from Dom0 is required to map MMIO. I think this is suboptimal, but it is a possibility. What happens to ACPI tables crafted for Dom0 that reside in RAM? That would apply to the STAO and to the other tables that are crafted for Dom0 at build time. Should Dom0 also request stage-2 mappings for them, and Xen simply ignore those calls?
The ACPI (and UEFI) tables are mapped by Xen I think Royger's point is DOM0 cannot tell whether a region has been mapped by Xen or not. The function ioremap will be used to map anything (it is the leaf of all mapping functions), and will call Xen no matter the address passed. It could be a RAM region, HW device region, emulated device region. Exactly, from a guest OS PoV it would request those mappings for all device memory regions. But from Xen's perspective, those request might be made against at least 3 different types of p2m regions: - Regions trapped by emulated devices inside of Xen: no direct MMIO mapping should be established in this case. - RAM regions that belong to Xen-crafted ACPI tables (STAO, MADT...). - Real MMIO regions that should be passed through. Right now AFAIK Xen doesn't track any of this information, so we would need additional non-trivial logic in order to account for all this inside the hypervisor. I think this is something we'll have to do to guarantee that we have a good implementation of XENMEM_add_to_physmap_range in Xen, regardless of the rest of the discussion.
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
On Thu, Jan 19, 2017 at 09:14:03PM +0100, Julien Grall wrote: > Hello, > > On 19/01/2017 19:22, Stefano Stabellini wrote: > > On Thu, 19 Jan 2017, Roger Pau Monné wrote: > > > On Wed, Jan 18, 2017 at 07:13:23PM +, Julien Grall wrote: > > > > Hi, > > > > > > > > On 18/01/17 19:05, Stefano Stabellini wrote: > > > > > On Wed, 18 Jan 2017, Roger Pau Monné wrote: > > > > > > On Tue, Jan 17, 2017 at 02:20:54PM -0800, Stefano Stabellini wrote: > > > > > > > a) One option is to provide a Xen specific implementation of > > > > > > > acpi_os_ioremap in Linux. I think this is the cleanest approach, > > > > > > > but > > > > > > > unfortunately, it doesn't cover cases where ioremap is used > > > > > > > directly. (2) > > > > > > > is one of such cases, see > > > > > > > arch/arm64/kernel/pci.c:pci_acpi_setup_ecam_mapping and > > > > > > > drivers/pci/ecam.c:pci_ecam_create. (3) is another one of these > > > > > > > cases, > > > > > > > see drivers/acpi/apei/bert.c:bert_init. > > > > > > > > > > > > This is basically the same as b) from Xen's PoV, the only > > > > > > difference is where > > > > > > you would call the hypercall from Dom0 to establish stage-2 > > > > > > mappings. > > > > > > > > > > Right, but it is important from the Linux point of view, this is why I > > > > > am asking the Linux maintainers. > > > > > > > > > > > > > > > > > b) Otherwise, we could write an alternative implementation of > > > > > > > ioremap > > > > > > > on arm64. The Xen specific ioremap would request a stage-2 mapping > > > > > > > first, then create the stage-1 mapping as usual. However, this > > > > > > > means > > > > > > > issuing an hypercall for every ioremap call. > > > > > > > > > > > > This seems fine to me, and at present is the only way to get > > > > > > something working. > > > > > > As you said not being able to discover OperationRegions from Xen > > > > > > means that > > > > > > there's a chance some MMIO might not be added to the stage-2 > > > > > > mappings. 
> > > > > > > > > > > > Then what's the initial memory map state when Dom0 is booted? There > > > > > > are no MMIO > > > > > > mappings at all, and Dom0 must request mappings for everything? > > > > > > > > > > Yes > > > > > > > > To give more context here, the UEFI memory map does not report all the > > > > MMIO > > > > regions. So there is no possibility to map MMIO at boot. > > > > > > I've been able to get a Dom0 booting on x86 by mapping all the regions > > > marked > > > as ACPI in the memory map, plus the BARs of PCI devices and the MCFG > > > areas. > > But how do you find the BAR? Is it by reading the BAR from the config space > when a PCI is added? Not really, part of this is already done at boot time: Xen does a brute-force scan of segment 0 (see scan_pci_devices). For ECAM areas the hardware domain must issue a hypercall (PHYSDEVOP_pci_mmcfg_reserved) in order to notify Xen about their presence before attempting to access this region. This should cause Xen to scan the ECAM and add any devices (at least this was my idea). > Also, you are assuming that the MCFG will describe the host controller. This > is the case only when host controller available at boot. So you may miss > some here. Yes, I know, that's why we need the hypercall. The information in the MCFG table might be incomplete, and the hardware domain would have to fetch extra ECAM information from the _SEG method of host bridge devices in the ACPI namespace. > Furthermore, on ARM we have other static tables (such as GTDT) contain MMIO > region to map. > > Lastly, all devices are not PCI and you may have platform devices. The > platform devices will only be described in the ASL. Just in case, those > regions are not necessarily described in UEFI memory map. Will those devices work properly in such a scenario? (ie: are they behind the SMMU?) > So you need DOM0 to tell the list of regions.
Yes, I agree that we need such a hypercall ATM, although I think that we might be able to get rid of it in the long term if we are able to parse the AML tables from Xen.

Roger.
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
Hi Royger, On 20/01/2017 12:01, Roger Pau Monné wrote: On Thu, Jan 19, 2017 at 09:14:03PM +0100, Julien Grall wrote: On 19/01/2017 19:22, Stefano Stabellini wrote: On Thu, 19 Jan 2017, Roger Pau Monné wrote: On Wed, Jan 18, 2017 at 07:13:23PM +, Julien Grall wrote: Hi, On 18/01/17 19:05, Stefano Stabellini wrote: On Wed, 18 Jan 2017, Roger Pau Monné wrote: On Tue, Jan 17, 2017 at 02:20:54PM -0800, Stefano Stabellini wrote: a) One option is to provide a Xen specific implementation of acpi_os_ioremap in Linux. I think this is the cleanest approach, but unfortunately, it doesn't cover cases where ioremap is used directly. (2) is one of such cases, see arch/arm64/kernel/pci.c:pci_acpi_setup_ecam_mapping and drivers/pci/ecam.c:pci_ecam_create. (3) is another one of these cases, see drivers/acpi/apei/bert.c:bert_init. This is basically the same as b) from Xen's PoV, the only difference is where you would call the hypercall from Dom0 to establish stage-2 mappings. Right, but it is important from the Linux point of view, this is why I am asking the Linux maintainers. b) Otherwise, we could write an alternative implementation of ioremap on arm64. The Xen specific ioremap would request a stage-2 mapping first, then create the stage-1 mapping as usual. However, this means issuing an hypercall for every ioremap call. This seems fine to me, and at present is the only way to get something working. As you said not being able to discover OperationRegions from Xen means that there's a chance some MMIO might not be added to the stage-2 mappings. Then what's the initial memory map state when Dom0 is booted? There are no MMIO mappings at all, and Dom0 must request mappings for everything? Yes To give more context here, the UEFI memory map does not report all the MMIO regions. So there is no possibility to map MMIO at boot. I've been able to get a Dom0 booting on x86 by mapping all the regions marked as ACPI in the memory map, plus the BARs of PCI devices and the MCFG areas. 
But how do you find the BARs? By reading them from the config space when a PCI device is added?

Not really, part of this is already done at boot time: Xen does a brute-force scan of segment 0 (see scan_pci_devices). For ECAM areas the hardware domain must issue a hypercall (PHYSDEVOP_pci_mmcfg_reserved) in order to notify Xen about their presence before attempting to access the region. This should cause Xen to scan the ECAM and add any devices (at least this was my idea).

In the case of ARM, Xen does not use any PCI devices itself (no PCI UART), so scanning beforehand is not necessary. If possible, I would prefer to have the scanning in one place rather than spread across different places depending on the segment (we do have multiple segments on ARM). Also, you are assuming that the MCFG will describe the host controller. This is the case only when the host controller is available at boot, so you may miss some here.

Yes, I know, that's why we need the hypercall. The information in the MCFG table might be incomplete, and the hardware domain would have to fetch extra ECAM information from the _SEG method of host bridge devices in the ACPI namespace.

Furthermore, on ARM we have other static tables (such as the GTDT) that contain MMIO regions to map. Lastly, not all devices are PCI: you may have platform devices, which will only be described in the ASL. Just in case: those regions are not necessarily described in the UEFI memory map.

Will those devices work properly in such a scenario? (ie: are they behind the SMMU?)

Platform devices might or might not be behind the SMMU. It depends on whether the device is DMA-capable and, often, on whether the implementer decided to have an SMMU.

Today there is no SMMU support for ACPI (it is by-passed), so we haven't yet encountered the problem. We have an item to investigate the work to be done here. Some thoughts: each device will be associated with one or multiple StreamIDs (IDs used to configure the SMMU).
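For context, the ECAM notification described above could look roughly like this sketch from the hardware domain's side. The struct follows the general shape of Xen's physdev_pci_mmcfg_reserved (base address, segment, bus range), but the exact layout and op number should be checked against Xen's public headers; do_physdev_op is a mock standing in for the real hypercall.

```c
/* Sketch: hardware domain notifying Xen of an ECAM (MCFG) area before
 * touching it.  The struct mirrors the shape of Xen's
 * physdev_pci_mmcfg_reserved but should be checked against
 * xen/include/public/physdev.h; the op number here is illustrative. */
#include <assert.h>
#include <stdint.h>

#define PHYSDEVOP_pci_mmcfg_reserved 24  /* illustrative op number */

struct physdev_pci_mmcfg_reserved {
    uint64_t address;    /* base of the ECAM area */
    uint16_t segment;    /* PCI segment group */
    uint8_t  start_bus;
    uint8_t  end_bus;
    uint32_t flags;
};

/* Mock hypercall: Xen would record the area and scan it for devices. */
static int nr_reserved;
static int do_physdev_op(int op, void *arg)
{
    if (op != PHYSDEVOP_pci_mmcfg_reserved)
        return -1;
    struct physdev_pci_mmcfg_reserved *r = arg;
    if (r->start_bus > r->end_bus)
        return -1;
    nr_reserved++;
    return 0;
}

/* What the hardware domain would do for each MCFG entry (and for any
 * extra ECAM areas discovered via the ACPI namespace) before use. */
static int notify_ecam(uint64_t base, uint16_t seg, uint8_t b0, uint8_t b1)
{
    struct physdev_pci_mmcfg_reserved r = {
        .address = base, .segment = seg,
        .start_bus = b0, .end_bus = b1, .flags = 0,
    };
    return do_physdev_op(PHYSDEVOP_pci_mmcfg_reserved, &r);
}
```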
I guess we could find this from the static table IORT, need to check.

Also, some of the platform devices will have MSIs; I suspect the number of MSIs will be either hardcoded or found in the ASL. So we would need a new hypercall to advertise those.

So you need DOM0 to tell Xen the list of regions.

Yes, I agree that we need such a hypercall at the moment, although I think that we might be able to get rid of it in the long term if we are able to parse the AML tables from Xen.

Are you suggesting bringing a full AML parser into Xen? If so, it would be much bigger than Xen ARM itself. I would need a strong use case to accept such a thing.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
On Fri, Jan 20, 2017 at 02:10:33PM +0100, Julien Grall wrote:
> Hi Royger,
>
> On 20/01/2017 12:01, Roger Pau Monné wrote:
> > On Thu, Jan 19, 2017 at 09:14:03PM +0100, Julien Grall wrote:
>
> In the case of ARM, Xen does not use any PCI devices itself (no PCI UART),
> so scanning beforehand is not necessary. If possible, I would prefer to
> have the scanning in one place rather than spread across different places
> depending on the segment (we do have multiple segments on ARM).

AFAIK on ARM you could do the scan inside the implementation of the
PHYSDEVOP_pci_mmcfg_reserved hypercall, and that would be enough in order to
discover the devices.

> > > Also, you are assuming that the MCFG will describe the host controller.
> > > This is the case only when the host controller is available at boot, so
> > > you may miss some here.
> >
> > Yes, I know, that's why we need the hypercall. The information in the
> > MCFG table might be incomplete, and the hardware domain would have to
> > fetch extra ECAM information from the _SEG method of host bridge devices
> > in the ACPI namespace.
> >
> > > Furthermore, on ARM we have other static tables (such as the GTDT) that
> > > contain MMIO regions to map.
> > >
> > > Lastly, not all devices are PCI: you may have platform devices, which
> > > will only be described in the ASL. Just in case: those regions are not
> > > necessarily described in the UEFI memory map.
> >
> > Will those devices work properly in such a scenario? (ie: are they behind
> > the SMMU?)
>
> Platform devices might or might not be behind the SMMU. It depends on
> whether the device is DMA-capable and, often, on whether the implementer
> decided to have an SMMU.
>
> Today there is no SMMU support for ACPI (it is by-passed), so we haven't
> yet encountered the problem. We have an item to investigate the work to be
> done here.

Why would you need the SMMU for ACPI?
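Roger's suggestion, keeping discovery in one place by scanning from the hypercall implementation itself, could be sketched as below. This is a mock, not Xen's actual internals: config-space access is stubbed out and the function names are illustrative.

```c
/* Sketch: Xen-side handling of PHYSDEVOP_pci_mmcfg_reserved that also
 * performs the bus scan, so device discovery lives in one place.
 * Config-space access is stubbed; names are illustrative. */
#include <assert.h>
#include <stdint.h>

/* Stub: read the vendor ID for seg/bus/dev/fn via ECAM.  Pretend only
 * 0000:00:00.0 exists; every other function reads back all-ones. */
static uint16_t pci_conf_read_vendor(uint16_t seg, uint8_t bus,
                                     uint8_t dev, uint8_t fn)
{
    return (seg == 0 && bus == 0 && dev == 0 && fn == 0) ? 0x1234 : 0xffff;
}

static int nr_devices;
static void pci_add_device(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn)
{
    (void)seg; (void)bus; (void)dev; (void)fn;
    nr_devices++;    /* Xen would allocate and track a pci_dev here */
}

/* Hypothetical hypercall body: record the ECAM area (omitted) and then
 * brute-force scan the bus range it covers, as scan_pci_devices does for
 * segment 0 on x86. */
static int mmcfg_reserved(uint16_t seg, uint8_t start_bus, uint8_t end_bus)
{
    for (unsigned int bus = start_bus; bus <= end_bus; bus++)
        for (uint8_t dev = 0; dev < 32; dev++)
            for (uint8_t fn = 0; fn < 8; fn++)
                if (pci_conf_read_vendor(seg, (uint8_t)bus, dev, fn) != 0xffff)
                    pci_add_device(seg, (uint8_t)bus, dev, fn);
    return 0;
}
```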
> Some thoughts: each device will be associated with one or multiple
> StreamIDs (IDs used to configure the SMMU). I guess we could find this from
> the static table IORT, need to check.
>
> Also, some of the platform devices will have MSIs; I suspect the number of
> MSIs will be either hardcoded or found in the ASL. So we would need a new
> hypercall to advertise those.
>
> > > So you need DOM0 to tell Xen the list of regions.
> >
> > Yes, I agree that we need such a hypercall at the moment, although I
> > think that we might be able to get rid of it in the long term if we are
> > able to parse the AML tables from Xen.
>
> Are you suggesting bringing a full AML parser into Xen? If so, it would be
> much bigger than Xen ARM itself. I would need a strong use case to accept
> such a thing.

It could be placed in the init section and freed after boot. Also, I find it
hard to believe that an AML parser is bigger than the whole of Xen on ARM.
The OpenBSD folks have a DSDT parser in ~4000 lines of code [0], and that's
probably way more than what Xen actually needs.

Roger.

[0] https://github.com/openbsd/src/blob/master/sys/dev/acpi/dsdt.c
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
On Fri, 20 Jan 2017, Roger Pau Monné wrote:
> > > > So you need DOM0 to tell Xen the list of regions.
> > >
> > > Yes, I agree that we need such a hypercall at the moment, although I
> > > think that we might be able to get rid of it in the long term if we are
> > > able to parse the AML tables from Xen.
> >
> > Are you suggesting bringing a full AML parser into Xen? If so, it would
> > be much bigger than Xen ARM itself. I would need a strong use case to
> > accept such a thing.
>
> It could be placed in the init section and freed after boot. Also, I find
> it hard to believe that an AML parser is bigger than the whole of Xen on
> ARM. The OpenBSD folks have a DSDT parser in ~4000 lines of code [0], and
> that's probably way more than what Xen actually needs.

Even if it were actually possible, 4 KLOC is a significant increase in code
size, given that last time I counted Xen on ARM was under 90 KLOC.
Regardless, some AML methods have side effects. I don't know if the plan of
accessing some AML in Xen, then remapping the tables and accessing them again
in Dom0, is actually sound. I don't think the spec supports it.
Re: [Xen-devel] [RFC] Device memory mappings for Dom0 on ARM64 ACPI systems
On Fri, Jan 20, 2017 at 02:53:34PM -0800, Stefano Stabellini wrote:
> On Fri, 20 Jan 2017, Roger Pau Monné wrote:
> > > > > So you need DOM0 to tell Xen the list of regions.
> > > >
> > > > Yes, I agree that we need such a hypercall at the moment, although I
> > > > think that we might be able to get rid of it in the long term if we
> > > > are able to parse the AML tables from Xen.
> > >
> > > Are you suggesting bringing a full AML parser into Xen? If so, it would
> > > be much bigger than Xen ARM itself. I would need a strong use case to
> > > accept such a thing.
> >
> > It could be placed in the init section and freed after boot. Also, I find
> > it hard to believe that an AML parser is bigger than the whole of Xen on
> > ARM. The OpenBSD folks have a DSDT parser in ~4000 lines of code [0], and
> > that's probably way more than what Xen actually needs.
>
> Even if it were actually possible, 4 KLOC is a significant increase in code
> size, given that last time I counted Xen on ARM was under 90 KLOC.
> Regardless, some AML methods have side effects. I don't know if the plan of
> accessing some AML in Xen, then remapping the tables and accessing them
> again in Dom0, is actually sound. I don't think the spec supports it.

Sure, Xen wouldn't be allowed to execute any AML methods (so it's probably
even less than 4 KLOC, once that code is ripped out), but it would be able to
discover OperationRegions. In any case this is future work, and the hypercall
route seems the most sensible one right now; we could always return -EEXIST
if Xen starts mapping all this on behalf of the hardware domain at some
point. I just wanted to point this out because I think the claims that have
been made about AML parsers being bigger than Xen itself are not true, and
discarding the idea based on size alone is not reasonable.

Roger.
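The -EEXIST fallback mentioned above could work roughly as in this mock: Dom0 keeps issuing mapping hypercalls, and a future Xen that pre-maps regions itself simply reports that the mapping already exists, which Dom0 treats as success. All names and the bookkeeping are illustrative, not the real hypercall interface.

```c
/* Sketch of the -EEXIST compatibility idea: old Dom0s keep requesting
 * stage-2 mappings; a newer Xen that already mapped the region returns
 * -EEXIST, which Dom0 folds into success.  Names are illustrative. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_MAPS 16
static uint64_t mapped_gfn[MAX_MAPS];
static int nr_maps;

static int gfn_is_mapped(uint64_t gfn)
{
    for (int i = 0; i < nr_maps; i++)
        if (mapped_gfn[i] == gfn)
            return 1;
    return 0;
}

/* Mock of the stage-2 mapping hypercall on the Xen side. */
static int map_mmio_gfn(uint64_t gfn)
{
    if (gfn_is_mapped(gfn))
        return -EEXIST;          /* already mapped by Xen beforehand */
    if (nr_maps == MAX_MAPS)
        return -ENOMEM;
    mapped_gfn[nr_maps++] = gfn;
    return 0;
}

/* Dom0 side: "already mapped" is as good as success. */
static int dom0_map_mmio(uint64_t gfn)
{
    int rc = map_mmio_gfn(gfn);
    return (rc == -EEXIST) ? 0 : rc;
}
```

This keeps the hypercall ABI stable: whether or not Xen learns to pre-map regions, Dom0's mapping path behaves the same.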