On Fri, 25 Mar 2022, Wei Chen wrote:
> # Proposal for Porting Xen to Armv8-R64
>
> This proposal introduces the PoC work of porting Xen to Armv8-R64, which includes:
> - The changes to current Xen capability, like the Xen build system, memory management, domain management and vCPU context switch.
> - The expanded Xen capability, like static allocation and direct-map.
>
> ***Notes:***
> 1. ***This proposal only covers the work of porting Xen to Armv8-R64 single CPU. Xen SMP support on Armv8-R64 relates to Armv8-R Trusted Firmware (TF-R). This is an external dependency, so we think the discussion of Xen SMP support on Armv8-R64 should be started when single-CPU support is complete.***
> 2. ***This proposal will not touch xen-tools. At the current stage, Xen on Armv8-R64 only supports dom0less; all guests should be booted from device tree.***
>
> ## Changelogs
> Draft-A -> Draft-B:
> 1. Update Kconfig options usage.
> 2. Update the section for XEN_START_ADDRESS.
> 3. Add description of MPU initialization before parsing device tree.
> 4. Remove CONFIG_ARM_MPU_EL1_PROTECTION_REGIONS.
> 5. Update the description of ioremap_nocache/cache.
> 6. Update about the free_init_memory on Armv8-R.
> 7. Describe why we need to switch the MPU configuration later.
> 8. Add alternative proposal in TODO.
> 9. Add using a tool to generate the Xen Armv8-R device tree in TODO.
> 10. Add Xen PIC/PIE discussion in TODO.
> 11. Add Xen event channel support in TODO.
>
> ## Contributors:
> Wei Chen <wei.c...@arm.com>
> Penny Zheng <penny.zh...@arm.com>
>
> ## 1. Essential Background
>
> ### 1.1. Armv8-R64 Profile
> The Arm R architecture profile was designed to support use cases that have a high sensitivity to deterministic execution (e.g. fuel injection, brake control, drive trains, motor control, etc.).
>
> Arm announced Armv8-R in 2013. It is the latest generation Arm architecture targeted at the Real-time profile. It introduces virtualization at the highest security level while retaining the Protected Memory System Architecture (PMSA) based on a Memory Protection Unit (MPU). In 2020, Arm announced Cortex-R82, which is the first Arm 64-bit Cortex-R processor based on Armv8-R64.
>
> - The latest Armv8-R64 document can be found here:
>   [Arm Architecture Reference Manual Supplement - Armv8, for Armv8-R AArch64 architecture profile](https://developer.arm.com/documentation/ddi0600/latest/).
>
> - Arm R-profile architecture progression:
>   Armv7-R -> Armv8-R AArch32 -> Armv8-R AArch64
>   The following figure is a simple comparison of "R" processors based on the different R-profile architectures.
>
> ![image](https://drive.google.com/uc?export=view&id=1nE5RAXaX8zY2KPZ8imBpbvIr2eqBguEB)
>
> - The Armv8-R architecture adds the following features on top of Armv7-R:
>   - An exception model that is compatible with the Armv8-A model
>   - Virtualization with support for guest operating systems
>     - PMSA virtualization using MPUs in EL2.
> - The new features of the Armv8-R64 architecture:
>   - Adds support for the 64-bit A64 instruction set; previously Armv8-R only supported A32.
>   - Supports up to 48-bit physical addressing; previously only up to 32-bit addressing was supported.
>   - Optional Arm Neon technology and Advanced SIMD
>   - Supports three Exception Levels (ELs)
>     - Secure EL2 - The highest privilege, MPU only, for firmware, hypervisor
>     - Secure EL1 - Rich OS (MMU) or RTOS (MPU)
>     - Secure EL0 - Application workloads
>   - Optionally supports a Virtual Memory System Architecture at S-EL1/S-EL0.
>     This means it's possible to run rich OS kernels - like Linux - either bare-metal or as a guest.
> - Differences from the Armv8-A AArch64 architecture
>   - Supports only a single Security state - Secure. There is no Non-secure execution state.
>   - EL3 is not supported, EL2 is mandatory. This means Secure EL2 is the highest EL.
>   - Supports the A64 ISA instruction set
>     - With a small set of well-defined differences
>   - Provides a PMSA (Protected Memory System Architecture) based virtualization model.
>     - As opposed to Armv8-A AArch64's VMSA based virtualization
>     - Can support address bits up to 52 if FEAT_LPA is enabled, otherwise 48 bits.
>     - Determines the access permissions and memory attributes of the target PA.
>     - Can implement PMSAv8-64 at EL1 and EL2
>       - Address translation flat-maps the VA to the PA for EL2 Stage 1.
>       - Address translation flat-maps the VA to the PA for EL1 Stage 1.
>       - Address translation flat-maps the IPA to the PA for EL1 Stage 2.
>     - PMSA in EL1 & EL2 is configurable, VMSA in EL1 is configurable.
>
> ### 1.2. Xen Challenges with PMSA Virtualization
> Xen is a PMSA-unaware Type-1 hypervisor; it will need modifications to run with an MPU and host multiple guest OSes.
>
> - No MMU at EL2:
>   - No EL2 Stage 1 address translation
>     - Xen provides a fixed Arm64 virtual memory layout as the basis of EL2 Stage 1 address translation, which is not applicable on an MPU system, where there is no virtual addressing. As a result, any operation involving a transition from PA to VA, like ioremap, needs modification on an MPU system.
>   - Xen's run-time addresses are the same as the link-time addresses.
>     - Enabling PIC/PIE (position-independent code) on a real-time target processor is probably very rare. Further discussion in section 2.1 and the TODO section.
>   - Xen will need to use the EL2 MPU memory region descriptors to manage access permissions and attributes for accesses made by VMs at EL1/0.
>     - Xen currently relies on the MMU EL1 Stage 2 table to manage these accesses.
> - No MMU Stage 2 translation at EL1:
>   - A guest doesn't have an independent guest physical address space
>   - A guest cannot reuse the current Intermediate Physical Address memory layout
>   - A guest uses physical addresses to access memory and devices
>   - The MPU at EL2 manages EL1 Stage 2 access permissions and attributes
> - There are a limited number of MPU protection regions at both EL2 and EL1:
>   - Architecturally, the maximum number of protection regions is 256; typical implementations have 32.
>   - By contrast, Xen in theory does not need to consider the number of page table entries when using the MMU.
> - The MPU protection regions at EL2 need to be shared between the hypervisor and the guest Stage 2.
>   - Requires careful consideration - may impact feature 'fullness' of both the hypervisor and the guest
>   - By contrast, when using the MMU, Xen has a standalone P2M table for guest Stage 2 accesses.
>
> ## 2. Proposed changes of Xen
> ### **2.1. Changes of build system:**
>
> - ***Introduce new Kconfig options for Armv8-R64***:
>   Unlike Armv8-A, because of the lack of MMU support on Armv8-R64, we cannot expect one Xen binary to run on all machines. Xen images are not common across Armv8-R64 platforms; Xen must be re-built for different Armv8-R64 platforms, because these platforms may have different memory layouts and link addresses.
>   - `ARM64_V8R`:
>     This option enables the Armv8-R profile for Arm64. Enabling this option results in selecting MPU.
>     This Kconfig option is used to gate some Armv8-R64 specific code (except MPU code), like some code for Armv8-R64-only system ID register accesses.
>
>   - `ARM_MPU`
>     This option enables the MPU on the Armv8-R architecture. Enabling this option results in disabling the MMU. This Kconfig option is used to gate some ARM_MPU specific code. Once this Kconfig option has been enabled, the MMU-related code will not be built for Armv8-R64. The reason we do not rely on runtime detection to select MMU or MPU is that we don't think we can use one image for both Armv8-R64 and Armv8-A64. Another reason we separate MPU and V8R is to leave room to support MPU on 32-bit Arm one day.
>
>   ***Try to use `if ( IS_ENABLED(CONFIG_ARMXXXX) )` instead of spreading `#ifdef CONFIG_ARMXXXX` everywhere, if it is possible*** (see the sketch at the end of this section).
>
> - ***About the Xen start address for Armv8-R64***:
>   On Armv8-A, Xen has a fixed virtual start address (link address too) on all Armv8-A platforms. In an MMU based system, Xen can map its load address to this virtual start address, so on Armv8-A platforms the Xen start address does not need to be configurable. But Armv8-R platforms don't have an MMU to map the load address to a fixed virtual address, and different platforms will have very different address space layouts, so it's impossible for Xen to specify one fixed physical start address for all Armv8-R platforms.
>
>   - `XEN_START_ADDRESS`
>     This option allows setting the custom address at which Xen will be linked. This address must be aligned to a page size. Xen's run-time addresses are the same as the link-time addresses.
>     ***Notes: A fixed link address means the Xen binary cannot be relocated by the EFI loader. So at the current stage, Xen cannot be launched as an EFI application on Armv8-R64 (TODO#3.3).***
>
>   - Provided by platform files.
>     We can reuse the existing arm/platforms directory to store platform-specific files, and `XEN_START_ADDRESS` is one kind of platform-specific information. So we can use a platform file to define the default `XEN_START_ADDRESS` for each platform.
>
>   - Provided by Kconfig.
>     This option can be an independent or a supplemental option. Users can define a customized `XEN_START_ADDRESS` to override the default value in the platform's file.
>
>   - Generated from device tree by build scripts (optional)
>     Vendors who want to enable Xen on their Armv8-R platforms can use some tools/scripts to parse their board's device tree and generate the basic platform information. These tools/scripts do not necessarily need to be integrated in Xen, but Xen can give some recommended configuration. For example, Xen can recommend Armv8-R platforms to use the lowest RAM start address + 2MB as the default Xen start address. The generated platform files can be placed in arm/platforms for maintenance.
>
>   - Enable Xen PIC/PIE (optional)
>     We have mentioned PIC/PIE in section 1.2. With PIC/PIE support, Xen can run from wherever it has been loaded. But it's rare to use PIC/PIE on a real-time system (code size, more memory accesses). So a partial PIC/PIE image may be better (see the TODO section). But a partial PIC/PIE image may not solve this Xen start address issue.
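> As an illustration of the `IS_ENABLED()` guidance above, a minimal sketch; the two setup functions here are made-up names used only to show the pattern:
> ```
> /* Preferred: the compiler still sees and type-checks both branches,
>  * and the unused one is discarded as dead code. */
> static void mm_setup(void)
> {
>     if ( IS_ENABLED(CONFIG_ARM_MPU) )
>         setup_protection_regions();   /* hypothetical MPU path */
>     else
>         setup_pagetables();           /* hypothetical MMU path */
> }
>
> /* Avoid: #ifdef hides one branch from the compiler entirely. */
> #ifdef CONFIG_ARM_MPU
> /* ... MPU-only code ... */
> #endif
> ```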
I like the description of the XEN_START_ADDRESS problem and solutions. For the initial implementation, a platform file is fine. We need to start easy. Afterwards, I think it would be far better to switch to a script that automatically generates XEN_START_ADDRESS from the host device tree.

Also, if we provide a way to customize the start address via Kconfig, then the script that reads the device tree could simply output the right CONFIG_* option for Xen to build. It wouldn't even have to generate a header file.

> - ***About MPU initialization before parsing device tree***:
>   Before Xen can start parsing information from the device tree and use this information to set up the MPU, Xen needs an initial MPU state. This is because:
>   1. More deterministic: the Arm MPU supports background regions; if we don't configure the MPU regions and don't enable the MPU, the default MPU background attributes take effect. The default background attributes are `IMPLEMENTATION DEFINED`. That means all RAM regions may be configured as device memory and RWX. Random values in RAM or maliciously embedded data can be exploited.
>   2. More compatible: on some Armv8-R64 platforms, if the MPU is disabled, the `dc zva` instruction will make the system halt (this is one side effect of the MPU background attributes: the RAM has been configured as device memory). And this instruction is embedded in some built-in functions, like `memset`. If we use `-ddont_use_dc` to rebuild GCC, the built-in functions will not contain `dc zva`. However, it is obviously unlikely that we will be able to rebuild every GCC for Armv8-R64.
>
>   - Reuse `XEN_START_ADDRESS` (see the sketch at the end of this section)
>     At the very beginning of Xen boot, Xen just needs to cover a limited memory range and very few devices (actually only the UART). So we can use two MPU regions to map:
>     1. `XEN_START_ADDRESS` to `XEN_START_ADDRESS + 2MB`, or `XEN_START_ADDRESS` to `XEN_START_ADDRESS + image_size`, as normal memory.
>     2. The `UART` MMIO region base to the `UART` MMIO region end as device memory.
>     These two regions are enough for Xen to run at boot time, and we don't need to provide additional platform information for the initial normal memory and device memory regions. In the current PoC we have used this option for the implementation, and it's the same as on Armv8-A.
>
>   - Additional platform information for the initial MPU state
>     Introduce some macros to allow users to set the initial normal memory regions:
>     `ARM_MPU_NORMAL_MEMORY_START` and `ARM_MPU_NORMAL_MEMORY_END`
>     and device memory:
>     `ARM_MPU_DEVICE_MEMORY_START` and `ARM_MPU_DEVICE_MEMORY_END`
>     These macros are the same kind of platform-specific information as `XEN_START_ADDRESS`, so the options #1/#2/#3 for generating `XEN_START_ADDRESS` can also be applied to these macros.
>     ***From our current PoC work, we think these macros may not be necessary. But we still place them here to see whether the community has some different scenarios that we haven't considered.***

I think it is fine for now. And their values could be automatically generated by the same script that will automatically generate XEN_START_ADDRESS from the host device tree.
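> To make the boot-time mapping above more concrete, a minimal sketch of programming the two initial EL2 MPU regions. The `PRBAR_EL2` encoding matches the macros in the next section; the `PRLAR_EL2`/`PRSELR_EL2` encodings, the field layout and the attribute indices are illustrative assumptions, and `UART_BASE`/`UART_SIZE` are placeholders:
> ```
> #define PRSELR_EL2  S3_4_C6_C2_1   /* region selector (assumed encoding) */
> #define PRBAR_EL2   S3_4_C6_C8_0   /* region base, as in the macros below */
> #define PRLAR_EL2   S3_4_C6_C8_1   /* region limit (assumed encoding) */
>
> #define PRLAR_EN        0x1        /* region enable bit */
> #define PRLAR_ATTR(idx) ((idx) << 1) /* MAIR_EL2 attribute index field */
>
> /* Program one boot-time EL2 region; AP/XN/SH fields are left at reset
>  * values and MAIR_EL2 indices (0 = normal, 1 = device) are assumed to
>  * have been set up by earlier boot code. */
> static void __init write_boot_region(unsigned int idx, paddr_t base,
>                                      paddr_t limit, unsigned int attridx)
> {
>     WRITE_SYSREG(idx, PRSELR_EL2);
>     isb();
>     WRITE_SYSREG(base & ~(paddr_t)0x3f, PRBAR_EL2);
>     WRITE_SYSREG((limit & ~(paddr_t)0x3f) | PRLAR_ATTR(attridx) | PRLAR_EN,
>                  PRLAR_EL2);
>     isb();
> }
>
> static void __init boot_mpu_setup(void)
> {
>     /* Region 0: Xen image as normal memory; region 1: UART as device. */
>     write_boot_region(0, XEN_START_ADDRESS, XEN_START_ADDRESS + MB(2) - 1, 0);
>     write_boot_region(1, UART_BASE, UART_BASE + UART_SIZE - 1, 1);
> }
> ```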
> - ***Define new system registers for compilers***:
>   Armv8-R64 is based on Armv8.4. That means we will use some Armv8.4 specific system registers. As Armv8-R64 only has the Secure state, at least `VSTCR_EL2` and `VSCTLR_EL2` will be used by Xen, and the first GCC version that supports Armv8.4 is GCC 8.1. In addition to these, the PMSA of Armv8-R64 introduces lots of MPU related system registers: `PRBAR_ELx`, `PRBARx_ELx`, `PRLAR_ELx`, `PRLARx_ELx`, `PRENR_ELx` and `MPUIR_ELx`. But the first GCC version to support these system registers is GCC 11. So we have two ways to make compilers work properly with these system registers.
>   1. Bump the GCC version to GCC 11.
>      The pro of this method is that we don't need to encode these system registers in macros ourselves. The cons are that we have to update the Makefiles to support GCC 11 for Armv8-R64:
>      1.1. Check for GCC version 11 for Armv8-R64.
>      1.2. Add march=armv8r to CFLAGS for Armv8-R64.
>      1.3. Resolve the conflict between march=armv8r and mcpu=generic.
>      These changes will affect common Makefiles, not only the Arm Makefiles. And GCC 11 is new; lots of toolchains and distros don't support it yet.
>
>   2. Encode new system registers in macros ***(preferred)***
>      ```
>      /* Virtualization Secure Translation Control Register */
>      #define VSTCR_EL2  S3_4_C2_C6_2
>      /* Virtualization System Control Register */
>      #define VSCTLR_EL2 S3_4_C2_C0_0
>      /* EL1 MPU Protection Region Base Address Register encode */
>      #define PRBAR_EL1  S3_0_C6_C8_0
>      ...
>      /* EL2 MPU Protection Region Base Address Register encode */
>      #define PRBAR_EL2  S3_4_C6_C8_0
>      ...
>      ```
>      If we encode all of the above system registers, we don't need to bump the GCC version, and the common CFLAGS Xen is using can still be applied to Armv8-R64. We don't need to modify the Makefiles to add specific CFLAGS.
>
>   ***Notes: Armv8-R AArch64 supports the A64 ISA instruction set with some modifications: it redefines DMB and DSB, and adds a DFB. But actually, the encodings of DMB and DSB are still the same as in A64, and DFB is an alias of DSB #12. In this case, we think we don't need a new architecture-specific flag to generate new instructions for Armv8-R.***

I think that for the initial implementation either way is fine. I agree that macros would be better than requiring GCC 11.

> ### **2.2. Changes of the initialization process**
> In general, we still expect Armv8-R64 and Armv8-A64 to have a consistent initialization process. Apart from some architecture differences, which we will distinguish through CONFIG_ARM_MPU or CONFIG_ARM64_V8R, we want most of the initialization code to be reusable between Armv8-R64 and Armv8-A64.
>
> - We will reuse the original head.S and setup.c of Arm, but replace the MMU and page table operations in these files with configuration operations for the MPU and MPU regions.
>
> - We provide a boot-time MPU configuration. This MPU configuration allows Xen to finish its initialization, and it will record the memory regions that are parsed from the device tree.
>
>   At the end of Xen initialization, we will use a runtime MPU configuration to replace the boot-time MPU configuration. The runtime MPU configuration will merge and reorder memory regions to save more MPU regions for guests.
>
> ![img](https://drive.google.com/uc?export=view&id=1wTFyK2XfU3lTlH1PqRDoacQVTwUtWIGU)
>
> - Defer unpausing domains until after free_init_memory.
>   When Xen initialization is about to end, Xen unpauses the guests created during initialization. But this will cause some issues. The unpause action occurs before free_init_memory, whereas the runtime MPU configuration is built after free_init_memory.
>   In Draft-A, we had discussed whether a zeroing operation for init code and data is enough or not, because I had only given a security reason for doing free_init_memory on Armv8-R (free_init_memory drops the Xen init code & data, which reduces the code an attacker can exploit). But I forgot other very important reasons:
>   1. Init code and data will occupy two MPU regions, because they have different memory attributes.
>   2. It's not easy to zero the init code section, because it is read-only. We would have to update its MPU region to make the section RW. This operation doesn't do much less than free_init_memory.
>   3. Zeroing init code and data will not release the two MPU regions they are using. This would be a very big waste of the limited MPU region resource.
>   4. The current free_init_memory operation reuses lots of Armv8-A code, except re-adding init memory to the Xen heap, because we're using a static heap on Armv8-R.
>
>   So if the unpaused guests start executing a context switch at this point, their MPU context will be based on the boot-time MPU configuration. It will probably be inconsistent with the runtime MPU configuration, and this will cause unexpected problems (this may not happen in a single-core system, but on SMP systems the problem is foreseeable, so we hope to solve it from the beginning).
>
>   Why do we need to switch the MPU configuration that late? Because we need to re-order the MPU regions to reduce the complexity of runtime MPU region management (see the sketch below).
>   1. In the boot stage, we allocate MPU regions in sequence until the maximum. Since a few MPU regions will get removed along the way, they will leave holes. For example, when the heap is ready, the fdt is reallocated in the heap, which means the MPU region for the device tree is no longer needed. And also in free_init_memory, although we do not add init memory to the heap, we still reclaim the MPU regions it is using. Without ordering, we may need a bitmap to record such information.
>
>      In context switch, the memory layout is quite different between guest mode and hypervisor mode. When switching to guest mode, only guest RAM, emulated/passthrough devices, etc. should be visible, but in hypervisor mode all devices used by Xen and all guests' RAM shall be visible. And without reordering, we would need to iterate over all MPU regions to find the regions to disable during a runtime context switch, which is definitely an overhead.
>
>      So we propose an ordering at the tail of boot time: put all fixed MPU regions at the head, like Xen text/data, etc., and put all flexible ones at the tail, like device memory and guests' RAM.
>
>      Then later at runtime, like in context switch, we can easily just disable regions from the tail and insert new ones at the tail.
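> A minimal sketch of the intended ordering, with hypothetical structure, field and helper names (the real PoC layout may differ): fixed regions are packed from index 0, flexible ones are handed out from the last implemented region downwards, so a context switch only touches the tail.
> ```
> typedef struct {
>     paddr_t base;
>     paddr_t limit;
>     unsigned int attrs;
> } mpu_rgn_t;
>
> /* Assuming 32 implemented regions; the real count comes from MPUIR_EL2. */
> static mpu_rgn_t xen_mpu_table[32];
> static unsigned int nr_fixed;            /* Xen text/data, heap, ... from index 0 */
> static unsigned int next_flexible = 31;  /* flexible slots filled from the tail */
>
> static int add_flexible_region(paddr_t base, paddr_t limit, unsigned int attrs)
> {
>     if ( next_flexible < nr_fixed )
>         return -ENOMEM;                  /* out of MPU regions */
>
>     xen_mpu_table[next_flexible--] = (mpu_rgn_t){ base, limit, attrs };
>     return 0;
> }
>
> /* On a context switch only the tail changes: regions [0, nr_fixed) stay
>  * programmed for Xen's whole uptime, guest regions are (re)written at the
>  * top indices. write_protection_region() is a hypothetical helper. */
> static void load_guest_regions(const mpu_rgn_t *rgn, unsigned int nr)
> {
>     unsigned int i;
>
>     for ( i = 0; i < nr; i++ )
>         write_protection_region(31 - i, &rgn[i]);
> }
> ```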
> ### **2.3. Changes to reduce memory fragmentation**
>
> In general, memory in a Xen system can be classified into 4 classes: `image sections`, `heap sections`, `guest RAM`, and `boot modules (guest kernel, initrd and dtb)`.
>
> Currently, Xen doesn't place any restriction on how users allocate memory for the different classes. That means users can place boot modules anywhere, can reserve Xen heap memory anywhere and can allocate guest memory anywhere.
>
> In a VMSA system, this would not be too much of a problem, since the MMU can manage memory at a granularity of 4KB after all. But in a PMSA system, this will be a big problem. On Armv8-R64, the maximum number of MPU protection regions is limited to 256, and typical processor implementations provide no more than 32. Add in the fact that Xen shares MPU protection regions with the guests' EL1 Stage 2, and it becomes even more important to properly plan the use of MPU protection regions.
>
> - An ideal memory usage layout restriction:
> ![img](https://drive.google.com/uc?export=view&id=1kirOL0Tx2aAypTtd3kXAtd75XtrngcnW)
>   1. Reserve proper MPU regions for the Xen image (code, rodata and data + bss).
>   2. Reserve one MPU region for boot modules.
>      That means the placement of all boot modules, including guest kernel, initrd and dtb, will be limited to this MPU-region-protected area.
>   3. Reserve one or more MPU regions for the Xen heap.
>      On Armv8-R64, the guest memory is predefined in the device tree; it will not be allocated from the heap. Unlike Armv8-A64, we will not move all free memory to the heap. We want the Xen heap to be deterministic too, so Xen on Armv8-R64 also relies on the Xen static heap feature. The memory for the Xen heap will be defined in the device tree too. Considering that physical memory can also be discontiguous, one or more MPU protection regions need to be reserved for the Xen heap.
>   4. If we call the MPU protection regions used above PART_A, and the remaining MPU protection regions PART_B:
>      4.1. In hypervisor context, Xen will map the remaining RAM and devices in PART_B. This gives Xen the ability to access the whole memory.
>      4.2. In guest context, Xen will create the EL1 Stage 2 mapping in PART_B. In this case, Xen just needs to update PART_B in context switch, but keeps PART_A fixed.
>
> ***Notes: Static allocation will be mandatory on MPU based systems***
>
> **A sample device tree of the memory layout restriction**:
> ```
> chosen {
>     ...
>     /*
>      * Define a section to place boot modules,
>      * all boot modules must be placed in this section.
>      */
>     mpu,boot-module-section = <0x10000000 0x10000000>;
>     /*
>      * Define a section to cover all guest RAM. All guest RAM must be located
>      * within this section. The benefit is that, in the best case, we can have
>      * only one MPU protection region to map all guest RAM for Xen.
>      */
>     mpu,guest-memory-section = <0x20000000 0x30000000>;
>     /*
>      * Define a memory section that can cover all device memory that
>      * will be used in Xen.
>      */
>     mpu,device-memory-section = <0x80000000 0x7ffff000>;
>     /* Define a section for Xen heap */
>     xen,static-mem = <0x50000000 0x20000000>;
>
>     domU1 {
>         ...
>         #xen,static-mem-address-cells = <0x01>;
>         #xen,static-mem-size-cells = <0x01>;
>         /* Statically allocated guest memory, within mpu,guest-memory-section */
>         xen,static-mem = <0x30000000 0x1f000000>;
>
>         module@11000000 {
>             compatible = "multiboot,kernel\0multiboot,module";
>             /* Boot module address, within mpu,boot-module-section */
>             reg = <0x11000000 0x3000000>;
>             ...
>         };
>
>         module@10FF0000 {
>             compatible = "multiboot,device-tree\0multiboot,module";
>             /* Boot module address, within mpu,boot-module-section */
>             reg = <0x10ff0000 0x10000>;
>             ...
>         };
>     };
> };
> ```
> It's a little hard for users to compose such a device tree by hand. Based on the discussion of Draft-A, the Xen community suggested that users use tools like [imagebuilder](https://gitlab.com/ViryaOS/imagebuilder/-/blob/master/scripts/uboot-script-gen#L390) to generate the above device tree properties. Please go to the TODO#3.4 section for more details on this suggestion.

Yes, I think we'll need an ImageBuilder script to populate these entries automatically. With George's help, I moved ImageBuilder to Xen Project.
This is the new repository: https://gitlab.com/xen-project/imagebuilder

The script to generate mpu,boot-module-section and the other mpu addresses could be the same ImageBuilder script that also generates XEN_START_ADDRESS.

> ### **2.4. Changes of memory management**
> Xen is coupled with the VMSA; in order to port Xen to Armv8-R64, we have to decouple Xen from the VMSA and give Xen the ability to manage memory with the PMSA.
>
> 1. ***Use the buddy allocator to manage physical pages for PMSA***
>    From the point of view of physical pages, PMSA and VMSA don't have any difference. So we can reuse the buddy allocator on Armv8-R64 to manage physical pages. The difference is that, in VMSA, Xen will map allocated pages to virtual addresses, but in PMSA, Xen just converts the pages to physical addresses.
>
> 2. ***Cannot use virtual addresses for memory management***
>    As Armv8-R64 only has PMSA in EL2, Xen loses the ability to use virtual addresses to manage memory. This brings some problems: some virtual-address-based features cannot work well on Armv8-R64, like `FIXMAP`, `vmap/vunmap`, `ioremap` and `alternative`.
>
>    But the functions and macros of these features are used in lots of common code, so it's not good to use `#ifdef CONFIG_ARM_MPU` to gate the related code everywhere. In this case, we propose to use stub helpers to make the changes transparent to common code.
>    1. For `FIXMAP`, we will use `0` in `FIXMAP_ADDR` for all fixmap operations. This will directly return the physical address of the fixmapped item.
>    2. For `vmap/vunmap`, we will use some empty inline stub helpers:
>       ```
>       static inline void vm_init_type(...) {}
>       static inline void *__vmap(...)
>       {
>           return NULL;
>       }
>       static inline void vunmap(const void *va) {}
>       static inline void *vmalloc(size_t size)
>       {
>           return NULL;
>       }
>       static inline void *vmalloc_xen(size_t size)
>       {
>           return NULL;
>       }
>       static inline void vfree(void *va) {}
>       ```
>
>    3. `ioremap` depends on `vmap`. As we have made `vmap` always return `NULL`, it cannot work on Armv8-R64 without changes. `ioremap` will return the input address directly. But extended functions like `ioremap_nocache` and `ioremap_cache` may request new memory attributes, and Armv8-R doesn't have enough MPU regions for Xen to split the memory area out of the MPU region it lives in and assign the new attributes to it. So in `ioremap_nocache` and `ioremap_cache`, if the requested attributes are different from the current memory attributes, these functions will return `NULL`.
>       ```
>       static inline void *ioremap_attr(...)
>       {
>           /* We don't have the ability to change the input PA's cache attributes */
>           if ( CACHE_ATTR_need_change )
>               return NULL;
>           return (void *)pa;
>       }
>       static inline void __iomem *ioremap_nocache(...)
>       {
>           return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
>       }
>       static inline void __iomem *ioremap_cache(...)
>       {
>           return ioremap_attr(start, len, PAGE_HYPERVISOR);
>       }
>       static inline void __iomem *ioremap_wc(...)
>       {
>           return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
>       }
>       void *ioremap(...)
>       {
>           return ioremap_attr(pa, len, PAGE_HYPERVISOR_NOCACHE);
>       }
>       ```
>    4. For `alternative`, it has been listed in the TODO; we will simply disable it on Armv8-R64 at the current stage. But simply disabling `alternative` will make `cpus_have_const_cap` always return false.
>       ```
>       /* System capability check for constant cap */
>       #define cpus_have_const_cap(num) ({                 \
>               register_t __ret;                           \
>                                                           \
>               asm volatile (ALTERNATIVE("mov %0, #0",     \
>                                         "mov %0, #1",     \
>                                         num)              \
>                             : "=r" (__ret));              \
>                                                           \
>               unlikely(__ret);                            \
>       })
>       ```
>       So, before we have a PMSA `alternative` implementation, we have to implement a separate `cpus_have_const_cap` for Armv8-R64:
>       ```
>       #define cpus_have_const_cap(num) cpus_have_cap(num)
>       ```
>
> ### **2.5. Changes of guest management**
> Armv8-R64 only supports PMSA in EL2, but it supports configurable VMSA or PMSA in EL1. This means Xen will have a new type of guest on Armv8-R64 - the MPU based guest.
>
> 1. **Add a new domain type - MPU_DOMAIN**
>    When users want to create a guest that will be using the MPU in EL1, they should add an `mpu` property to the device tree `domU` node, as in the following example:
>    ```
>    domU2 {
>        compatible = "xen,domain";
>        direct-map;
>        mpu; --> Indicates this domain will use PMSA in EL1.
>        ...
>    };
>    ```
>    Corresponding to the `mpu` property in the device tree, we also need to introduce a new flag `XEN_DOMCTL_CDF_INTERNAL_mpu` for the domain to mark itself as an MPU domain. This flag will be used in domain creation and in vCPU context switch:
>    1. Domain creation needs this flag to decide whether to enable PMSA or VMSA in EL1.
>    2. vCPU context switch needs this flag to decide whether to save/restore MMU or MPU related registers (see the sketch at the end of this section).
>
> 2. **Add MPU registers for vCPU to save the EL1 MPU context**
>    Current Xen only supports MMU based guests, so it has not considered saving/restoring an MPU context. In this case, we need to add MPU registers to `arch_vcpu`:
>    ```
>    struct arch_vcpu
>    {
>        ...
>    #ifdef CONFIG_ARM_MPU
>        /* Virtualization Translation Control Register */
>        register_t vtcr_el2;
>
>        /* EL1 MPU regions' registers */
>        pr_t *mpu_regions;
>    #endif
>        ...
>    }
>    ```
>    Armv8-R64 can support up to 256 MPU regions, but that's just theoretical. So we don't want to embed `pr_t mpu_regions[256]` in `arch_vcpu` directly; this would be a waste of memory in most cases. Instead we use a pointer in `arch_vcpu` to link to a dynamically allocated `mpu_regions`:
>    ```
>    p->arch.mpu_regions = _xzalloc(sizeof(pr_t) * mpu_regions_count_el1,
>                                   SMP_CACHE_BYTES);
>    ```
>    As `arch_vcpu` is used very frequently in context switch, Xen defines `arch_vcpu` as a cache-aligned data structure. `mpu_regions` will also be used very frequently in the Armv8-R context switch, so we use `_xzalloc` to allocate `SMP_CACHE_BYTES`-aligned memory for `mpu_regions`.
>
>    `mpu_regions_count_el1` can be detected from the `MPUIR_EL1` system register in the Xen boot stage. The limitation is that, if we define a static `arch_vcpu`, we have to allocate `mpu_regions` before using it.
>
> 3. **MPU based P2M table management**
>    Armv8-R64 EL2 doesn't have EL1 Stage 2 address translation. But through the PMSA, it still has the ability to control the permissions and attributes of EL1 Stage 2. In this case, we still hope to keep the interface as consistent with the MMU based P2M as possible.
>
>    p2m->root will point to allocated memory. On Armv8-A64, this memory is used to save the EL1 Stage 2 translation table. But on Armv8-R64, this memory will be used to store the EL2 MPU protection regions that are used by the guest. During domain creation, Xen will prepare the data in this memory so that the guest can access the proper RAM and devices. When the guest's vCPU is scheduled in, this data will be written to the MPU protection region registers.
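> For illustration, a minimal sketch of what the vCPU context switch path could look like for an MPU domain, using the `mpu_regions` array above. The `pr_t` layout and the per-region read/write helpers are hypothetical placeholders:
> ```
> /* Hypothetical pr_t: one EL1 protection region (PRBAR/PRLAR pair). */
> typedef struct {
>     register_t prbar;
>     register_t prlar;
> } pr_t;
>
> static void ctxt_switch_mpu_from(struct vcpu *v)
> {
>     unsigned int i;
>
>     for ( i = 0; i < mpu_regions_count_el1; i++ )
>     {
>         v->arch.mpu_regions[i].prbar = read_el1_region_base(i);   /* hypothetical */
>         v->arch.mpu_regions[i].prlar = read_el1_region_limit(i);  /* hypothetical */
>     }
> }
>
> static void ctxt_switch_mpu_to(struct vcpu *v)
> {
>     unsigned int i;
>
>     for ( i = 0; i < mpu_regions_count_el1; i++ )
>     {
>         write_el1_region_base(i, v->arch.mpu_regions[i].prbar);   /* hypothetical */
>         write_el1_region_limit(i, v->arch.mpu_regions[i].prlar);  /* hypothetical */
>     }
>     isb();
> }
> ```
> Whether the switch path iterates over all `mpu_regions_count_el1` regions or only the regions the guest actually uses is an implementation detail for the PoC.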
> ### **2.6. Changes of exception trap**
> As Armv8-R64 has an exception model compatible with Armv8-A64, we can reuse most of Armv8-A64's exception trap & handler code, except for the trap based on the EL1 Stage 2 translation abort.
>
> In Armv8-A64, we use `FSC_FLT_TRANS`:
> ```
>     case FSC_FLT_TRANS:
>         ...
>         if ( is_data )
>         {
>             enum io_state state = try_handle_mmio(regs, hsr, gpa);
>             ...
>         }
> ```
> But for Armv8-R64, we have to use `FSC_FLT_PERM`:
> ```
>     case FSC_FLT_PERM:
>         ...
>         if ( is_data )
>         {
>             enum io_state state = try_handle_mmio(regs, hsr, gpa);
>             ...
>         }
> ```
>
> ### **2.7. Changes of device drivers**
> Because Armv8-R64 only has a single Security state, this will affect some devices that have two Security states, like the GIC. But fortunately, most vendors will not connect a two-Security-state GIC to Armv8-R64 processors. The current GIC driver can work well with a single-Security-state GIC for Armv8-R64.
>
> ### **2.8. Changes of virtual devices**
> Currently, we only support pass-through devices in guests, because event channels, xen-bus, xen-storage and other advanced Xen features haven't been enabled on Armv8-R64.
>
> ## 3. TODO
> This section describes some features that are not currently implemented in the PoC. These features should be looked at in a second stage and will not be part of the initial support of MPU/Armv8-R. These jobs could be done by Arm or any Xen contributor.
>
> ### 3.1. Alternative framework support
> On an Armv8-A system, `alternative` depends on the `vmap` function to remap a code section to a new read/write virtual address. But on Armv8-R, we do not have virtual addresses to do the remap. So as an alternative method, we will temporarily disable the MPU to make all RAM `RWX` while applying all the alternative patches:
> 1. Disable MPU -> code section becomes RWX.
> 2. Apply alternative patches to the Xen text.
> 3. Enable MPU -> code section goes back to RX.
>
> While all memory is RWX, there may be some security risk. But, because applying the alternative patches happens in the Xen init stage, it probably doesn't matter as much (a sketch follows below).
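> A minimal sketch of that sequence, assuming the MPU enable bit is SCTLR_EL2.M (bit 0) as on the MMU side; apply_alternatives_all() stands in for the existing patching entry point, and cache maintenance is omitted:
> ```
> #define SCTLR_EL2_M  (1UL << 0)   /* MPU enable bit, assumed position */
>
> static void __init apply_alternatives_mpu(void)
> {
>     register_t sctlr = READ_SYSREG(SCTLR_EL2);
>
>     /* MPU off: execution continues under the RWX background regions. */
>     WRITE_SYSREG(sctlr & ~SCTLR_EL2_M, SCTLR_EL2);
>     isb();
>
>     apply_alternatives_all();     /* patch Xen text in place */
>
>     /* MPU back on: the text section is RX again. */
>     WRITE_SYSREG(sctlr, SCTLR_EL2);
>     isb();
> }
> ```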
> ### 3.2. Xen Event Channel Support
> In the current RFC patches we haven't enabled event channel support, but I think it's a good opportunity to have some discussion in advance. On Armv8-R, all VMs are naturally direct-mapped, because there is no Stage 2 MMU translation. The current event channel implementation depends on some pages shared between Xen and the guest: `shared_info` and the per-cpu `vcpu_info`.
>
> For `shared_info`, in the current implementation, Xen allocates a page from the heap for `shared_info` to store the initial metadata. When a guest is trying to set up `shared_info`, it allocates a free gfn and uses a hypercall to set up the P2M mapping between the gfn and `shared_info`.
>
> For a direct-mapped VM, this breaks the direct-mapping concept. And on an MPU based system, like an Armv8-R system, this operation will be very unfriendly: Xen needs to pop the `shared_info` page out of the Xen heap and insert it into the VM's P2M pages. If this page is in the middle of the Xen heap, Xen needs to split the current heap and use extra MPU regions. Also for the P2M part, this page is unlikely to form a new contiguous memory region with the existing P2M pages, and Xen is likely to need yet another MPU region to set it up, which is obviously a waste of limited MPU regions. This kind of dynamic behaviour is quite hard to imagine on an MPU system.

Yeah, it doesn't make any sense for MPU systems.

> For `vcpu_info`, in the current implementation, Xen stores the `vcpu_info` metadata for all vCPUs in `shared_info`. When a guest is trying to set up `vcpu_info`, it allocates memory for `vcpu_info` on the guest side. Then the guest uses a hypercall to copy the metadata from `shared_info` to the guest page. After that, both Xen's `vcpu_info` and the guest's `vcpu_info` point to the same page, which was allocated by the guest.
>
> This implementation has several benefits:
> 1. There is no wasted memory. No extra memory will be allocated from the Xen heap.
> 2. There is no P2M remap. This does not break the direct-mapping, and is MPU system friendly.
> So, on an Armv8-R system, we can still keep the current implementation for the per-cpu `vcpu_info`.
>
> So, our proposal is: can we reuse the current implementation idea of `vcpu_info` for `shared_info`? We still allocate one page for `d->shared_info` at domain construction for holding some initial metadata, using alloc_domheap_pages instead of alloc_xenheap_pages and share_xen_page_with_guest. And when the guest allocates a page for `shared_info` and uses a hypercall to set it up, we copy the initial data from `d->shared_info` to it. After the copy we can update `d->shared_info` to point to the guest-allocated `shared_info` page. In this case, we don't have to think about the fragmentation of the Xen heap and P2M, or the extra MPU regions.

Yes, I think that would work. Also I think it should be possible to get rid of the initial d->shared_info allocation in Xen, given that d->shared_info is for the benefit of the guest and the guest cannot access it until it makes the XENMAPSPACE_shared_info hypercall.

> But there are still some concerns:
> `d->shared_info` in Xen is accessed without any lock, so it will not be that simple to update `d->shared_info`. It might be possible to protect d->shared_info (or another structure) with a read-write lock.
>
> Do we need to add PGT_xxx flags to make it global and stay as close as possible to the original operation? A simple investigation tells us that it is only referred to in `get_page_type`. Since Arm doesn't care about typecounts and always returns 1, it doesn't have too much impact.
>
> ### 3.3. Xen Partial PIC/PIE
> As we have described in the `XEN_START_ADDRESS` section, PIC/PIE can solve the issue of different platforms having different `XEN_START_ADDRESS`. But we also described some issues with using PIC/PIE in real-time systems like Armv8-R platforms.
>
> Still, partial PIC/PIE support may be needed for Armv8-R, because the Arm [EBBR](https://arm-software.github.io/ebbr/index.html) requires Xen on Armv8-R to support the EFI boot service. Due to the lack of relocation capability, the EFI loader cannot launch xen.efi on Armv8-R. So maybe we still need partially supported PIC/PIE: only some boot code supports PIC/PIE to make EFI relocation happy. This boot code will help Xen to check its load address and relocate the Xen image to Xen's run-time address if needed.
>
> ### 3.4. A tool to generate the Armv8-R Xen device tree
> 1. Use a tool to generate the above device tree properties.
>    This tool will have inputs similar to the following:
>    ---
>    DEVICE_TREE="fvp_baremetal.dtb"
>    XEN="4.16-2022.1/xen"
>
>    NUM_DOMUS=1
>    DOMU_KERNEL[0]="4.16-2022.1/Image-domU"
>    DOMU_RAMDISK[0]="4.16-2022.1/initrd.cpio"
>    DOMU_PASSTHROUGH_DTB[0]="4.16-2022.1/passthrough-example-dev.dtb"
>    DOMU_RAM_BASE[0]=0x30000000
>    DOMU_RAM_SIZE[0]=0x1f000000
>    ---
>    Using the above inputs, the tool can generate a device tree similar to the sample we have described.
>
>    - `mpu,guest-memory-section`:
>      This section will cover all guests' RAM (the `xen,static-mem` defined regions in all DomU nodes). All guest RAM must be located within this section. In the best case, we need only one MPU protection region to map all guests' RAM for Xen.
>
>      If users set `DOMU_RAM_BASE` and `DOMU_RAM_SIZE`, these will be converted to the base and size of `xen,static-mem`. The tool will scan all `xen,static-mem` properties in the DomU nodes to determine the base and size of `mpu,guest-memory-section`. If any other kind of memory usage is detected in this section, the tool can report an error. Besides the build-time check, Xen also needs to do a runtime check to protect against a bad device tree generated by a malicious tool (see the sketch below).
>
>      If users set `DOMU_RAM_SIZE` only, this will be converted to the size of `xen,static-mem` only. Xen will allocate the guest memory at runtime, but not from the Xen heap. `mpu,guest-memory-section` will be calculated at runtime too; the property is not needed in the device tree, or will be ignored by Xen.

I am fine with this. You should also know that there was a recent discussion about adding something like:

# address size address size ...
DOMU_STATIC_MEM_RANGES[0]="0xe000000 0x1000000 0xa0000000 0x30000000"

to the ImageBuilder config file.

>    - `mpu,boot-module-section`:
>      This section will be used to store the boot modules like DOMU_KERNEL, DOMU_RAMDISK, and DOMU_PASSTHROUGH_DTB. Xen keeps all boot modules in this section to meet the requirement of DomU restart on Armv8-R. At the current stage, we don't have a privileged domain like Dom0 that can access a filesystem to reload DomU images.
>
>      In the current Xen code, the base and size are mandatory for boot modules. If users don't specify the base of each boot module, the tool will allocate a base for each module, and it will generate the `mpu,boot-module-section` region when it finishes the boot module memory allocation.
>
>      Users can also specify the base and size of each boot module; these will be converted directly to the base and size of the module's `reg`. The tool will scan all modules' `reg` in the DomU nodes to generate the base and size of `mpu,boot-module-section`. If any other kind of memory usage is detected in this section, the tool can report an error. Besides the build-time check, Xen also needs to do a runtime check to protect against a bad device tree generated by a malicious tool.

Xen should always check the validity of its input. However I should point out that there is no "malicious tool" in this picture, because a malicious entity with access to the tool would also have access to Xen directly, so they might as well replace the Xen binary.
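> As a sketch of the runtime check mentioned above (the types are Xen's, but the helpers and variable names here are made up for illustration), Xen could verify that every statically defined range is fully contained in its declared section and refuse the configuration otherwise:
> ```
> /* Does [base, base + size) fit entirely inside the section? */
> static bool __init range_in_section(paddr_t base, paddr_t size,
>                                     paddr_t sec_base, paddr_t sec_size)
> {
>     /* Reject empty or wrapping ranges before the containment test. */
>     if ( size == 0 || base + size < base )
>         return false;
>
>     return base >= sec_base && (base + size) <= (sec_base + sec_size);
> }
>
> /* Example: validate one guest's xen,static-mem range against
>  * mpu,guest-memory-section before accepting the domain. */
> static int __init check_static_mem(paddr_t guest_base, paddr_t guest_size,
>                                    paddr_t gms_base, paddr_t gms_size)
> {
>     if ( !range_in_section(guest_base, guest_size, gms_base, gms_size) )
>     {
>         printk(XENLOG_ERR
>                "xen,static-mem %#"PRIpaddr" outside mpu,guest-memory-section\n",
>                guest_base);
>         return -EINVAL;
>     }
>
>     return 0;
> }
> ```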
>    - `mpu,device-memory-section`:
>      This section will cover all device memory that will be used by Xen, like the `UART`, `GIC`, `SMMU` and other devices. We haven't considered multiple `mpu,device-memory-section` scenarios. If devices' memory and RAM are interleaved in the physical address space, multiple `mpu,device-memory-section` regions would be required to cover all devices. This layout is common on Armv8-A systems, especially servers, but it's rare on Armv8-R. So at the current stage, we don't want to allow multiple `mpu,device-memory-section` regions. The tool can scan the baremetal device tree to sort all the devices' memory ranges and calculate a proper region for `mpu,device-memory-section`. If it finds that Xen needs multiple `mpu,device-memory-section` regions, it can report an unsupported error.
>
> 2. Use a tool to generate device tree properties and platform files
>    This option still uses the same inputs as option #1, but the tool only generates the `xen,static-mem` and `module` nodes in the DomU nodes; it will not generate the `mpu,guest-memory-section`, `mpu,boot-module-section` and `mpu,device-memory-section` properties in the device tree. Instead it will generate the following macros:
>    `MPU_GUEST_MEMORY_SECTION_BASE`, `MPU_GUEST_MEMORY_SECTION_SIZE`
>    `MPU_BOOT_MODULE_SECTION_BASE`, `MPU_BOOT_MODULE_SECTION_SIZE`
>    `MPU_DEVICE_MEMORY_SECTION_BASE`, `MPU_DEVICE_MEMORY_SECTION_SIZE`
>    in platform files at build time. At runtime, Xen will skip the device tree parsing for `mpu,guest-memory-section`, `mpu,boot-module-section` and `mpu,device-memory-section` and instead use these macros to do the runtime check.
>    But this also means these macros only exist in the local build system; they will not be maintained in the Xen repo.

Yes this makes sense to me. I think we should add both scripts to the imagebuilder repository. This way, they could share code easily, and we can keep the documentation in a single place.