On 16.12.2022 12:48, Julien Grall wrote:
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1648,6 +1648,22 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  
>      numa_initmem_init(0, raw_max_page);
>  
> +    /*
> +     * When we do not have a direct map, memory for metadata of heap nodes in
> +     * init_node_heap() is allocated from xenheap, which needs to be mapped 
> and
> +     * unmapped on demand. However, we cannot just take memory from the boot
> +     * allocator to create the PTEs while we are passing memory to the heap
> +     * allocator during end_boot_allocator().
> +     *
> +     * To solve this race, we need to leave early boot before
> +     * end_boot_allocator() so that Xen PTE pages are allocated from the heap
> +     * instead of the boot allocator. We can do this because the metadata for
> +     * the 1st node is statically allocated, and by the time we need memory 
> to
> +     * create mappings for the 2nd node, we already have enough memory in the
> +     * heap allocator in the 1st node.
> +     */

Is this "enough" guaranteed, or merely a hope (and true in the common case,
but maybe not when the 1st node ends up having very little memory)?

> +    system_state = SYS_STATE_boot;
> +
>      if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
>      {
>          unsigned long limit = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
> @@ -1677,8 +1693,6 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>      else
>          end_boot_allocator();
>  
> -    system_state = SYS_STATE_boot;

I'm afraid I don't view this as viable - there are assumptions not just in
the page table allocation functions that SYS_STATE_boot (or higher) means
that end_boot_allocator() has run (e.g. acpi_os_map_memory()). You also do
this for x86 only. I think system_state wants leaving alone here, and an
arch specific approach wants creating for the page table allocation you
talk of.

Jan

Reply via email to