On 16.08.2021 21:25, Andrew Cooper wrote:
> On 16/08/2021 16:29, Jan Beulich wrote:
>> About every time I look at dom0_construct_pv()'s "calculation" of
>> nr_pt_pages I question (myself) whether the result is precise or merely
>> an upper bound. I think it is meant to be precise, but I think we would
>> be better off having some checking in place. Hence add ASSERT()s to
>> verify that
>> - all pages have a valid L1...Ln (currently L4) page table type and
>> - no other bits are set, in particular the type refcount is still zero.
>>
>> Signed-off-by: Jan Beulich <jbeul...@suse.com>
>> ---
>> There are (at least) two factors supporting my uncertainty about the
>> "calculation" being precise: The loop starting from 2 (which clearly is
>> too small for a possible result)
> 
> 2 was the correct absolute minimum for 2-level guests.

Which has been history for how many years? The minimum for the
current implementation is 4 afaict, and ...

> XTF kernels don't exceed the 2M boundary (at least, not currently), so
> they can be mapped with only 3 or 4 pagetables, except:
> 
> * 3-level guests are created with 4 L2's for no obvious reason.  This is
> nothing to do with legacy PAE paging, nor with how a typical Linux/BSD
> kernel works.  The requirement to make 3-level guests work (and even
> then, only under 32bit Xen) is to create a PGT_pae_xen_l2 if not already
> covered by the other mappings.  Any non-toy kernel discards these
> pagetables in favour of its own idea of pagetables.

... could be 3 for 32-bit Dom0.

> * v_end is rounded up to 4MB.
> 
> Most XTF guests will operate entirely happily in a few hundred kb of
> space, and the same will be true of other microservices.  The rounding
> up of memory might be helpful for the traditional big VMs case, but it
> isn't correct or useful for other usecases.
> 
>> and an apparently wrong comment stating
>> that not only v_end but also v_start would be superpage aligned
> 
> Which comment?  The only one I see about 4M has nothing to do with
> superpages.

The one immediately ahead of the related variable declarations:

    /*
     * This fully describes the memory layout of the initial domain. All
     * *_start address are page-aligned, except v_start (and v_end) which are
     * superpage-aligned.
     */

I see nothing forcing v_start to be superpage-aligned, while I
do suspect that the "calculation" of the number of page tables
will be wrong when it isn't.

>>  (in fact
>> v_end is 4MiB aligned, which is the superpage size only on long
>> abandoned [by us] non-PAE x86-32).
> 
> Tangentially, that code needs some serious work to use ROUNDUP/DOWN
> macros for clarity.

Agreed.

>> --- a/xen/arch/x86/pv/dom0_build.c
>> +++ b/xen/arch/x86/pv/dom0_build.c
>> @@ -59,6 +59,10 @@ static __init void mark_pv_pt_pages_rdon
>>          l1e_remove_flags(*pl1e, _PAGE_RW);
>>          page = mfn_to_page(l1e_get_mfn(*pl1e));
>>  
>> +        ASSERT(page->u.inuse.type_info & PGT_type_mask);
>> +        ASSERT((page->u.inuse.type_info & PGT_type_mask) <= 
>> PGT_root_page_table);
> 
> This is an obfuscated
> 
> ASSERT((page->u.inuse.type_info & PGT_type_mask) >= PGT_l1_page_table &&
>        (page->u.inuse.type_info & PGT_type_mask) <= PGT_root_page_table);

I can certainly switch to this yet longer piece of code, and ...

> and
> 
>> +        ASSERT(!(page->u.inuse.type_info & ~PGT_type_mask));
> 
> this has no context.
> 
> At a bare minimum, you need a comment stating what properties we're
> looking for, so anyone suffering an assertion failure has some clue as
> to what may have gone wrong.

... I can certainly transform the respective parts of the
description into a code comment.

Jan


Reply via email to