On Mon, 2023-07-24 at 16:11 +0200, Jan Beulich wrote:
> On 24.07.2023 11:42, Oleksii Kurochko wrote:
> > @@ -35,8 +36,10 @@ unsigned long __ro_after_init phys_offset;
> >   *
> >   * It might be needed one more page table in case when Xen load address
> >   * isn't 2 MB aligned.
> > + *
> > + * CONFIG_PAGING_LEVELS page tables are needed for identity mapping.
> >   */
> > -#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) + 1)
> > +#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) * 2) + 1
> 
> Comment addition and code change are at least apparently out of sync:
> With such a comment and without thinking much one would expect the
> constant to be bumped by CONFIG_PAGING_LEVELS. It is true that you
> only need CONFIG_PAGING_LEVELS - 1, because the root table is shared,
> but that would then be nice to also clarify in the comment. E.g.
> 
> "CONFIG_PAGING_LEVELS page tables are needed for the identity
> mapping,
>  except that the root page table is shared with the initial mapping."
Thanks. I'll take that into account in the next patch version.

> 
> Also - where did the outermost pair of parentheses go? (Really you
> don't need to parenthesize the multiplication, so the last closing
> one can simply move last.)
Missed it. Thanks. I'll move the last closing parenthesis to the end.
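So with both remarks folded in, the definition should presumably read:

/*
 * CONFIG_PAGING_LEVELS page tables are needed for the identity mapping,
 * except that the root page table is shared with the initial mapping.
 */
#define PGTBL_INITIAL_COUNT ((CONFIG_PAGING_LEVELS - 1) * 2 + 1)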
> 
> > @@ -75,10 +78,11 @@ static void __init setup_initial_mapping(struct mmu_desc *mmu_desc,
> >      unsigned int index;
> >      pte_t *pgtbl;
> >      unsigned long page_addr;
> > +    bool is_identity_mapping = map_start == pa_start;
> >  
> > -    if ( (unsigned long)_start % XEN_PT_LEVEL_SIZE(0) )
> > +    if ( !IS_ALIGNED((unsigned long)_start, KB(4)) )
> >      {
> > -        early_printk("(XEN) Xen should be loaded at 4k boundary\n");
> > +        early_printk("(XEN) Xen should be loaded at 4KB boundary\n");
> 
> The change to the message looks unrelated.
Yes, you are right. I'll remove that change to the message.

> 
> > @@ -255,25 +261,44 @@ void __init noreturn noinline enable_mmu()
> >      csr_write(CSR_SATP,
> >                PFN_DOWN((unsigned long)stage1_pgtbl_root) |
> >                RV_STAGE1_MODE << SATP_MODE_SHIFT);
> > +}
> >  
> > -    asm volatile ( ".p2align 2" );
> > - mmu_is_enabled:
> > -    /*
> > -     * Stack should be re-inited as:
> > -     * 1. Right now an address of the stack is relative to load time
> > -     *    addresses what will cause an issue in case of load start address
> > -     *    isn't equal to linker start address.
> > -     * 2. Addresses in stack are all load time relative which can be an
> > -     *    issue in case when load start address isn't equal to linker
> > -     *    start address.
> > -     *
> > -     * We can't return to the caller because the stack was reseted
> > -     * and it may have stash some variable on the stack.
> > -     * Jump to a brand new function as the stack was reseted
> > -     */
> > +void __init remove_identity_mapping(void)
> > +{
> > +    static pte_t *pgtbl = stage1_pgtbl_root;
> > +    static unsigned long load_start = XEN_VIRT_START;
> > +    static unsigned int pt_level = CONFIG_PAGING_LEVELS - 1;
> 
> These all want to be __initdata, but personally I find this way of
> recursing a little odd. Let's see what the maintainers say.
I'm not completely happy either. Initially I thought it would be
better to pass all this stuff as function arguments.

But then access to stage1_pgtbl_root would have to be provided (a
get_root_pt() function?). So remove_identity_mapping() would be called
as remove_identity_mapping(get_root_pt(), _start, CONFIG_PAGING_LEVELS
- 1), or as remove_identity_mapping(NULL, _start, CONFIG_PAGING_LEVELS
- 1) with the first argument initialized to stage1_pgtbl_root when it
is NULL.
Also I am not sure that a 'user' should have to provide all this
information to such a function.
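
A rough sketch of that variant, with remove_identity_mapping_at() just
a hypothetical name for an internal helper (this would also avoid
get_root_pt(), since the wrapper lives next to stage1_pgtbl_root):

static void __init remove_identity_mapping_at(pte_t *pgtbl,
                                              unsigned long load_start,
                                              unsigned int pt_level)
{
    /* ... the current body, recursing with explicit arguments ... */
}

void __init remove_identity_mapping(void)
{
    /* The 'user' doesn't need to know about page table internals. */
    remove_identity_mapping_at(stage1_pgtbl_root,
                               LINK_TO_LOAD(_start),
                               CONFIG_PAGING_LEVELS - 1);
}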

Could you recommend something better?

> 
> > +    unsigned long load_end = LINK_TO_LOAD(_end);
> > +    unsigned long xen_size;
> > +    unsigned long pt_level_size = XEN_PT_LEVEL_SIZE(pt_level);
> > +    unsigned long pte_nums;
> > +
> > +    unsigned long virt_indx = pt_index(pt_level, XEN_VIRT_START);
> > +    unsigned long indx;
> > +
> > +    if ( load_start == XEN_VIRT_START )
> > +        load_start = LINK_TO_LOAD(_start);
> > +
> > +    xen_size = load_end - load_start;
> 
> When you come here recursively, don't you need to limit this
> instance of the function to a single page table's worth of address
> space (at the given level), using load_end only if that's yet
> lower?
Do you mean the case when load_start > load_end? If so, then I missed
that.
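
If you instead mean limiting each invocation to the address space one
table covers at the given level, I guess something like the following
clamp would be needed (just a sketch; it assumes a non-root level,
where XEN_PT_LEVEL_SIZE(pt_level + 1) is the span one whole table at
pt_level covers):

    /*
     * One table at pt_level only covers the address space of a single
     * entry at the level above; don't count PTEs beyond that span.
     */
    unsigned long span_end = ROUNDUP(load_start + 1,
                                     XEN_PT_LEVEL_SIZE(pt_level + 1));

    xen_size = min(load_end, span_end) - load_start;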
> 
> > +    pte_nums = ROUNDUP(xen_size, pt_level_size) / pt_level_size;
> > +
> > +    while ( pte_nums-- )
> > +    {
> > +        indx = pt_index(pt_level, load_start);
> >  
> > -    switch_stack_and_jump((unsigned long)cpu0_boot_stack + STACK_SIZE,
> > -                          cont_after_mmu_is_enabled);
> > +        if ( virt_indx != indx )
> > +        {
> > +            pgtbl[indx].pte = 0;
> > +            load_start += XEN_PT_LEVEL_SIZE(pt_level);
> > +        }
> > +        else
> > +        {
> > +            pgtbl =  (pte_t *)LOAD_TO_LINK(pte_to_paddr(pgtbl[indx]));
> 
> Nit: Stray double blank.
Thanks.
> 
> > +            pt_level--;
> > +            remove_identity_mapping();
> 
> Don't you need to restore pgtbl and pt_level here before the loop
> can continue? And because of depending on load_start, which would
> have moved, xen_size also needs suitably reducing, I think. Plus
> pte_nums as well, since that in turn was calculated from xen_size.
I forgot to restore pgtbl and pt_level because initially I used
function arguments to pass that information, so no restoring was
needed.
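
If the static-variable approach is kept, the recursion site would then
need something like this (a sketch; load_start intentionally stays
advanced by the inner call):

            /*
             * Remember this level's cursor, descend, and restore it
             * afterwards so the loop can carry on at this level.
             */
            pte_t *cur_pgtbl = pgtbl;
            unsigned int cur_level = pt_level;

            pgtbl = (pte_t *)LOAD_TO_LINK(pte_to_paddr(pgtbl[indx]));
            pt_level--;
            remove_identity_mapping();

            pgtbl = cur_pgtbl;
            pt_level = cur_level;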

Regarding xen_size and pte_nums, it looks like they only need to be
initialized once per page table level.
For example, consider the following situation:
  ----------------------
   non-identity-mapping
   identity-mapping
  ---------------------- C
   identity-mapping
  ---------------------- B
   identity-mapping
  ---------------------- A
We calculate that 3 PTEs have to be removed. The first two PTEs (which
cover only identity mapping) are removed without any issue; then
load_start is moved to C and the recursion is run for PTE 'C' to go to
the next page table level.
At the new level we again calculate how many PTEs have to be removed,
and remove only the necessary amount.
When we get back to the previous page table level, pte_nums will be 1,
so we go to the head of the loop, decrease pte_nums to 0 and exit.

The same applies to the case when the non-identity mapping is lower
than the identity mapping (though it looks like this can't happen,
because the XEN_VIRT_START address is the highest address by
definition; probably a check should be added):
we know that pte_nums = 3 at some level. Then we go to the next level
and remove only the identity-mapping PTEs there, come back to the
previous level, decrease pte_nums to 2 and remove only the 2 remaining
PTEs.
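
The check mentioned above could presumably be as simple as:

    /*
     * The identity mapping is expected to live wholly below
     * XEN_VIRT_START; otherwise comparing virt_indx against indx
     * wouldn't be sufficient.
     */
    ASSERT(load_start < XEN_VIRT_START);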

~ Oleksii

