On 16.12.2022 21:17, Andrew Cooper wrote: > Partly for clarity because there is a lot of subtle magic at work here. > Expand the commentary of what's going on. > > Also, because there is no need to double copy the stack (32kB) to retrieve > local values spilled to the stack under the old alias, when all of the > aforementioned local variables are going out of scope anyway. > > There is also a logic change when walking l2_xenmap[]. The old logic would > skip recursing into 4k mappings; this would corrupt Xen if it were used. > There are no 4k mappings in l2_xenmap[] this early on boot, so assert the > property instead of leaving a latent breakage. > > Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> > --- > CC: Jan Beulich <jbeul...@suse.com> > CC: Roger Pau Monné <roger....@citrix.com> > CC: Wei Liu <w...@xen.org> > > We probably want to drop PGE from XEN_MINIMAL_CR4. It will simplify several > boot paths which don't need to care about global pages, and PGE is conditional > anyway now with the PCID support.
Perhaps, especially if - as you say - this allows for simplification. > --- a/xen/arch/x86/setup.c > +++ b/xen/arch/x86/setup.c > @@ -467,6 +467,113 @@ static void __init move_memory( > } > } > > +static void __init noinline move_xen(void) > +{ > + l4_pgentry_t *pl4e; > + l3_pgentry_t *pl3e; > + l2_pgentry_t *pl2e; > + unsigned long tmp; > + int i, j, k; Can these become "unsigned int" please at this occasion? (Perhaps as another reason to make the change, mention that the old instances of i and j did shadow outer scope variables in the same function?) > + /* > + * The caller has selected xen_phys_start, ensuring that the old and new > + * locations do not overlap. Make it so. As a non-native speaker I'm struggling with what "Make it so" is supposed to tell me here. > + * Prevent the compiler from reordering anything across this point. Such > + * things will end badly. > + */ > + barrier(); > + > + /* > + * Copy out of the current alias, into the directmap at the new location. > + * This includes a snapshot of the current stack. > + */ > + memcpy(__va(__pa(_start)), _start, _end - _start); > + > + /* > + * We are now in a critical region. Any write (modifying a global, > + * spilling a local to the stack, etc) via the current alias will get > lost > + * when we switch to the new pagetables. This even means no printk()'s > + * debugging purposes. Nit: Missing "for"? > + * Walk the soon-to-be-used pagetables in the new location, relocating > all > + * intermediate (non-leaf) entries to point to their new-location > + * equivalents. All writes are via the directmap alias. > + */ > + pl4e = __va(__pa(idle_pg_table)); > + for ( i = 0 ; i < L4_PAGETABLE_ENTRIES; i++, pl4e++ ) > + { > + if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) ) > + continue; > + > + *pl4e = l4e_from_intpte(l4e_get_intpte(*pl4e) + xen_phys_start); > + pl3e = __va(l4e_get_paddr(*pl4e)); > + for ( j = 0; j < L3_PAGETABLE_ENTRIES; j++, pl3e++ ) > + { > + if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) || > + (l3e_get_flags(*pl3e) & _PAGE_PSE) ) > + continue; > + > + *pl3e = l3e_from_intpte(l3e_get_intpte(*pl3e) + xen_phys_start); > + pl2e = __va(l3e_get_paddr(*pl3e)); > + for ( k = 0; k < L2_PAGETABLE_ENTRIES; k++, pl2e++ ) > + { > + if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) || > + (l2e_get_flags(*pl2e) & _PAGE_PSE) ) > + continue; > + > + *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + > xen_phys_start); > + } > + } > + } > + > + /* > + * Walk the soon-to-be-used l2_xenmap[], relocating all the leaf mappings > + * so text/data/bss etc refer to the new location in memory. > + */ > + pl2e = __va(__pa(l2_xenmap)); > + for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++, pl2e++ ) > + { > + if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) ) > + continue; > + > + /* > + * We don't have 4k mappings in l2_xenmap[] at this point in boot, > nor > + * is this anticipated to change. Simply assert the fact, rather > than > + * introducing dead logic to decend into l1 tables. Nit: "descend"? > + */ > + ASSERT(l2e_get_flags(*pl2e) & _PAGE_PSE); The warning about the use of printk() around here could make one think that using ASSERT() (or BUG()) is similarly bad. However, aiui using printk() isn't by itself a problem, just that the log message would be lost as far as the circular buffer goes. The message would be observable on the serial con sole though, at least with "sync_console". It is on this basis that using ASSERT() here makes sense. Perhaps the printk() related comment could be slightly adjusted to express this? > + *pl2e = l2e_from_intpte(l2e_get_intpte(*pl2e) + xen_phys_start); > + } > + > + /* > + * Switch to relocated pagetables, shooting down global mappings as we > go. > + */ > + asm volatile ( > + "mov %%cr4, %[cr4]\n\t" > + "andb $~%c[pge], %b[cr4]\n\t" > + "mov %[cr4], %%cr4\n\t" /* CR4.PGE = 0 */ > + "mov %[cr3], %%cr3\n\t" /* CR3 = new pagetables */ > + "orb $%c[pge], %b[cr4]\n\t" I understand you need %c to apply the ~ in the operand of ANDB above. But could I talk you into keeping things as simple as possible here by using plain %[pge]? > + "mov %[cr4], %%cr4\n\t" /* CR4.PGE = 1 */ > + : [cr4] "=&a" (tmp) /* Could be "r", but "a" makes better asm */ > + : [cr3] "r" (__pa(idle_pg_table)), > + [pge] "i" (X86_CR4_PGE) > + : "memory" ); The removed stack copying worries me, to be honest. Yes, for local variables of ours it doesn't matter because they are about to go out of scope. But what if the compiler instantiates any for its own use, e.g. ... > + /* > + * End of the critical region. Updates to locals and globals now work as > + * expected. > + * > + * Updates to local variables which have been spilled to the stack since > + * the memcpy() have been lost. But we don't care, because they're all > + * going out of scope imminently. > + */ > + > + printk("New Xen image base address: %#lx\n", xen_phys_start); ... the result of the address calculation for the string literal here? Such auxiliary calculations can happen at any point in the function, and won't necessarily (hence the example chosen) become impossible for the compiler to do with the memory clobber in the asm(). And while the string literal's address is likely cheap enough to calculate right in the function invocation sequence (and an option would also be to retain the printk() in the caller), other instrumentation options could be undermined by this as well. Jan