Re: [PATCH 08/22] x86/pv: rewrite how building PV dom0 handles domheap mappings

2024-01-10 Thread El Yandouzi, Elias

Hi Jan,

I have been looking at this series recently and tried my best
to address your comments. I'll get to the other patches shortly, too.

On 22/12/2022 11:48, Jan Beulich wrote:

On 16.12.2022 12:48, Julien Grall wrote:

From: Hongyan Xia 

Building a PV dom0 is allocating from the domheap but uses it like the
xenheap. This is clearly wrong. Fix.


"Clearly wrong" would mean there's a bug here, at lest under certain
conditions. But there isn't: Even on huge systems, due to running on
idle page tables, all memory is mapped at present.


I agree with you, I'll rephrase the commit message.




@@ -711,22 +715,32 @@ int __init dom0_construct_pv(struct domain *d,
  v->arch.pv.event_callback_cs= FLAT_COMPAT_KERNEL_CS;
  }
  
+#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
+do {\
+UNMAP_DOMAIN_PAGE(virt_var);\


Not much point using the macro when ...


+mfn_var = maddr_to_mfn(maddr);  \
+maddr += PAGE_SIZE; \
+virt_var = map_domain_page(mfn_var);\


... the variable gets reset again to non-NULL unconditionally right
away.


Sure, I'll change that.
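
For reference, since UNMAP_DOMAIN_PAGE() only differs from
unmap_domain_page() by also writing NULL back into the pointer, the
macro can simply use the lower-case helper. A minimal sketch of what I
have in mind (not the final patch):

    /* virt_var is reassigned unconditionally below, no need to NULL it. */
    #define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
    do {                                                    \
        unmap_domain_page(virt_var);                        \
        mfn_var = maddr_to_mfn(maddr);                      \
        maddr += PAGE_SIZE;                                 \
        virt_var = map_domain_page(mfn_var);                \
    } while ( false )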




+} while ( false )


This being a local macro and all use sites passing mpt_alloc as the
last argument, I think that parameter wants dropping, which would
improve readability.


I have to disagree. It wouldn't improve readability; it would only make
things more obscure. I'll keep the macro as is.
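
For completeness, the variant you are suggesting would look roughly like
this (hypothetical sketch, relying on the macro staying local to this
function and every caller passing mpt_alloc):

    #define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var)    \
    do {                                                \
        UNMAP_DOMAIN_PAGE(virt_var);                    \
        mfn_var = maddr_to_mfn(mpt_alloc);              \
        mpt_alloc += PAGE_SIZE;                         \
        virt_var = map_domain_page(mfn_var);            \
    } while ( false )

To my eyes the implicit update of mpt_alloc makes the call sites harder
to follow, hence the preference for keeping the explicit parameter.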





@@ -792,9 +808,9 @@ int __init dom0_construct_pv(struct domain *d,
  if ( !l3e_get_intpte(*l3tab) )
  {
  maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
-l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
-clear_page(l2tab);
-*l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
+UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
+clear_page(l2start);
+*l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);


The l2start you map on the last iteration here can be re-used ...


@@ -805,9 +821,17 @@ int __init dom0_construct_pv(struct domain *d,
  unmap_domain_page(l2t);
  }


... in the code the tail of which is visible here, eliminating a
redundant map/unmap pair.


Good catch, I'll remove the redundant pair.
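
For illustration (quoting the surrounding code from memory, so the exact
arguments may be slightly off), the idea is that the compat MPT copy can
target the l2start mapping left behind by UNMAP_MAP_AND_ADVANCE() on the
final loop iteration, instead of mapping the same L2 again via its L3
entry:

    /* Install read-only guest visible MPT mapping, reusing l2start. */
    memcpy(&l2start[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)],
           &compat_idle_pg_table_l2[l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)],
           COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*l2start));

With that, the extra map/unmap of l2t visible at the end of this hunk
goes away.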




@@ -977,8 +1001,12 @@ int __init dom0_construct_pv(struct domain *d,
   * !CONFIG_VIDEO case so the logic here can be simplified.
   */
  if ( pv_shim )
+{
+l4start = map_domain_page(l4start_mfn);
  pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
                    vphysmap_start, si);
+UNMAP_DOMAIN_PAGE(l4start);
+}


The re-mapping of the L4 table here, which looks redundant at first
glance, could do with explaining in the description. However, I further
wonder to what extent eliminating the direct map is actually useful in
shim mode. Which is to say that I question the need for this change in
the first place. Or wait - isn't this (unlike the rest of this patch)
actually a bug fix? At this point we're on the domain's page tables,
which may not cover the page the L4 is allocated at (if a truly huge
shim was configured). So I guess the change is needed, but it wants
breaking out, allowing us to at least consider whether to backport it.



I will create a separate patch for this change.
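
For the standalone version (so it can be considered for backport ahead
of the rest of this rework), one possible shape, purely as a sketch and
assuming map_domain_page() is safe to use at this point, would be:

    if ( pv_shim )
    {
        l4_pgentry_t *l4t = map_domain_page(_mfn(virt_to_mfn(l4start)));

        pv_shim_setup_dom(d, l4t, v_start, vxenstore_start, vconsole_start,
                          vphysmap_start, si);
        unmap_domain_page(l4t);
    }

i.e. obtaining the MFN from the existing l4start pointer rather than
introducing l4start_mfn, so the fix doesn't depend on the rest of the
series.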


Jan





Re: [PATCH 08/22] x86/pv: rewrite how building PV dom0 handles domheap mappings

2022-12-22 Thread Jan Beulich
On 16.12.2022 12:48, Julien Grall wrote:
> From: Hongyan Xia 
> 
> Building a PV dom0 is allocating from the domheap but uses it like the
> xenheap. This is clearly wrong. Fix.

"Clearly wrong" would mean there's a bug here, at lest under certain
conditions. But there isn't: Even on huge systems, due to running on
idle page tables, all memory is mapped at present.

> @@ -711,22 +715,32 @@ int __init dom0_construct_pv(struct domain *d,
>  v->arch.pv.event_callback_cs= FLAT_COMPAT_KERNEL_CS;
>  }
>  
> +#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
> +do {\
> +UNMAP_DOMAIN_PAGE(virt_var);\

Not much point using the macro when ...

> +mfn_var = maddr_to_mfn(maddr);  \
> +maddr += PAGE_SIZE; \
> +virt_var = map_domain_page(mfn_var);\

... the variable gets reset again to non-NULL unconditionally right
away.

> +} while ( false )

This being a local macro and all use sites passing mpt_alloc as the
last argument, I think that parameter wants dropping, which would
improve readability.

> @@ -792,9 +808,9 @@ int __init dom0_construct_pv(struct domain *d,
>  if ( !l3e_get_intpte(*l3tab) )
>  {
>  maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
> -l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
> -clear_page(l2tab);
> -*l3tab = l3e_from_paddr(__pa(l2tab), L3_PROT);
> +UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
> +clear_page(l2start);
> +*l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);

The l2start you map on the last iteration here can be re-used ...

> @@ -805,9 +821,17 @@ int __init dom0_construct_pv(struct domain *d,
>  unmap_domain_page(l2t);
>  }

... in the code the tail of which is visible here, eliminating a
redundant map/unmap pair.

> @@ -977,8 +1001,12 @@ int __init dom0_construct_pv(struct domain *d,
>   * !CONFIG_VIDEO case so the logic here can be simplified.
>   */
>  if ( pv_shim )
> +{
> +l4start = map_domain_page(l4start_mfn);
>  pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
>                    vphysmap_start, si);
> +UNMAP_DOMAIN_PAGE(l4start);
> +}

The re-mapping of the L4 table here, which looks redundant at first
glance, could do with explaining in the description. However, I further
wonder to what extent eliminating the direct map is actually useful in
shim mode. Which is to say that I question the need for this change in
the first place. Or wait - isn't this (unlike the rest of this patch)
actually a bug fix? At this point we're on the domain's page tables,
which may not cover the page the L4 is allocated at (if a truly huge
shim was configured). So I guess the change is needed, but it wants
breaking out, allowing us to at least consider whether to backport it.

Jan



[PATCH 08/22] x86/pv: rewrite how building PV dom0 handles domheap mappings

2022-12-16 Thread Julien Grall
From: Hongyan Xia 

Building a PV dom0 is allocating from the domheap but uses it like the
xenheap. This is clearly wrong. Fix.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 



Changes since Hongyan's version:
* Rebase
* Remove spurious newline
---
 xen/arch/x86/pv/dom0_build.c | 56 +++-
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index c837b2d96f89..cd60f259d1b7 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -383,6 +383,10 @@ int __init dom0_construct_pv(struct domain *d,
 l3_pgentry_t *l3tab = NULL, *l3start = NULL;
 l2_pgentry_t *l2tab = NULL, *l2start = NULL;
 l1_pgentry_t *l1tab = NULL, *l1start = NULL;
+mfn_t l4start_mfn = INVALID_MFN;
+mfn_t l3start_mfn = INVALID_MFN;
+mfn_t l2start_mfn = INVALID_MFN;
+mfn_t l1start_mfn = INVALID_MFN;
 
 /*
  * This fully describes the memory layout of the initial domain. All
@@ -711,22 +715,32 @@ int __init dom0_construct_pv(struct domain *d,
 v->arch.pv.event_callback_cs= FLAT_COMPAT_KERNEL_CS;
 }
 
+#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
+do {\
+UNMAP_DOMAIN_PAGE(virt_var);\
+mfn_var = maddr_to_mfn(maddr);  \
+maddr += PAGE_SIZE; \
+virt_var = map_domain_page(mfn_var);\
+} while ( false )
+
 if ( !compat )
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
-l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l4start_mfn, l4start, mpt_alloc);
+l4tab = l4start;
 clear_page(l4tab);
-init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
-  d, INVALID_MFN, true);
-v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
+init_xen_l4_slots(l4tab, l4start_mfn, d, INVALID_MFN, true);
+v->arch.guest_table = pagetable_from_mfn(l4start_mfn);
 }
 else
 {
 /* Monitor table already created by switch_compat(). */
-l4start = l4tab = __va(pagetable_get_paddr(v->arch.guest_table));
+l4start_mfn = pagetable_get_mfn(v->arch.guest_table);
+l4start = l4tab = map_domain_page(l4start_mfn);
 /* See public/xen.h on why the following is needed. */
 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
 l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
 }
 
 l4tab += l4_table_offset(v_start);
@@ -736,14 +750,16 @@ int __init dom0_construct_pv(struct domain *d,
 if ( !((unsigned long)l1tab & (PAGE_SIZE-1)) )
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l1_page_table;
-l1start = l1tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l1start_mfn, l1start, mpt_alloc);
+l1tab = l1start;
 clear_page(l1tab);
 if ( count == 0 )
 l1tab += l1_table_offset(v_start);
 if ( !((unsigned long)l2tab & (PAGE_SIZE-1)) )
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l2_page_table;
-l2start = l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
+l2tab = l2start;
 clear_page(l2tab);
 if ( count == 0 )
 l2tab += l2_table_offset(v_start);
@@ -753,19 +769,19 @@ int __init dom0_construct_pv(struct domain *d,
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info =
 PGT_l3_page_table;
-l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
 }
 l3tab = l3start;
 clear_page(l3tab);
 if ( count == 0 )
 l3tab += l3_table_offset(v_start);
-*l4tab = l4e_from_paddr(__pa(l3start), L4_PROT);
+*l4tab = l4e_from_mfn(l3start_mfn, L4_PROT);
 l4tab++;
 }
-*l3tab = l3e_from_paddr(__pa(l2start), L3_PROT);
+*l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
 l3tab++;
 }
-*l2tab = l2e_from_paddr(__pa(l1start), L2_PROT);
+*l2tab = l2e_from_mfn(l1start_mfn, L2_PROT);
 l2tab++;
 }
 if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
@@ -792,9 +808,9 @@ int __init dom0_construct_pv(struct domain *d,
 if ( !l3e_get_intpte(*l3tab) )