RE: [PATCH 11/37] xen/x86: abstract neutral code from acpi_numa_memory_affinity_init

2022-01-26 Thread Wei Chen
Hi Jan,

> -----Original Message-----
> From: Jan Beulich 
> Sent: 25 January 2022 0:51
> To: Wei Chen 
> Cc: Bertrand Marquis ; xen-
> de...@lists.xenproject.org; sstabell...@kernel.org; jul...@xen.org
> Subject: Re: [PATCH 11/37] xen/x86: abstract neutral code from
> acpi_numa_memory_affinity_init
> 
> On 23.09.2021 14:02, Wei Chen wrote:
> > There is some code in acpi_numa_memory_affinity_init to update node
> > memory range and update node_memblk_range array. This code is not
> > ACPI specific, it can be shared by other NUMA implementation, like
> > device tree based NUMA implementation.
> >
> > So in this patch, we abstract this memory range and blocks relative
> > code to a new function. This will avoid exporting static variables
> > like node_memblk_range. And the PXM in neutral code print messages
> > have been replaced by NODE, as PXM is ACPI specific.
> >
> > Signed-off-by: Wei Chen 
> 
> SRAT is an ACPI concept, which I assume has no meaning with DT. Hence
> any generically usable logic here wants, I think, separating out into
> a file which is not SRAT-specific (peeking ahead, specifically not a
> file named "numa_srat.c"). This may in turn require some more thought

When I created the file, I wanted to place the code that is specific to
neither ACPI nor DT in a new file, but I was unsure what to name it. I
chose numa_srat.c because I thought of the device tree as a static
resource table as well. The name is still misleading, though, because
the ACPI SRAT is so well known.

> regarding the proper split between the stuff remaining in srat.c and
> the stuff becoming kind of library code. In particular this may mean
> moving some of the static variables as well, and with them perhaps
> some further functions (while I did peek ahead, I didn't look closely
> at the later patch doing the actual movement). And it is then hard to
> see why the separation needs to happen in two steps - you could move
> the generically usable code to a new file right away.
> 

OK, I will reduce the steps. I think the "new file" can be common/numa.c.
The generically usable code consists of logic functions that check NUMA
memory blocks/ranges and update nodes, so we don't need a "numa_srat.c".

> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -104,6 +104,14 @@ nodeid_t setup_node(unsigned pxm)
> > return node;
> >  }
> >
> > +bool __init numa_memblks_available(void)
> > +{
> > +   if (num_node_memblks < NR_NODE_MEMBLKS)
> > +   return true;
> > +
> > +   return false;
> > +}
> 
> Please can you avoid expressing things in more complex than necessary
> ways? Here I don't see why it can't just be

OK, I will simplify it.

> 
> bool __init numa_memblks_available(void)
> {
>   return num_node_memblks < NR_NODE_MEMBLKS;
> }
> 
> > @@ -301,69 +309,35 @@ static bool __init is_node_memory_continuous(nodeid_t nid,
> > return true;
> >  }
> >
> > -/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
> > -void __init
> > -acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
> > +/* Neutral NUMA memory affinity init function for ACPI and DT */
> > +int __init numa_update_node_memblks(nodeid_t node,
> > +   paddr_t start, paddr_t size, bool hotplug)
> 
> Indentation.

OK.

> 
> >  {
> > -   paddr_t start, end;
> > -   unsigned pxm;
> > -   nodeid_t node;
> > +   paddr_t end = start + size;
> > int i;
> >
> > -   if (srat_disabled())
> > -   return;
> > -   if (ma->header.length != sizeof(struct acpi_srat_mem_affinity)) {
> > -   bad_srat();
> > -   return;
> > -   }
> > -   if (!(ma->flags & ACPI_SRAT_MEM_ENABLED))
> > -   return;
> > -
> > -   start = ma->base_address;
> > -   end = start + ma->length;
> > -   /* Supplement the heuristics in l1tf_calculations(). */
> > -   l1tf_safe_maddr = max(l1tf_safe_maddr, ROUNDUP(end, PAGE_SIZE));
> > -
> > -   if (num_node_memblks >= NR_NODE_MEMBLKS)
> > -   {
> > -   dprintk(XENLOG_WARNING,
> > -"Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> > -   bad_srat();
> > -   return;
> > -   }
> > -
> > -   pxm = ma->proximity_domain;
> > -   if (srat_rev < 2)
> > -   pxm &= 0xff;
> > -   node = setup_node(pxm);
> > -   if (node == NUMA_NO_NODE) {
> > -   bad_srat();
> > -   return;
> > -   }

Re: [PATCH 11/37] xen/x86: abstract neutral code from acpi_numa_memory_affinity_init

2022-01-24 Thread Jan Beulich
On 23.09.2021 14:02, Wei Chen wrote:
> There is some code in acpi_numa_memory_affinity_init to update node
> memory range and update node_memblk_range array. This code is not
> ACPI specific, it can be shared by other NUMA implementation, like
> device tree based NUMA implementation.
> 
> So in this patch, we abstract this memory range and blocks relative
> code to a new function. This will avoid exporting static variables
> like node_memblk_range. And the PXM in neutral code print messages
> have been replaced by NODE, as PXM is ACPI specific.
> 
> Signed-off-by: Wei Chen 

SRAT is an ACPI concept, which I assume has no meaning with DT. Hence
any generically usable logic here wants, I think, separating out into
a file which is not SRAT-specific (peeking ahead, specifically not a
file named "numa_srat.c"). This may in turn require some more thought
regarding the proper split between the stuff remaining in srat.c and
the stuff becoming kind of library code. In particular this may mean
moving some of the static variables as well, and with them perhaps
some further functions (while I did peek ahead, I didn't look closely
at the later patch doing the actual movement). And it is then hard to
see why the separation needs to happen in two steps - you could move
the generically usable code to a new file right away.
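
As a rough illustration of the kind of split discussed above, the sketch
below keeps the firmware-neutral bookkeeping in one part and leaves the
ACPI-specific policy (PXM translation, bad_srat()) in the other. It is
not the code from the series: struct memblk, find_conflict(),
srat_failed() and acpi_memory_affinity() are hypothetical names, the
table size is a stand-in, and the capacity check is folded into the
generic function for brevity, whereas the patch exposes it separately as
numa_memblks_available().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t paddr_t;
typedef uint8_t nodeid_t;

#define NR_NODE_MEMBLKS 4   /* stand-in capacity, not Xen's value */

/* ---- Generic part (candidate for a file that is not SRAT-specific) ---- */
struct memblk {
    paddr_t start, end;     /* half-open range [start, end) */
    nodeid_t node;
    bool hotplug;
};

static struct memblk memblks[NR_NODE_MEMBLKS];
static unsigned int num_memblks;

/* Index of the first recorded block overlapping [start, end), or -1. */
static int find_conflict(paddr_t start, paddr_t end)
{
    assert(num_memblks <= NR_NODE_MEMBLKS);   /* table invariant */

    for (unsigned int i = 0; i < num_memblks; i++)
        if (memblks[i].end > start && memblks[i].start < end)
            return (int)i;

    return -1;
}

/* Firmware-neutral: failure is reported through the return value only. */
int numa_update_node_memblks(nodeid_t node, paddr_t start, paddr_t size,
                             bool hotplug)
{
    paddr_t end = start + size;
    int i = find_conflict(start, end);

    if (i >= 0) {
        /* Overlap with a different node is always an error. */
        if (memblks[i].node != node)
            return -1;
        /* Overlap with itself is tolerated unless the hotplug flags differ. */
        if (!hotplug != !memblks[i].hotplug)
            return -1;
    }

    if (num_memblks >= NR_NODE_MEMBLKS)
        return -1;

    memblks[num_memblks++] = (struct memblk){ start, end, node, hotplug };
    return 0;
}

/* ---- ACPI part (what would remain in srat.c) ---- */
static bool srat_is_bad;

static void bad_srat(void) { srat_is_bad = true; }

bool srat_failed(void) { return srat_is_bad; }

/* Hypothetical caller: translate the PXM, then apply the ACPI error policy. */
void acpi_memory_affinity(unsigned int pxm, paddr_t base, paddr_t length,
                          bool hotplug)
{
    nodeid_t node = (nodeid_t)(pxm & 0xff);   /* stand-in for setup_node() */

    if (numa_update_node_memblks(node, base, length, hotplug))
        bad_srat();
}
```

Under these assumptions a device tree parser could call
numa_update_node_memblks() directly and apply its own error handling,
without ever touching SRAT state.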

> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -104,6 +104,14 @@ nodeid_t setup_node(unsigned pxm)
>   return node;
>  }
>  
> +bool __init numa_memblks_available(void)
> +{
> + if (num_node_memblks < NR_NODE_MEMBLKS)
> + return true;
> +
> + return false;
> +}

Please can you avoid expressing things in more complex than necessary
ways? Here I don't see why it can't just be

bool __init numa_memblks_available(void)
{
return num_node_memblks < NR_NODE_MEMBLKS;
}
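
For reference, the two forms can be compared side by side in a
standalone sketch. The function names, the parameter (the real function
reads the file-scope num_node_memblks) and the value of NR_NODE_MEMBLKS
are stand-ins chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NODE_MEMBLKS 8   /* stand-in value for illustration only */

/* Form used in the patch: branch, then return a literal. */
bool memblks_available_verbose(unsigned int num_node_memblks)
{
    if (num_node_memblks < NR_NODE_MEMBLKS)
        return true;

    return false;
}

/* Suggested form: return the comparison itself. */
bool memblks_available_direct(unsigned int num_node_memblks)
{
    return num_node_memblks < NR_NODE_MEMBLKS;
}

/* Check that both forms agree for every count up to twice the limit. */
void check_forms_agree(void)
{
    for (unsigned int n = 0; n <= 2 * NR_NODE_MEMBLKS; n++)
        assert(memblks_available_verbose(n) == memblks_available_direct(n));
}
```

Since a relational expression in C already evaluates to 0 or 1, the
direct form yields the same bool without the extra branch.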

> @@ -301,69 +309,35 @@ static bool __init is_node_memory_continuous(nodeid_t nid,
>   return true;
>  }
>  
> -/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
> -void __init
> -acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
> +/* Neutral NUMA memory affinity init function for ACPI and DT */
> +int __init numa_update_node_memblks(nodeid_t node,
> + paddr_t start, paddr_t size, bool hotplug)

Indentation.

>  {
> - paddr_t start, end;
> - unsigned pxm;
> - nodeid_t node;
> + paddr_t end = start + size;
>   int i;
>  
> - if (srat_disabled())
> - return;
> - if (ma->header.length != sizeof(struct acpi_srat_mem_affinity)) {
> - bad_srat();
> - return;
> - }
> - if (!(ma->flags & ACPI_SRAT_MEM_ENABLED))
> - return;
> -
> - start = ma->base_address;
> - end = start + ma->length;
> - /* Supplement the heuristics in l1tf_calculations(). */
> - l1tf_safe_maddr = max(l1tf_safe_maddr, ROUNDUP(end, PAGE_SIZE));
> -
> - if (num_node_memblks >= NR_NODE_MEMBLKS)
> - {
> - dprintk(XENLOG_WARNING,
> -"Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> - bad_srat();
> - return;
> - }
> -
> - pxm = ma->proximity_domain;
> - if (srat_rev < 2)
> - pxm &= 0xff;
> - node = setup_node(pxm);
> - if (node == NUMA_NO_NODE) {
> - bad_srat();
> - return;
> - }
> - /* It is fine to add this area to the nodes data it will be used later*/
> + /* It is fine to add this area to the nodes data it will be used later */
>   i = conflicting_memblks(start, end);
>   if (i < 0)
>   /* everything fine */;
>   else if (memblk_nodeid[i] == node) {
> - bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> - !test_bit(i, memblk_hotplug);
> + bool mismatch = !hotplug != !test_bit(i, memblk_hotplug);
>  
> - printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
> -mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
> + printk("%sSRAT: NODE %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",

Nit: Unlike PXM, which is an acronym, "node" doesn't want to be all upper
case.

Also did you check that the node <-> PXM association is known to a reader
of a log at this point in time?

> +mismatch ? KERN_ERR : KERN_WARNING, node, start, end,
>  node_memblk_range[i].start, node_memblk_range[i].end);
>   if (mismatch) {
> - bad_srat();
> - return;
> + return -1;
>   }
>   } else {
>   printk(KERN_ERR
> -"SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> -pxm, start, end, node_to_pxm(memblk_nodeid[i]),
> +

Re: [PATCH 11/37] xen/x86: abstract neutral code from acpi_numa_memory_affinity_init

2021-09-23 Thread Stefano Stabellini
+x86 maintainers


On Thu, 23 Sep 2021, Wei Chen wrote:
> There is some code in acpi_numa_memory_affinity_init to update node
> memory range and update node_memblk_range array. This code is not
> ACPI specific, it can be shared by other NUMA implementation, like
> device tree based NUMA implementation.
> 
> So in this patch, we abstract this memory range and blocks relative
> code to a new function. This will avoid exporting static variables
> like node_memblk_range. And the PXM in neutral code print messages
> have been replaced by NODE, as PXM is ACPI specific.
> 
> Signed-off-by: Wei Chen 
> ---
>  xen/arch/x86/srat.c| 131 +
>  xen/include/asm-x86/numa.h |   3 +
>  2 files changed, 77 insertions(+), 57 deletions(-)
> 
> diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> index 3334ede7a5..18bc6b19bb 100644
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -104,6 +104,14 @@ nodeid_t setup_node(unsigned pxm)
>   return node;
>  }
>  
> +bool __init numa_memblks_available(void)
> +{
> + if (num_node_memblks < NR_NODE_MEMBLKS)
> + return true;
> +
> + return false;
> +}
> +
>  int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node)
>  {
>   int i;
> @@ -301,69 +309,35 @@ static bool __init is_node_memory_continuous(nodeid_t nid,
>   return true;
>  }
>  
> -/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
> -void __init
> -acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
> +/* Neutral NUMA memory affinity init function for ACPI and DT */
> +int __init numa_update_node_memblks(nodeid_t node,
> + paddr_t start, paddr_t size, bool hotplug)
>  {
> - paddr_t start, end;
> - unsigned pxm;
> - nodeid_t node;
> + paddr_t end = start + size;
>   int i;
>  
> - if (srat_disabled())
> - return;
> - if (ma->header.length != sizeof(struct acpi_srat_mem_affinity)) {
> - bad_srat();
> - return;
> - }
> - if (!(ma->flags & ACPI_SRAT_MEM_ENABLED))
> - return;
> -
> - start = ma->base_address;
> - end = start + ma->length;
> - /* Supplement the heuristics in l1tf_calculations(). */
> - l1tf_safe_maddr = max(l1tf_safe_maddr, ROUNDUP(end, PAGE_SIZE));
> -
> - if (num_node_memblks >= NR_NODE_MEMBLKS)
> - {
> - dprintk(XENLOG_WARNING,
> -"Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> - bad_srat();
> - return;
> - }
> -
> - pxm = ma->proximity_domain;
> - if (srat_rev < 2)
> - pxm &= 0xff;
> - node = setup_node(pxm);
> - if (node == NUMA_NO_NODE) {
> - bad_srat();
> - return;
> - }
> - /* It is fine to add this area to the nodes data it will be used later*/
> + /* It is fine to add this area to the nodes data it will be used later */
>   i = conflicting_memblks(start, end);
>   if (i < 0)
>   /* everything fine */;
>   else if (memblk_nodeid[i] == node) {
> - bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> - !test_bit(i, memblk_hotplug);
> + bool mismatch = !hotplug != !test_bit(i, memblk_hotplug);
>  
> - printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
> -mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
> + printk("%sSRAT: NODE %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
> +mismatch ? KERN_ERR : KERN_WARNING, node, start, end,
>  node_memblk_range[i].start, node_memblk_range[i].end);
>   if (mismatch) {
> - bad_srat();
> - return;
> + return -1;
>   }
>   } else {
>   printk(KERN_ERR
> -"SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> -pxm, start, end, node_to_pxm(memblk_nodeid[i]),
> +"SRAT: NODE %u (%"PRIpaddr"-%"PRIpaddr") overlaps with NODE %u (%"PRIpaddr"-%"PRIpaddr")\n",
> +node, start, end, memblk_nodeid[i],
>  node_memblk_range[i].start, node_memblk_range[i].end);
> - bad_srat();
> - return;
> + return -1;
>   }
> - if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
> +
> + if (!hotplug) {
>   struct node *nd = &nodes[node];
>  
>   if (!node_test_and_set(node, memory_nodes_parsed)) {
> @@ -375,26 +349,69 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
>   if (nd->end < end)
>   nd->end = end;
>  
> - /* Check whether this range contains memory for 

[PATCH 11/37] xen/x86: abstract neutral code from acpi_numa_memory_affinity_init

2021-09-23 Thread Wei Chen
There is some code in acpi_numa_memory_affinity_init to update node
memory range and update node_memblk_range array. This code is not
ACPI specific, it can be shared by other NUMA implementation, like
device tree based NUMA implementation.

So in this patch, we abstract this memory range and blocks relative
code to a new function. This will avoid exporting static variables
like node_memblk_range. And the PXM in neutral code print messages
have been replaced by NODE, as PXM is ACPI specific.

Signed-off-by: Wei Chen 
---
 xen/arch/x86/srat.c| 131 +
 xen/include/asm-x86/numa.h |   3 +
 2 files changed, 77 insertions(+), 57 deletions(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 3334ede7a5..18bc6b19bb 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -104,6 +104,14 @@ nodeid_t setup_node(unsigned pxm)
return node;
 }
 
+bool __init numa_memblks_available(void)
+{
+   if (num_node_memblks < NR_NODE_MEMBLKS)
+   return true;
+
+   return false;
+}
+
 int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node)
 {
int i;
@@ -301,69 +309,35 @@ static bool __init is_node_memory_continuous(nodeid_t nid,
return true;
 }
 
-/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
-void __init
-acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
+/* Neutral NUMA memory affinity init function for ACPI and DT */
+int __init numa_update_node_memblks(nodeid_t node,
+   paddr_t start, paddr_t size, bool hotplug)
 {
-   paddr_t start, end;
-   unsigned pxm;
-   nodeid_t node;
+   paddr_t end = start + size;
int i;
 
-   if (srat_disabled())
-   return;
-   if (ma->header.length != sizeof(struct acpi_srat_mem_affinity)) {
-   bad_srat();
-   return;
-   }
-   if (!(ma->flags & ACPI_SRAT_MEM_ENABLED))
-   return;
-
-   start = ma->base_address;
-   end = start + ma->length;
-   /* Supplement the heuristics in l1tf_calculations(). */
-   l1tf_safe_maddr = max(l1tf_safe_maddr, ROUNDUP(end, PAGE_SIZE));
-
-   if (num_node_memblks >= NR_NODE_MEMBLKS)
-   {
-   dprintk(XENLOG_WARNING,
-"Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
-   bad_srat();
-   return;
-   }
-
-   pxm = ma->proximity_domain;
-   if (srat_rev < 2)
-   pxm &= 0xff;
-   node = setup_node(pxm);
-   if (node == NUMA_NO_NODE) {
-   bad_srat();
-   return;
-   }
-   /* It is fine to add this area to the nodes data it will be used later*/
+   /* It is fine to add this area to the nodes data it will be used later */
i = conflicting_memblks(start, end);
if (i < 0)
/* everything fine */;
else if (memblk_nodeid[i] == node) {
-   bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
-   !test_bit(i, memblk_hotplug);
+   bool mismatch = !hotplug != !test_bit(i, memblk_hotplug);
 
-   printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
-  mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
+   printk("%sSRAT: NODE %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
+  mismatch ? KERN_ERR : KERN_WARNING, node, start, end,
   node_memblk_range[i].start, node_memblk_range[i].end);
if (mismatch) {
-   bad_srat();
-   return;
+   return -1;
}
} else {
printk(KERN_ERR
-  "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
-  pxm, start, end, node_to_pxm(memblk_nodeid[i]),
+  "SRAT: NODE %u (%"PRIpaddr"-%"PRIpaddr") overlaps with NODE %u (%"PRIpaddr"-%"PRIpaddr")\n",
+  node, start, end, memblk_nodeid[i],
   node_memblk_range[i].start, node_memblk_range[i].end);
-   bad_srat();
-   return;
+   return -1;
}
-   if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
+
+   if (!hotplug) {
struct node *nd = &nodes[node];
 
if (!node_test_and_set(node, memory_nodes_parsed)) {
@@ -375,26 +349,69 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
if (nd->end < end)
nd->end = end;
 
-   /* Check whether this range contains memory for other nodes */
-   if (!is_node_memory_continuous(node, nd->start, nd->end)) {
-   bad_srat();
-