>>> On 14.04.17 at 17:37, <boris.ostrov...@oracle.com> wrote: > --- a/xen/common/page_alloc.c > +++ b/xen/common/page_alloc.c > @@ -1035,16 +1035,82 @@ merge_and_free_buddy(struct page_info *pg, unsigned > int node, > return pg; > } > > -static void scrub_free_pages(unsigned int node) > +static nodemask_t node_scrubbing; > + > +static unsigned int node_to_scrub(bool get_node) > +{ > + nodeid_t node = cpu_to_node(smp_processor_id()), local_node; > + nodeid_t closest = NUMA_NO_NODE; > + u8 dist, shortest = 0xff; > + > + if ( node == NUMA_NO_NODE ) > + node = 0; > + > + if ( node_need_scrub[node] && > + (!get_node || !node_test_and_set(node, node_scrubbing)) ) > + return node; > + > + /* > + * See if there are memory-only nodes that need scrubbing and choose > + * the closest one. > + */ > + local_node = node; > + while ( 1 ) > + { > + do { > + node = cycle_node(node, node_online_map); > + } while ( !cpumask_empty(&node_to_cpumask(node)) && > + (node != local_node) ); > + > + if ( node == local_node ) > + break; > + > + if ( node_need_scrub[node] ) > + { > + if ( !get_node ) > + return node;
I think the function parameter name is not / no longer suitable. The caller wants to get _some_ node in either case. The difference is whether it wants to just know whether there's _any_ needing scrub work done, or whether it wants _the one_ to actually scrub on. So how about "get_any" or "get_any_node" or just "any"? > + if ( !node_test_and_set(node, node_scrubbing) ) > + { > + dist = __node_distance(local_node, node); > + if ( (dist < shortest) || (dist == NUMA_NO_DISTANCE) ) > + { > + /* Release previous node. */ > + if ( closest != NUMA_NO_NODE ) > + node_clear(closest, node_scrubbing); > + shortest = dist; > + closest = node; > + } > + else > + node_clear(node, node_scrubbing); > + } Wouldn't it be better to check distance before setting the bit in node_scrubbing, avoiding the possible need to clear it again later (potentially misguiding other CPUs)? And why would NUMA_NO_DISTANCE lead to a switch of nodes? I can see that being needed when closest == NUMA_NO_NODE, but once you've picked one I think you should switch only when you've found another that's truly closer. > + } > + } > + > + return closest; > +} > + > +bool scrub_free_pages(void) > { > struct page_info *pg; > unsigned int zone, order; > unsigned long i; > + unsigned int cpu = smp_processor_id(); > + bool preempt = false; > + nodeid_t node; > > - ASSERT(spin_is_locked(&heap_lock)); > + /* > + * Don't scrub while dom0 is being constructed since we may > + * fail trying to call map_domain_page() from scrub_one_page(). > + */ > + if ( system_state < SYS_STATE_active ) > + return false; I assume that's because of the mapcache vcpu override? That's x86 specific though, so the restriction here ought to be arch specific. Even better would be to find a way to avoid this restriction altogether, as on bigger systems only one CPU is actually busy while building Dom0, so all others could be happily scrubbing. Could that override become a per-CPU one perhaps? Otoh there's not much to scrub yet until Dom0 had all its memory allocated, and we know which pages truly remain free (wanting what is currently the boot time scrubbing done on them). But that point in time may still be earlier than when we switch to SYS_STATE_active. > @@ -1065,16 +1131,29 @@ static void scrub_free_pages(unsigned int node) > pg[i].count_info &= ~PGC_need_scrub; > node_need_scrub[node]--; > } > + if ( softirq_pending(cpu) ) > + { > + preempt = true; > + break; > + } Isn't this a little too eager, especially if you didn't have to scrub the page on this iteration? > @@ -1141,9 +1220,6 @@ static void free_heap_pages( > if ( tainted ) > reserve_offlined_page(pg); > > - if ( need_scrub ) > - scrub_free_pages(node); I'd expect this eliminates the need for the need_scrub variable. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel