On 08/04/16 08:35, Dario Faggioli wrote:
> On Fri, 2016-04-08 at 06:18 +0200, Juergen Gross wrote:
>> On 08/04/16 03:24, Dario Faggioli wrote:
>>>
>>> In fact, credit2 uses CPU topology to decide how to arrange
>>> its internal runqueues. Before this change, only 'one runqueue
>>> per socket' was allowed. However, experiments have shown that,
>>> for instance, having one runqueue per physical core improves
>>> performance, especially in case hyperthreading is available.
>>>
>>> In general, it makes sense to allow users to pick one runqueue
>>> arrangement at boot time, so that:
>>> - more experiments can be easily performed to even better
>>>   assess and improve performance;
>>> - one can select the best configuration for his specific
>>>   use case and/or hardware.
>>>
>>> This patch enables the above.
>>>
>>> Note that, for correctly arranging runqueues to be per-core,
>>> just checking cpu_to_core() on the host CPUs is not enough.
>>> In fact, cores (and hyperthreads) on different sockets, can
>>> have the same core (and thread) IDs! We, therefore, need to
>>> check whether the full topology of two CPUs matches, for
>>> them to be put in the same runqueue.
>>>
>>> Note also that the default (although not functional) for
>>> credit2, since now, has been per-socket runqueue. This patch
>>> leaves things that way, to avoid mixing policy and technical
>>> changes.
>>>
>>> Finally, it would be a nice feature to be able to select
>>> a particular runqueue arrangement, even when creating a
>>> Credit2 cpupool. This is left as future work.
>>>
>>> Signed-off-by: Dario Faggioli <dario.faggi...@citrix.com>
>>> Signed-off-by: Uma Sharma <uma.sharma...@gmail.com>
>>
>> Some nits below.
>>
> Thanks for the quick review!
>
> A revised version of this patch is provided here (both inlined and
> attached), and a branch with the remaining to be committed patches of
> this series, and with this patch changed as you suggest, is available
> at:
>
>   git://xenbits.xen.org/people/dariof/xen.git
>   rel/sched/credit2/fix-runq-and-haff-v4
>
>   http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/credit2/fix-runq-and-haff-v4
>
> Regards,
> Dario
> ---
> commit 7f491488bbff1cc3af021cd29fca7e0fba321e02
> Author: Dario Faggioli <dario.faggi...@citrix.com>
> Date: Tue Sep 29 14:05:09 2015 +0200
>
>     xen: sched: allow for choosing credit2 runqueues configuration at boot
>
>     In fact, credit2 uses CPU topology to decide how to arrange
>     its internal runqueues. Before this change, only 'one runqueue
>     per socket' was allowed. However, experiments have shown that,
>     for instance, having one runqueue per physical core improves
>     performance, especially in case hyperthreading is available.
>
>     In general, it makes sense to allow users to pick one runqueue
>     arrangement at boot time, so that:
>     - more experiments can be easily performed to even better
>       assess and improve performance;
>     - one can select the best configuration for his specific
>       use case and/or hardware.
>
>     This patch enables the above.
>
>     Note that, for correctly arranging runqueues to be per-core,
>     just checking cpu_to_core() on the host CPUs is not enough.
>     In fact, cores (and hyperthreads) on different sockets, can
>     have the same core (and thread) IDs! We, therefore, need to
>     check whether the full topology of two CPUs matches, for
>     them to be put in the same runqueue.
>
>     Note also that the default (although not functional) for
>     credit2, since now, has been per-socket runqueue. This patch
>     leaves things that way, to avoid mixing policy and technical
>     changes.
>
>     Finally, it would be a nice feature to be able to select
>     a particular runqueue arrangement, even when creating a
>     Credit2 cpupool. This is left as future work.
>
>     Signed-off-by: Dario Faggioli <dario.faggi...@citrix.com>
>     Signed-off-by: Uma Sharma <uma.sharma...@gmail.com>
Reviewed-by: George Dunlap <george.dun...@citrix.com>
> ---
> Cc: George Dunlap <george.dun...@eu.citrix.com>
> Cc: Uma Sharma <uma.sharma...@gmail.com>
> Cc: Juergen Gross <jgr...@suse.com>
> ---
> Changes from v3:
>  * fix typo and other issues in comments;
>    use ARRAY_SIZE when iterating the parameter string array.
>
> Changes from v2:
>  * valid strings are now in an array, that we scan during
>    parameter parsing, as suggested during review.
>
> Changes from v1:
>  * fix bug in parameter parsing, and start using strcmp()
>    for that, as requested during review.
>
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index ca77e3b..0047f94 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -469,6 +469,25 @@ combination with the `low_crashinfo` command line option.
>  ### credit2\_load\_window\_shift
>  > `= <integer>`
>
> +### credit2\_runqueue
> +> `= core | socket | node | all`
> +
> +> Default: `socket`
> +
> +Specify how host CPUs are arranged in runqueues. Runqueues are kept
> +balanced with respect to the load generated by the vCPUs running on
> +them. Smaller runqueues (as in with `core`) means more accurate load
> +balancing (for instance, it will deal better with hyperthreading),
> +but also more overhead.
> +
> +Available alternatives, with their meaning, are:
> +* `core`: one runqueue per each physical core of the host;
> +* `socket`: one runqueue per each physical socket (which often,
> +            but not always, matches a NUMA node) of the host;
> +* `node`: one runqueue per each NUMA node of the host;
> +* `all`: just one runqueue shared by all the logical pCPUs of
> +         the host
> +
>  ### dbgp
>  > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index a61a45a..d43f67a 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -81,10 +81,6 @@
>   * Credits are "reset" when the next vcpu in the runqueue is less than
>   * or equal to zero. At that point, everyone's credits are "clipped"
>   * to a small value, and a fixed credit is added to everyone.
> - *
> - * The plan is for all cores that share an L2 will share the same
> - * runqueue. At the moment, there is one global runqueue for all
> - * cores.
>   */
>
>  /*
> @@ -193,6 +189,63 @@ static int __read_mostly opt_overload_balance_tolerance = -3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>
>  /*
> + * Runqueue organization.
> + *
> + * The various cpus are to be assigned each one to a runqueue, and we
> + * want that to happen basing on topology. At the moment, it is possible
> + * to choose to arrange runqueues to be:
> + *
> + * - per-core: meaning that there will be one runqueue per each physical
> + *             core of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'core';
> + *
> + * - per-socket: meaning that there will be one runqueue per each physical
> + *               socket (AKA package, which often, but not always, also
> + *               matches a NUMA node) of the host; This will happen if
> + *               the opt_runqueue parameter is set to 'socket';
> + *
> + * - per-node: meaning that there will be one runqueue per each physical
> + *             NUMA node of the host. This will happen if the opt_runqueue
> + *             parameter is set to 'node';
> + *
> + * - global: meaning that there will be only one runqueue to which all the
> + *           (logical) processors of the host belong. This will happen if
> + *           the opt_runqueue parameter is set to 'all'.
> + *
> + * Depending on the value of opt_runqueue, therefore, cpus that are part of
> + * either the same physical core, the same physical socket, the same NUMA
> + * node, or just all of them, will be put together to form runqueues.
> + */
> +#define OPT_RUNQUEUE_CORE 0
> +#define OPT_RUNQUEUE_SOCKET 1
> +#define OPT_RUNQUEUE_NODE 2
> +#define OPT_RUNQUEUE_ALL 3
> +static const char *const opt_runqueue_str[] = {
> +    [OPT_RUNQUEUE_CORE] = "core",
> +    [OPT_RUNQUEUE_SOCKET] = "socket",
> +    [OPT_RUNQUEUE_NODE] = "node",
> +    [OPT_RUNQUEUE_ALL] = "all"
> +};
> +static int __read_mostly opt_runqueue = OPT_RUNQUEUE_SOCKET;
> +
> +static void parse_credit2_runqueue(const char *s)
> +{
> +    unsigned int i;
> +
> +    for ( i = 0; i < ARRAY_SIZE(opt_runqueue_str); i++ )
> +    {
> +        if ( !strcmp(s, opt_runqueue_str[i]) )
> +        {
> +            opt_runqueue = i;
> +            return;
> +        }
> +    }
> +
> +    printk("WARNING, unrecognized value of credit2_runqueue option!\n");
> +}
> +custom_param("credit2_runqueue", parse_credit2_runqueue);
> +
> +/*
>   * Per-runqueue data
>   */
>  struct csched2_runqueue_data {
> @@ -1974,6 +2027,22 @@ static void deactivate_runqueue(struct csched2_private *prv, int rqi)
>      cpumask_clear_cpu(rqi, &prv->active_queues);
>  }
>
> +static inline bool_t same_node(unsigned int cpua, unsigned int cpub)
> +{
> +    return cpu_to_node(cpua) == cpu_to_node(cpub);
> +}
> +
> +static inline bool_t same_socket(unsigned int cpua, unsigned int cpub)
> +{
> +    return cpu_to_socket(cpua) == cpu_to_socket(cpub);
> +}
> +
> +static inline bool_t same_core(unsigned int cpua, unsigned int cpub)
> +{
> +    return same_socket(cpua, cpub) &&
> +           cpu_to_core(cpua) == cpu_to_core(cpub);
> +}
> +
>  static unsigned int
>  cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>  {
> @@ -2006,7 +2075,10 @@ cpu_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>          BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
>                 cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
>
> -        if ( cpu_to_socket(cpumask_first(&rqd->active)) == cpu_to_socket(cpu) )
> +        if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
> +             (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) ||
> +             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
>              break;
>      }
>
> @@ -2170,6 +2242,7 @@ csched2_init(struct scheduler *ops)
>      printk(" load_window_shift: %d\n", opt_load_window_shift);
>      printk(" underload_balance_tolerance: %d\n", opt_underload_balance_tolerance);
>      printk(" overload_balance_tolerance: %d\n", opt_overload_balance_tolerance);
> +    printk(" runqueues arrangement: %s\n", opt_runqueue_str[opt_runqueue]);
>
>      if ( opt_load_window_shift < LOADAVG_WINDOW_SHIFT_MIN )
>      {
>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
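
For readers who want to try the grouping logic outside the hypervisor, here is a
minimal userspace sketch of what the patch describes: scanning the string table
to parse the option, and comparing the full topology of two CPUs before putting
them in the same runqueue. It is not the Xen code. The topo[] table, struct
cpu_topo, parse_runqueue() and assign_runqueues() are hypothetical stand-ins for
cpu_to_node()/cpu_to_socket()/cpu_to_core() and for the real runqueue setup, and
the host layout is made up.

```c
#include <assert.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Option values and strings mirror the ones in the patch. */
enum { OPT_RUNQUEUE_CORE, OPT_RUNQUEUE_SOCKET, OPT_RUNQUEUE_NODE, OPT_RUNQUEUE_ALL };

static const char *const opt_runqueue_str[] = {
    [OPT_RUNQUEUE_CORE]   = "core",
    [OPT_RUNQUEUE_SOCKET] = "socket",
    [OPT_RUNQUEUE_NODE]   = "node",
    [OPT_RUNQUEUE_ALL]    = "all",
};

/* Hypothetical stand-in for Xen's cpu_to_node()/cpu_to_socket()/cpu_to_core(). */
struct cpu_topo { unsigned int node, socket, core; };

/*
 * Made-up host: one NUMA node, two sockets, two cores per socket, two
 * threads per core. Core IDs 0 and 1 repeat on both sockets.
 */
static const struct cpu_topo topo[] = {
    { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 1 }, { 0, 0, 1 },
    { 0, 1, 0 }, { 0, 1, 0 }, { 0, 1, 1 }, { 0, 1, 1 },
};
#define NR_CPUS ARRAY_SIZE(topo)

/* Scan the string table, as the patch does; return -1 if nothing matches. */
int parse_runqueue(const char *s)
{
    unsigned int i;

    for ( i = 0; i < ARRAY_SIZE(opt_runqueue_str); i++ )
        if ( !strcmp(s, opt_runqueue_str[i]) )
            return (int)i;
    return -1;
}

static int same_node(unsigned int a, unsigned int b)
{
    return topo[a].node == topo[b].node;
}

static int same_socket(unsigned int a, unsigned int b)
{
    return topo[a].socket == topo[b].socket;
}

/* Core IDs are only unique within a socket, so the socket must match too. */
static int same_core(unsigned int a, unsigned int b)
{
    return same_socket(a, b) && topo[a].core == topo[b].core;
}

static int same_runqueue(int opt, unsigned int a, unsigned int b)
{
    return opt == OPT_RUNQUEUE_ALL ||
           (opt == OPT_RUNQUEUE_CORE && same_core(a, b)) ||
           (opt == OPT_RUNQUEUE_SOCKET && same_socket(a, b)) ||
           (opt == OPT_RUNQUEUE_NODE && same_node(a, b));
}

/*
 * Put each CPU in the runqueue of the first lower-numbered CPU it matches,
 * or open a new runqueue. Fills rq[] and returns the number of runqueues.
 */
unsigned int assign_runqueues(int opt, unsigned int rq[])
{
    unsigned int cpu, peer, nr = 0;

    assert(opt >= 0 && opt < (int)ARRAY_SIZE(opt_runqueue_str));
    for ( cpu = 0; cpu < NR_CPUS; cpu++ )
    {
        for ( peer = 0; peer < cpu; peer++ )
            if ( same_runqueue(opt, peer, cpu) )
                break;
        rq[cpu] = (peer < cpu) ? rq[peer] : nr++;
    }
    return nr;
}
```

With this made-up table, assign_runqueues(parse_runqueue("core"), rq) yields 4
runqueues, "socket" yields 2, and "node" and "all" yield 1 each. Comparing core
IDs alone would merge core 0 of both sockets and give only 2 runqueues for
"core", which is the pitfall the commit message warns about.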