Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-02 Thread Dario Faggioli
On Sun, 2015-02-08 at 17:45 -1000, Justin T. Weaver wrote:
> From: "Justin T. Weaver" 
> 
> by making sure that vcpus only run on the pcpu(s) they are allowed to
> run on based on their hard affinity cpu masks.
> 
Ok, here I am reviewing this, at last... sorry for the delay! :-(

> Signed-off-by: Justin T. Weaver 
> ---
> Changes in v2:
>  * Added dynamically allocated cpu masks to avoid putting them on the stack;
>replaced temp masks from v1 throughout
>  * Added helper function for code suggested in v1 review and called it in two
>locations in function choose_cpu
>  * Removed v1 change to comment in the beginning of choose_cpu
>  * Replaced two instances of cpumask_and/cpumask_empty with cpumask_intersects
>  * Removed v1 re-work of code in function migrate; only change in migrate in
>v2 is the assignment of a valid pcpu from the destination run queue to
>vc->processor
>  * In function csched2_vcpu_migrate: removed change from v1 that called
>function migrate even if cur and dest run queues were the same in order
>to get a runq_tickle call; added processor assignment to new_cpu to fix
>the real underlying issue which was the vcpu not getting a call to
>sched_move_irqs
>
Aha, so that was the issue! Glad you figured this out. :-)

>  * Removed the looping added in v1 in function balance_load; may be added back
>later because it would help to have balance_load be more aware of hard
>affinity, but adding it does not affect credit2's current inability to
>respect hard affinity.
>  * Removed coding style fix in function balance_load
>  * Improved comment in function runq_candidate
>
Thanks for putting this recap of the changes here. :-)

> ---
>  xen/common/sched_credit2.c |  122 +++-
>  1 file changed, 108 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index cf53770..de8fb5a 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -194,6 +194,12 @@ int opt_overload_balance_tolerance=-3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>  
>  /*
> + * Use this to avoid having too many cpumask_t structs on the stack
> + */
> +static cpumask_t **cpumask = NULL;
>
Not just 'cpumask', please... It's too generic a name. Let's pick up
something that makes it easier to understand what's the purpose of this.

I'm not really sure right now what something like that could be...
Credit has balance_mask, but we're not going as far as introducing
multiple step load balancing here (not with this patch, at least).

Maybe affinity_cpumask, or hard_affinity_cpumask (and we'll rename to
something else when introducing soft affinity, if needed).

> +#define csched2_cpumask cpumask[smp_processor_id()]
> +
I like the idea, but put the right side between parentheses. Of course,
what I just said about the name applies here too. :-)

> @@ -268,6 +274,23 @@ struct csched2_dom {
>  uint16_t nr_vcpus;
>  };
>  
> +/*
> + * When a hard affinity change occurs, we may not be able to check some or
> + * all of the other run queues for a valid new processor for the given vcpu.
> + * Return svc's current pcpu if valid, otherwise return a safe pcpu.
> + */
>
Add at least a very quick mention on why this is (could be) necessary.

> +static int get_safe_pcpu(struct csched2_vcpu *svc)
> +{
>
I also don't like the name... __choose_cpu() maybe ?

> +cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity, 
> &svc->rqd->active);
> +if ( unlikely(cpumask_empty(csched2_cpumask)) )
> +cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity,
> +cpupool_online_cpumask(svc->vcpu->domain->cpupool));
>
VCPU2ONLINE(svc->vcpu) would make the line shorter.

Also, I know I'm the one that suggested this form for the code, but
thinking again about it, I'm not sure the first if is worth:

cpumask_and(csched2_cpumask, &svc->rqd->active, VCPU2ONLINE(svc->vcpu));
cpumask_and(csched2_cpumask, csched2_cpumask, svc->vcpu->cpu_hard_affinity);

Oh, and, with either yours or my variant... can csched2_cpumask end up
empty after the two &&-s? That's important or, below, cpumask_any will
return garbage! :-/

From a quick inspection, it does not seem it can, in which case, I'd put
down an ASSERT() about this somewhere. OTOH, if it can, I'd look for
ways of preventing that to happen before getting here... It seems easier
and more correct to do that, rather than trying to figure out what to do
here if the mask is empty. :-O

> +
> +if ( cpumask_test_cpu(svc->vcpu->processor, csched2_cpumask) )
> +return svc->vcpu->processor;
> +else
> +return cpumask_any(csched2_cpumask);
>
And, perhaps, we could put back the likely/unlikely hint here (provided
we think the then branch is actually likely):

if ( likely(cpumask_test_cpu(svc->vcpu->processor, csched2_cpumask)) )
return svc->vcpu->processor;
else
return cpumask_any(csched2_cpumask);

Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-03 Thread Jan Beulich
>>> On 03.03.15 at 04:15,  wrote:
> On Sun, 2015-02-08 at 17:45 -1000, Justin T. Weaver wrote:
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -194,6 +194,12 @@ int opt_overload_balance_tolerance=-3;
>>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>>  
>>  /*
>> + * Use this to avoid having too many cpumask_t structs on the stack
>> + */
>> +static cpumask_t **cpumask = NULL;
>>
> Not just 'cpumask', please... It's too generic a name. Let's pick up
> something that makes it easier to understand what's the purpose of this.
> 
> I'm not really sure right now what something like that could be...
> Credit has balance_mask, but we're not going as far as introducing
> multiple step load balancing here (not with this patch, at least).
> 
> Maybe affinity_cpumask, or hard_affinity_cpumask (and we'll rename to
> something else when introducing soft affinity, if needed).
> 
>> +#define csched2_cpumask cpumask[smp_processor_id()]
>> +
> I like the idea, but put the right side between parentheses.

Parentheses? Why? There's no operator with higher precedence
than postfix ones.

> Of course,
> what I just said about the name applies here too. :-)

Right. Suitably done, variable and macro can actually share names.

>> +static int get_safe_pcpu(struct csched2_vcpu *svc)
>> +{
>>
> I also don't like the name... __choose_cpu() maybe ?

Why is everyone liking these double underscore prefixed names so
much? They're in conflict with the library name space and hence
should be avoided. Single underscore prefixed names (and the
underscore not followed by an upper case letter) is what the
standard sets aside for file scope (i.e. static) identifiers.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-04 Thread Dario Faggioli
On Tue, 2015-03-03 at 09:12 +, Jan Beulich wrote:
> >>> On 03.03.15 at 04:15,  wrote:
> > On Sun, 2015-02-08 at 17:45 -1000, Justin T. Weaver wrote:

> >> +#define csched2_cpumask cpumask[smp_processor_id()]
> >> +
> > I like the idea, but put the right side between parentheses.
> 
> Parentheses? Why? There's no operator with higher precedence
> than postfix ones.
> 
There certainly isn't. The why is my personal taste, mostly, which does
not count much, I know, so I grep-ed around the sources and found other
similar examples which have parentheses, and so I went ahead and asked.

However, I can certainly live without them. :-)


> >> +static int get_safe_pcpu(struct csched2_vcpu *svc)
> >> +{
> >>
> > I also don't like the name... __choose_cpu() maybe ?
> 
> Why is everyone liking these double underscore prefixed names so
> much? They're in conflict with the library name space and hence
> should be avoided. Single underscore prefixed names (and the
> underscore not followed by an upper case letter) is what the
> standard sets aside for file scope (i.e. static) identifiers.
> 
Well, I'm not sure I know why, but --from a purely aesthetic point of
view-- I actually like __foo more than _foo. However, the main reason
why I'm suggesting it here, is to follow suit, since __foo is what is
always used (in sched_credit2.c, but also in most of other files AFAICT)
for similar functions.

I see the point you're making, and I can live with _choose_cpu(), but
the result would look a bit inconsistent, IMO.

Regards,
Dario




Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-04 Thread Jan Beulich
>>> On 04.03.15 at 12:03,  wrote:
> I see the point you're making, and I can live with _choose_cpu(), but
> the result would look a bit inconsistent, IMO.

I'm tempted to ask for a cleanup patch then.

Jan




Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-04 Thread Dario Faggioli
On Wed, 2015-03-04 at 12:50 +, Jan Beulich wrote:
> >>> On 04.03.15 at 12:03,  wrote:
> > I see the point you're making, and I can live with _choose_cpu(), but
> > the result would look a bit inconsistent, IMO.
> 
> I'm tempted to ask for a cleanup patch then.
> 
Yep, and I'd be fine with that. If Justin is up for this, as a pre-patch
(not part of the affinity series), fine, if not, I can do it.

I've checked and sched_credit.c does both (it mostly does __foo, the
only exception being _csched_cpu_pick). sched_rt.c does __foo too,
should we take the chance and (in separate patches) cleanup all of
these?

Regards,
Dario




Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-04 Thread Jan Beulich
>>> On 04.03.15 at 14:08,  wrote:
> On Wed, 2015-03-04 at 12:50 +, Jan Beulich wrote:
>> >>> On 04.03.15 at 12:03,  wrote:
>> > I see the point you're making, and I can live with _choose_cpu(), but
>> > the result would look a bit inconsistent, IMO.
>> 
>> I'm tempted to ask for a cleanup patch then.
>> 
> Yep, and I'd be fine with that. If Justin is up for this, as a pre-patch
> (not part of the affinity series), fine, if not, I can do it.
> 
> I've checked and sched_credit.c does both (it mostly does __foo, the
> only exception being _csched_cpu_pick). sched_rt.c does __foo too,
> should we take the chance and (in separate patches) cleanup all of
> these?

I'd say yes, but in the end it's all George's call.

Jan




Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-06 Thread George Dunlap
On 03/03/2015 03:15 AM, Dario Faggioli wrote:
> On Sun, 2015-02-08 at 17:45 -1000, Justin T. Weaver wrote:
>> From: "Justin T. Weaver" 
>>
>> by making sure that vcpus only run on the pcpu(s) they are allowed to
>> run on based on their hard affinity cpu masks.
>>
> Ok, here I am reviewing this, at last... sorry for the delay! :-(
> 
>> Signed-off-by: Justin T. Weaver 
>> ---
>> Changes in v2:
>>  * Added dynamically allocated cpu masks to avoid putting them on the stack;
>>replaced temp masks from v1 throughout
>>  * Added helper function for code suggested in v1 review and called it in two
>>locations in function choose_cpu
>>  * Removed v1 change to comment in the beginning of choose_cpu
>>  * Replaced two instances of cpumask_and/cpumask_empty with cpumask_intersects
>>  * Removed v1 re-work of code in function migrate; only change in migrate in
>>v2 is the assignment of a valid pcpu from the destination run queue to
>>vc->processor
>>  * In function csched2_vcpu_migrate: removed change from v1 that called
>>function migrate even if cur and dest run queues were the same in order
>>to get a runq_tickle call; added processor assignment to new_cpu to fix
>>the real underlying issue which was the vcpu not getting a call to
>>sched_move_irqs
>>
> Aha, so that was the issue! Glad you figured this out. :-)
> 
>>  * Removed the looping added in v1 in function balance_load; may be added back
>>later because it would help to have balance_load be more aware of hard
>>affinity, but adding it does not affect credit2's current inability to
>>respect hard affinity.
>>  * Removed coding style fix in function balance_load
>>  * Improved comment in function runq_candidate
>>
> Thanks for putting this recap of the changes here. :-)
> 
>> ---
>>  xen/common/sched_credit2.c |  122 +++-
>>  1 file changed, 108 insertions(+), 14 deletions(-)
>>
>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index cf53770..de8fb5a 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -194,6 +194,12 @@ int opt_overload_balance_tolerance=-3;
>>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>>  
>>  /*
>> + * Use this to avoid having too many cpumask_t structs on the stack
>> + */
>> +static cpumask_t **cpumask = NULL;
>>
> Not just 'cpumask', please... It's too generic a name. Let's pick up
> something that makes it easier to understand what's the purpose of this.
> 
> I'm not really sure right now what something like that could be...
> Credit has balance_mask, but we're not going as far as introducing
> multiple step load balancing here (not with this patch, at least).
> 
> Maybe affinity_cpumask, or hard_affinity_cpumask (and we'll rename to
> something else when introducing soft affinity, if needed).

Well I think it's just being used as scratch space, right?  Why not
scratch_mask or something like that?

>> +#define csched2_cpumask cpumask[smp_processor_id()]
>> +
> I like the idea, but put the right side between parentheses. Of course,
> what I just said about the name applies here too. :-)
> 
>> @@ -268,6 +274,23 @@ struct csched2_dom {
>>  uint16_t nr_vcpus;
>>  };
>>  
>> +/*
>> + * When a hard affinity change occurs, we may not be able to check some or
>> + * all of the other run queues for a valid new processor for the given vcpu.
>> + * Return svc's current pcpu if valid, otherwise return a safe pcpu.
>> + */
>>
> Add at least a very quick mention on why this is (could be) necessary.
> 
>> +static int get_safe_pcpu(struct csched2_vcpu *svc)
>> +{
>>
> I also don't like the name... __choose_cpu() maybe ?

I'm not 100% happy with the name, but I think "get_safe_pcpu" is more
descriptive than just "__choose_cpu".

> 
>> +cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity, 
>> &svc->rqd->active);
>> +if ( unlikely(cpumask_empty(csched2_cpumask)) )
>> +cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity,
>> +cpupool_online_cpumask(svc->vcpu->domain->cpupool));
>>
> VCPU2ONLINE(svc->vcpu) would make the line shorter.
> 
> Also, I know I'm the one that suggested this form for the code, but
> thinking again about it, I'm not sure the first if is worth:
> 
> cpumask_and(csched2_cpumask, &svc->rqd->active, VCPU2ONLINE(svc->vcpu));
> cpumask_and(csched2_cpumask, csched2_cpumask, 
> svc->vcpu->cpu_hard_affinity);

Actually, I was going to say is there any reason not to start with:

if ( likely(cpumask_test_cpu(svc->vcpu->processor,
            svc->vcpu->cpu_hard_affinity)) )
    return svc->vcpu->processor;

And then go through doing the unions and what-not once we've determined
that the current processor isn't suitable?

> Oh, and, with either yours or my variant... can csched2_cpumask end up
> empty after the two &&-s? That's important or, below, cpumask_any will
> return garbage! :-/
> 
> From a quick inspection, it does not seem it can,

Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-06 Thread Dario Faggioli
On Fri, 2015-03-06 at 15:18 +, George Dunlap wrote:
> On 03/03/2015 03:15 AM, Dario Faggioli wrote:
> > On Sun, 2015-02-08 at 17:45 -1000, Justin T. Weaver wrote:

> >>  /*
> >> + * Use this to avoid having too many cpumask_t structs on the stack
> >> + */
> >> +static cpumask_t **cpumask = NULL;
> >>
> > Not just 'cpumask', please... It's too generic a name. Let's pick up
> > something that makes it easier to understand what's the purpose of this.
> > 
> > I'm not really sure right now what something like that could be...
> > Credit has balance_mask, but we're not going as far as introducing
> > multiple step load balancing here (not with this patch, at least).
> > 
> > Maybe affinity_cpumask, or hard_affinity_cpumask (and we'll rename to
> > something else when introducing soft affinity, if needed).
> 
> Well I think it's just being used as scratch space, right?  Why not
> scratch_mask or something like that?
> 
Fine with me.

> >> +static int get_safe_pcpu(struct csched2_vcpu *svc)
> >> +{
> >>
> > I also don't like the name... __choose_cpu() maybe ?
> 
> I'm not 100% happy with the name, but I think "get_safe_pcpu" is more
> descriptive than just "__choose_cpu".
>
I don't like the "_safe_" part, but that is not a big deal, I certainly
can live with it.

> >> +cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity, 
> >> &svc->rqd->active);
> >> +if ( unlikely(cpumask_empty(csched2_cpumask)) )
> >> +cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity,
> >> +cpupool_online_cpumask(svc->vcpu->domain->cpupool));
> >>
> > VCPU2ONLINE(svc->vcpu) would make the line shorter.
> > 
> > Also, I know I'm the one that suggested this form for the code, but
> > thinking again about it, I'm not sure the first if is worth:
> > 
> > cpumask_and(csched2_cpumask, &svc->rqd->active, VCPU2ONLINE(svc->vcpu));
> > cpumask_and(csched2_cpumask, csched2_cpumask, 
> > svc->vcpu->cpu_hard_affinity);
> 
> Actually, I was going to say is there any reason not to start with:
> 
> if ( likely(cpumask_test_cpu(svc->vcpu->processor,
> svc->vcpu->cpu_hard_affinity)) )
>     return svc->vcpu->processor;
> 
> And then go through doing the unions and what-not once we've determined
> that the current processor isn't suitable?
> 
I like the idea, and I think it actually makes sense. I'm trying to
think of a scenario where this can be bogus, and where we actually need
to do the filtering against rqd->active and online up front, but I can't
imagine one.

I think the possible source of trouble is affinity being changed, but
even then, if we were on vcpu->processor, and that still is in our hard
affinity mask, it ought to be right to stay there (and hence George's
suggestion should be fine)... Justin, what do you think?

> > Oh, and, with either yours or my variant... can csched2_cpumask end up
> > empty after the two &&-s? That's important or, below, cpumask_any will
> > return garbage! :-/
> > 
> > From a quick inspection, it does not seem it can, in which case, I'd put
> > down an ASSERT() about this somewhere. OTOH, if it can, I'd look for
> > ways of preventing that to happen before getting here... It seems easier
> > and more correct to do that, rather than trying to figure out what to do
> > here if the mask is empty. :-O
> 
> Yes, we need to go through the code and make sure that we understand
> what our assumptions are, so that people can't crash Xen by doing
> irrational things like making the hard affinity not intersect with the
> cpupool cpus.
> 
True.

Something like that can be figured out and either forbidden or, in
general, addressed in other places, rather than requiring us to care
here. In fact, this seems to me to be what's happening already (see
below).

> If we make this an "unusual slow path", I don't see any reason not to make
> the code a bit more resilient by handling cases we don't expect to
> happen.
>
Well, again, true, in a general fashion. However --perhaps because I'm
not sure I'm getting what "unusual slow path" means in this context-- if
we say that we support hard affinity, I see the fact that vCPUs can be
found on pCPUs outside of their hard affinity as a very bad thing.

For instance, people may rely on this for various reasons, e.g., to
achieve a high level of isolation between domains, by partitioning the
hard affinity of their vCPUs accordingly, and this isolation not being
enforced can screw things arbitrarily bad for them, I think.

Whether this is worth exploding is probably debatable (and workload and
use case dependent), but it definitely falls in the "I want to catch
this during testing" (so ==> ASSERT) category, IMO.

> It would be good to try to make sure we don't allow a situation
> where union(hard_affinity, domain->cpupool) is empty, but I'd rather the
> result be something not too bizarre (e.g., picking a random cpu in the
> cpupool) than having the host crash or something.
> 
Yeah, I like robustness... but only up to a certain extent, I think, or
you end 

Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-09 Thread Justin Weaver
Thanks to all for the comments! I've implemented most of the changes
recommended here in the v2 review. I should have a new patch set ready
this week (with an updated soft affinity patch as well). I'll just
address the comments where you asked for my feedback...

>>  rqd_avgload = rqd->b_avgload - svc->avgload;
>>  }
>>  else if ( spin_trylock(&rqd->lock) )
>>  {
>> +if ( !cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
>> +{
>> +spin_unlock(&rqd->lock);
>> +continue;
>> +}
>>  rqd_avgload = rqd->b_avgload;
>>  spin_unlock(&rqd->lock);
>>  }
>>
> Something like this would be easier to read, IMO:
>
>if ( rqd == svc->rqd )
>{
>if ( cpumask_intersects(vc->cpu_hard_affinity,
>&rqd->active) )
>rqd_avgload = rqd->b_avgload - svc->avgload;
>}
>else if ( spin_trylock(&rqd->lock) )
>{
>if ( cpumask_intersects(vc->cpu_hard_affinity,
>&rqd->active) )
>rqd_avgload = rqd->b_avgload;
>
>spin_unlock(&rqd->lock);
>}
>else
>continue;
>
> The semantics should be the same, provided we initialize rqd_avgload to
> MAX_LOAD, I would say.
>
> In fact, without the two 'continue;' statements you were introducing,
> we'll execute the if() that follows this block even if there was no
> intersection with the hard affinity mask but, in that case, no chance we
> have updated rqd_avgload, so it really should behave the same, what do
> you think?

I like it. I implemented your changes along with initializing
rqd_avgload to MAX_LOAD. I think it's definitely easier to read
without the continue statements and without multiple spin_unlock
statements.

>> @@ -1396,6 +1459,15 @@ csched2_vcpu_migrate(
>>
>>  /* Check if new_cpu is valid */
>>  BUG_ON(!cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
>> +BUG_ON(!cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
>>
> Not sure.. An ASSERT() seems more suitable than a BUG_ON here... What's
> the reasoning behind this?

I was just trying to identify a state that this function should never
be in. After reading through the discussion in the review I understand
that ASSERT is more appropriate. It's changed for v3.

> Oh, and this is what was causing you troubles, in case source and
> destination runqueue were the same... Help me understand, which call to
> sched_move_irqs() in schedule.c were we missing? I'd say it is the one
> in vcpu_migrate(), but that does not seem to care about vc->processor
> (much rater about new_cpu)... what am I missing?
>
> However, if they are not the same, the call to migrate() will override
> this right away, won't it? What I mean to say is, if this is required
> only in case trqd and svc->rqd are equal, can we tweak this part of
> csched2_vcpu_migrate() as follows?
>
> if ( trqd != svc->rqd )
> migrate(ops, svc, trqd, NOW());
> else
> vc->processor = new_cpu;

You are right; it does not have anything to do with sched_move_irqs()
not being called (like you said it doesn't care about vc->processor).
You are never going to believe any of my explanations now! :) Well
third time's a charm hopefully... Without the processor assignment
here the vcpu might go on being assigned to a processor it no longer
is allowed to run on. In that case, function runq_candidate may only
get called for the vcpu's old processor, and runq_candidate will no
longer let a vcpu run on a processor that it's not allowed to run on
(because of the hard affinity check first introduced in v1 of this
patch). So in that condition the vcpu never gets to run. That's still
somewhat of a vague explanation, but I have observed that that is what
happens. Anyway I think everyone agrees that the processor assignment
needs to be here, and I did move it to an else block for v3 like you
recommended above.

>> >> +static int get_safe_pcpu(struct csched2_vcpu *svc)
>> >> +{
>> >>
>> > I also don't like the name... __choose_cpu() maybe ?
>>
>> I'm not 100% happy with the name, but I think "get_safe_pcpu" is more
>> descriptive than just "__choose_cpu".
>>
> I don't like the "_safe_" part, but that is not a big deal, I certainly
> can live with it.

I changed it to _choose_valid_pcpu ... discuss! (Also, I can send out
a pre-patch to change the double underscores in the whole file)

>> >> +cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity, 
>> >> &svc->rqd->active);
>> >> +if ( unlikely(cpumask_empty(csched2_cpumask)) )
>> >> +cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity,
>> >> +cpupool_online_cpumask(svc->vcpu->domain->cpupool));
>> >>
>> > VCPU2ONLINE(svc->vcpu) would make the line shorter.

I agree. VCPU2ONLINE is defined in 

Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-09 Thread George Dunlap
On 03/09/2015 07:11 AM, Justin Weaver wrote:
> +static int get_safe_pcpu(struct csched2_vcpu *svc)
> +{
>
 I also don't like the name... __choose_cpu() maybe ?
>>>
>>> I'm not 100% happy with the name, but I think "get_safe_pcpu" is more
>>> descriptive than just "__choose_cpu".
>>>
>> I don't like the "_safe_" part, but that is not a big deal, I certainly
>> can live with it.
> 
> I changed it to _choose_valid_pcpu ... discuss! (Also, I can send out
> a pre-patch to change the double underscores in the whole file)

What about "get_fallback_cpu()"?  That's basically what we want when
this function is called -- the runqueue we wanted wasn't available, so
we just want somewhere else reasonable to put it.

 Oh, and, with either yours or my variant... can csched2_cpumask end up
 empty after the two &&-s? That's important or, below, cpumask_any will
 return garbage! :-/

 From a quick inspection, it does not seem it can, in which case, I'd put
 down an ASSERT() about this somewhere. OTOH, if it can, I'd look for
 ways of preventing that to happen before getting here... It seems easier
 and more correct to do that, rather than trying to figure out what to do
 here if the mask is empty. :-O
>>>
>>> Yes, we need to go through the code and make sure that we understand
>>> what our assumptions are, so that people can't crash Xen by doing
>>> irrational things like making the hard affinity not intersect with the
>>> cpupool cpus.
>>>
>> True.
>>
>> Something like that can be figured out and either forbidden or, in
>> general, addressed in other places, rather than requiring us to care
>> here. In fact, this seems to me to be what's happening already (see
>> below).
> 
> I don't think there's any way the mask can be empty after the
> cpumask_and with the cpu_hard_affinity and the VCPU2ONLINE. In
> schedule.c:vcpu_set_hard_affinity, if the intersection of the new hard
> mask and VCPU2ONLINE is empty, the function returns -EINVAL. I took
> into account all the discussion here and modified the function for v3.

What about this:

* Create a cpu pool with cpus 0 and 1.  online_cpus is now [0,1].
* Set a hard affinity of [1].  This succeeds.
* Move cpu 1 to a different cpupool.

After a quick look I don't see anything that updates the hard affinity
when cpus are removed from pools.

And, even if it does *now*, it's possible that something might be
changed in the future that would forget it; this ASSERT() isn't exactly
next to that code.

It seems like handling the case where hard_affinity and cpus_online
don't overlap would be fairly simple; and if there was ever a bug such
that this was possible, handling the case would change that bug from
"hit an ASSERT" (or "have undefined behavior") to "have well-defined
behavior".  So it seems to me like handling that case makes the software
more robust for little cost.

 -George



Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-09 Thread Dario Faggioli
On Mon, 2015-03-09 at 11:45 +, George Dunlap wrote:
> On 03/09/2015 07:11 AM, Justin Weaver wrote:

> > I don't think there's any way the mask can be empty after the
> > cpumask_and with the cpu_hard_affinity and the VCPU2ONLINE. In
> > schedule.c:vcpu_set_hard_affinity, if the intersection of the new hard
> > mask and VCPU2ONLINE is empty, the function returns -EINVAL. I took
> > into account all the discussion here and modified the function for v3.
> 
> What about this:
> 
> * Create a cpu pool with cpus 0 and 1.  online_cpus is now [0,1].
> * Set a hard affinity of [1].  This succeeds.
>
So, I decided to try this:

# xl cpupool-list -c
Name   CPU list
Pool-0  0,1

# xl list -c
NameID   Mem VCPUs  State   Time(s) 
Cpupool
Domain-0 0   507 4 r-  19.9 
  Pool0
test-pv  1  2000 8 -b  19.2 
  Pool0

# xl vcpu-list test-pv
NameID  VCPU   CPU State   Time(s) Affinity 
(Hard / Soft)
test-pv  1 01   -b-   5.5  1 / 1
test-pv  1 11   -b-   2.4  1 / 1
test-pv  1 21   -b-   2.9  1 / 1
test-pv  1 31   -b-   1.9  1 / 1
test-pv  1 41   -b-   1.0  1 / 1
test-pv  1 51   -b-   2.0  1 / 1
test-pv  1 61   -b-   1.2  1 / 1
test-pv  1 71   -b-   2.4  1 / 1

# xl cpupool-cpu-remove Pool0 1
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Assertion '!cpumask_empty(dom_cpumask)' failed at domain.c:460
(XEN) 

i.e., surprise surprise, there's already an assert guarding this... and
it triggers!! :-(

It seems to have been added in v4 of the per-vcpu soft affinity work:
http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=commit;h=4d4e999637f38e0bbd4318ad8e0c92fd1e580430

So, we must have had this discussion before! :-)

I've done a bit of archeology and the ASSERT() in
domain_update_node_affinity() was introduced by me (upon request) while
working on implementing per-vcpu soft affinity. Then, cs 93be8285 is
what causes the assert to trigger. Time seems not to match, but that's
because soft affinity was put on hold when it was decided not to include
it in 4.4, and I probably forgot to retest with a use case similar to
the above when resubmitting it! :-(

A patch to fix things is attached to this email, for convenience (I'll
submit it properly, with changelog and everything, right away).

After seeing this, I'm even more convinced that
!empty(online && hard_affinity) is really something we want and, as we
ASSERT() it in domain.c, the same should be done in sched_credit2.c.
OTOH, if we don't want to ASSERT() in the specific scheduler code, then
I'd remove the one in domain.c too (and, perhaps, just use online as a
fallback), or things would look inconsistent.

Of course, this can also be seen as a proof of George's point, that this
is something not obvious and not easy to catch. Still, I think that if
we say that we support hard vcpu affinity (a.k.a. pinning), we should
not allow vcpus outside their hard affinity, and the code must reflect
this.

> * Move cpu 1 to a different cpupool.
>
> After a quick look I don't see anything that updates the hard affinity
> when cpus are removed from pools.
> 
cpupool_unassign_cpu() calls cpupool_unassign_cpu_helper()that calls
cpu_disable_scheduler() which, if it finds that empty(online &&
hard_affinity)==true, it resets hard_affinity to "all".

Note that this is reported in the log, as a confirmation that this is
really something exceptional:

  (XEN) Breaking affinity for d1v0
  (XEN) Breaking affinity for d1v1
  (XEN) Breaking affinity for d1v2
  (XEN) Breaking affinity for d1v3
  (XEN) Breaking affinity for d1v4
  (XEN) Breaking affinity for d1v5
  (XEN) Breaking affinity for d1v6
  (XEN) Breaking affinity for d1v7

And also note that, as a consequence of fiddling with cpupools, the
affinity has been altered by Xen (i.e., vcpus still always run within
their hard affinity masks):


# xl vcpu-pin 1 all 1 1
# xl cpupool-cpu-remove Pool-0 1
# xl cpupool-list -c
Name   CPU list
Pool-0 0,2,3,4,5,6,7,8,9,10,11,12,13,14,15

# xl vcpu-list test-pv
NameID  VCPU   CPU State   Time(s) Affinity 
(Hard / Soft)
test-pv  1 02   -b-   6.1  all / 1
test-pv  1 16   -b-   1.6  all / 1
test-pv  1 24   -b-   1.6  all / 1
test-pv  1 35   -b-   3.1  all / 1

Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-10 Thread Dario Faggioli
On Sun, 2015-03-08 at 21:11 -1000, Justin Weaver wrote:
> Thanks to all for the comments! I've implemented most of the changes
> recommended here in the v2 review. I should have a new patch set ready
> this week (with an updated soft affinity patch as well). 
>
Great! :-)

> > Oh, and this is what was causing you troubles, in case source and
> > destination runqueue were the same... Help me understand, which call to
> > sched_move_irqs() in schedule.c were we missing? I'd say it is the one
> > in vcpu_migrate(), but that does not seem to care about vc->processor
> > (much rather about new_cpu)... what am I missing?
> >
> > However, if they are not the same, the call to migrate() will override
> > this right away, won't it? What I mean to say is, if this is required
> > only in case trqd and svc->rqd are equal, can we tweak this part of
> > csched2_vcpu_migrate() as follows?
> >
> > if ( trqd != svc->rqd )
> > migrate(ops, svc, trqd, NOW());
> > else
> > vc->processor = new_cpu;
> 
> You are right; it does not have anything to do with sched_move_irqs()
> not being called (like you said it doesn't care about vc->processor).
>
Ah, ok. :-)

> You are never going to believe any of my explanations now! :) 
>
EhEh... If I did that to everyone who fails to understand how things work
on the first or second attempt, I would stop believing myself! :-D :-D

> Without the processor assignment
> here the vcpu might go on being assigned to a processor it no longer
> is allowed to run on. 
>
Ok.

> In that case, function runq_candidate may only
> get called for the vcpu's old processor, and runq_candidate will no
> longer let a vcpu run on a processor that it's not allowed to run on
> (because of the hard affinity check first introduced in v1 of this
> patch). 
>
It mostly makes sense. Off the top of my head, it still looks like
there should be a pCPU that, when scheduling, would pick it up... I need
to think more about this...

> So in that condition the vcpu never gets to run. That's still
> somewhat of a vague explanation, but I have observed that that is what
> happens. 
>
Do you mean you _actually_ saw this, with some debugging printk-s, or
tracing, or something like this?

> Anyway I think everyone agrees that the processor assignment
> needs to be here, and I did move it to an else block for v3 like you
> recommended above.
>
Yes, that's the point, the assignment above is correct, IMO, so it
should be there, whether or not it is the cause of the issue :-)

> > I don't like the "_safe_" part, but that is not a big deal, I certainly
> > can live with it.
> 
> I changed it to _choose_valid_pcpu ... discuss! 
>
I'm fine with what George proposes in his email.

> (Also, I can send out
> a pre-patch to change the double underscores in the whole file)
> 
For static symbols, yes. As Jan says, it's George's call. If you're up
for it, I think it would be an improvement.

> >> > VCPU2ONLINE(svc->vcpu) would make the line shorter.
> 
> I agree. VCPU2ONLINE is defined in schedule.c ... do you want me to
> move it to a common header along with the other parts we discussed
> (__vcpu_has_soft_affinity, etc.)? 
>
Either that or, if you only need it once, just open code it.

> Okay to move them to sched-if.h, or
> should I put them in a new header file?
> 
sched-if.h is ok for the step-wise load balancing macros, and it would
be the proper place to move this to as well, if we go for moving it.

Thanks and Regards,
Dario


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-13 Thread Dario Faggioli
Hey, I came across this patch again for other reasons, and I realized
I've a few more (minor) comments:

On Sun, 2015-02-08 at 17:45 -1000, Justin T. Weaver wrote:
> From: "Justin T. Weaver" 

> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index cf53770..de8fb5a 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c

> @@ -2127,16 +2212,25 @@ csched2_init(struct scheduler *ops)
>  
>  prv->load_window_shift = opt_load_window_shift;
>  
> +cpumask = xzalloc_bytes(nr_cpu_ids * sizeof(cpumask_t *));
> +if ( cpumask == NULL )
> +return -ENOMEM;
> +
This can probably be xzalloc_array(), or even xmalloc_array(), if we
don't care about zeroing all the elements (see below for why I actually
don't think we need to).

>  return 0;
>  }
>  
>  static void
>  csched2_deinit(const struct scheduler *ops)
>  {
> +int i;
>  struct csched2_private *prv;
>  
>  prv = CSCHED2_PRIV(ops);
>  xfree(prv);
> +
> +for ( i = 0; i < nr_cpu_ids; i++ )
> +free_cpumask_var(cpumask[i]);
> +xfree(cpumask);
>
Do we need the loop? I mean, all the pcpus go through
csched2_alloc_pdata(), when being activated under this scheduler, which
allocates their scratch masks. They then go through
csched2_free_pdata(), when deactivated from this scheduler, which frees
their scratch masks.

I would then expect that, when we get here, all the elements of the
scratch mask array that were allocated have also been freed, and hence
only freeing the array itself is necessary.

Am I missing something?

Regards,
Dario




Re: [Xen-devel] [PATCH v2] sched: credit2: respect per-vcpu hard affinity

2015-03-13 Thread Justin Weaver
Dario,

>> Without the processor assignment
>> here the vcpu might go on being assigned to a processor it no longer
>> is allowed to run on.
>>
> Ok.
>
>> In that case, function runq_candidate may only
>> get called for the vcpu's old processor, and runq_candidate will no
>> longer let a vcpu run on a processor that it's not allowed to run on
>> (because of the hard affinity check first introduced in v1 of this
>> patch).
>>
> It mostly makes sense. Off the top of my head, it still looks like
> there should be a pCPU that, when scheduling, would pick it up... I need
> to think more about this...
>
>> So in that condition the vcpu never gets to run. That's still
>> somewhat of a vague explanation, but I have observed that that is what
>> happens.
>>
> Do you mean you _actually_ saw this, with some debugging printk-s, or
> tracing, or something like this?

Yes, I saw the above behavior using printk-s. I commented out the line
that does the processor assignment if the run queues are the same. I
put some printks in key spots. For the test, I had only one guest vcpu
running; it had hard affinity with all cpus and I used xl vcpu-list to
see that it was on cpu 15 (two runqs 0-7,8-15). I then pinned it to 9
and it became unresponsive (vcpu-list showed ---), and the printk
output showed, over and over again, runq_candidate saying something
like 'vcpu can't run on cpu 15, not in hard affinity mask'. A call to
runq_candidate for cpu 9 never came up. Pinning it
back to 15 (or all) started it running again.


>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index cf53770..de8fb5a 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>
>> @@ -2127,16 +2212,25 @@ csched2_init(struct scheduler *ops)
>>
>>  prv->load_window_shift = opt_load_window_shift;
>>
>> +cpumask = xzalloc_bytes(nr_cpu_ids * sizeof(cpumask_t *));
>> +if ( cpumask == NULL )
>> +return -ENOMEM;
>> +
> This can probably be xzalloc_array(), or even xmalloc_array(), if we
> don't care about zeroing all the elements (see below for why I actually
> don't think we need to).

OK, I'll make this change. Yes, they don't have to be zeroed based on
what you point out below.

>>  return 0;
>>  }
>>
>>  static void
>>  csched2_deinit(const struct scheduler *ops)
>>  {
>> +int i;
>>  struct csched2_private *prv;
>>
>>  prv = CSCHED2_PRIV(ops);
>>  xfree(prv);
>> +
>> +for ( i = 0; i < nr_cpu_ids; i++ )
>> +free_cpumask_var(cpumask[i]);
>> +xfree(cpumask);
>>
> Do we need the loop? I mean, all the pcpus go through
> csched2_alloc_pdata(), when being activated under this scheduler, which
> allocates their scratch masks. They then go through
> csched2_free_pdata(), when deactivated from this scheduler, which frees
> their scratch masks.
>
> I would then expect that, when we get here, all the elements of the
> scratch mask array that were allocated, have also been freed, and hence
> only freeing the array is necessary.
>
> Am I missing something?

I agree; what we allocate in csched2_init is what should be
deallocated in csched2_deinit. They should match. I'll make the
change.

Thanks,
Justin
