On 03/03/2015 03:15 AM, Dario Faggioli wrote:
> On Sun, 2015-02-08 at 17:45 -1000, Justin T. Weaver wrote:
>> From: "Justin T. Weaver" <jtwea...@hawaii.edu>
>>
>> by making sure that vcpus only run on the pcpu(s) they are allowed to
>> run on based on their hard affinity cpu masks.
>>
> Ok, here I am reviewing this, at last... sorry for the delay! :-(
> 
>> Signed-off-by: Justin T. Weaver <jtwea...@hawaii.edu>
>> ---
>> Changes in v2:
>>  * Added dynamically allocated cpu masks to avoid putting them on the stack;
>>    replaced temp masks from v1 throughout
>>  * Added helper function for code suggested in v1 review and called it in two
>>    locations in function choose_cpu
>>  * Removed v1 change to comment in the beginning of choose_cpu
>>  * Replaced two instances of cpumask_and/cpumask_empty with 
>> cpumask_intersects
>>  * Removed v1 re-work of code in function migrate; only change in migrate in
>>    v2 is the assignment of a valid pcpu from the destination run queue to
>>    vc->processor
>>  * In function csched2_vcpu_migrate: removed change from v1 that called
>>    function migrate even if cur and dest run queues were the same in order
>>    to get a runq_tickle call; added processor assignment to new_cpu to fix
>>    the real underlying issue which was the vcpu not getting a call to
>>    sched_move_irqs
>>
> Aha, so that was the issue! Glad you figured this out. :-)
> 
>>  * Removed the looping added in v1 in function balance_load; may be added 
>> back
>>    later because it would help to have balance_load be more aware of hard
>>    affinity, but adding it does not affect credit2's current inability to
>>    respect hard affinity.
>>  * Removed coding style fix in function balance_load
>>  * Improved comment in function runq_candidate
>>
> Thanks for putting this changes recap here. :-)
> 
>> ---
>>  xen/common/sched_credit2.c |  122 
>> +++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 108 insertions(+), 14 deletions(-)
>>
>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index cf53770..de8fb5a 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -194,6 +194,12 @@ int opt_overload_balance_tolerance=-3;
>>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>>  
>>  /*
>> + * Use this to avoid having too many cpumask_t structs on the stack
>> + */
>> +static cpumask_t **cpumask = NULL;
>>
> Not just 'cpumask', please... It's too generic a name. Let's pick up
> something that makes it easier to understand what's the purpose of this.
> 
> I'm not really sure right now what something like that could be...
> Credit has balance_mask, but we're not going as far as introducing
> multiple step load balancing here (not with this patch, at least).
> 
> Maybe affinity_cpumask, or hard_affinity_cpumask (and we'll rename to
> something else when introducing soft affinity, if needed).

Well I think it's just being used as scratch space, right?  Why not
scratch_mask or something like that?

>> +#define csched2_cpumask cpumask[smp_processor_id()]
>> +
> I like the idea, but put the right side between parentheses. Of course,
> what just said about the name applies here too. :-)
> 
>> @@ -268,6 +274,23 @@ struct csched2_dom {
>>      uint16_t nr_vcpus;
>>  };
>>  
>> +/*
>> + * When a hard affinity change occurs, we may not be able to check some or
>> + * all of the other run queues for a valid new processor for the given vcpu.
>> + * Return svc's current pcpu if valid, otherwise return a safe pcpu.
>> + */
>>
> Add at least a very quick mention on why this is (could be) necessary.
> 
>> +static int get_safe_pcpu(struct csched2_vcpu *svc)
>> +{
>>
> I also don't like the name... __choose_cpu() maybe ?

I'm not 100% happy with the name, but I think "get_safe_pcpu" is more
descriptive than just "__choose_cpu".

> 
>> +    cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity, 
>> &svc->rqd->active);
>> +    if ( unlikely(cpumask_empty(csched2_cpumask)) )
>> +        cpumask_and(csched2_cpumask, svc->vcpu->cpu_hard_affinity,
>> +            cpupool_online_cpumask(svc->vcpu->domain->cpupool));
>>
> VCPU2ONLINE(svc->vcpu) would make the line shorter.
> 
> Also, I know I'm the one that suggested this form for the code, but
> thinking again about it, I'm not sure the first if is worth:
> 
>     cpumask_and(csched2_cpumask, &svc->rqd->active, VCPU2ONLINE(svc->vcpu));
>     cpumask_and(csched2_cpumask, csched2_cpumask, 
> svc->vcpu->cpu_hard_affinity);

Actually, I was going to say is there any reason not to start with:

if ( likely(cpumask_test_cpu(svc->vcpu->processor,
svc->vcpu->cpu_hard_affinity))
 return svc->vcpu->processor;

And then go through doing the unions and what-not once we've determined
that the current processor isn't suitable?

> Oh, and, with either yours or my variant... can csched2_cpumask end up
> empty after the two &&-s? That's important or, below, cpumask_any will
> return garbage! :-/
> 
> From a quick inspection, it does not seem it can, in which case, I'd put
> down an ASSERT() about this somewhere. OTOH, if it can, I'd look for
> ways of preventing that to happen before getting here... It seems easier
> and more correct to do that, rather than trying to figure out what to do
> here if the mask is empty. :-O

Yes, we need to go through the code and make sure that we understand
what our assumptions are, so that people can't crash Xen by doing
irrational things like making the hard affinity not intersect with the
cpupool cpus.

If we make this an "unusual slow path", I don't see any reason to make
the code a bit more resilient by handling cases we don't expect to
happen.  It would be good to try to make sure we don't allow a situation
where union(hard_affinity, domain->cpupool) is empty, but I'd rather the
result be something not too bizarre (e.g., picking a random cpu in the
cpupool) than having the host crash or something.

>>      }
>>  
>> -    /* FIXME: Pay attention to cpu affinity */                              
>>                                                         
>> -

Nice to see those disappear. :-)

>> @@ -1396,6 +1459,15 @@ csched2_vcpu_migrate(
>>  
>>      /* Check if new_cpu is valid */
>>      BUG_ON(!cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
>> +    BUG_ON(!cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
>>
> Not sure.. An ASSERT() seems more suitable than a BUG_ON here... What's
> the reasoning behind this?

Yes, BUG_ON() should be used where something is almost certainly going
to cause a crash anyway; so you want to cause it somewhere you'll get
the most information about what caused the crash.

ASSERT() should be used for "I believe this should always be the case,
and if it's not, I want to find out in testing".

>> +    /*
>> +     * Assign new_cpu to vc->processor here to get a call to sched_move_irqs
>> +     * in schedule.c in case there was a hard affinity change within the 
>> same
>> +     * run queue. vc will not be able to run in certain situations without
>> +     * this call.
>> +     */
>> +    vc->processor = new_cpu;
>>
> Oh, and this is what was causing you troubles, in case source and
> destination runqueue were the same... Help me understand, which call to
> sched_move_irqs() in schedule.c were we missing? I'd say it is the one
> in vcpu_migrate(), but that does not seem to care about vc->processor
> (much rater about new_cpu)... what am I missing?
> 
> However, if they are not the same, the call to migrate() will override
> this right away, won't it? What I mean to say is, if this is required
> only in case trqd and svc->rqd are equal, can we tweak this part of
> csched2_vcpu_migrate() as follows?
> 
>     if ( trqd != svc->rqd )
>         migrate(ops, svc, trqd, NOW());
>     else
>         vc->processor = new_cpu;

It does seem a bit weird to me looking at it now that when the generic
scheduler does vcpu_migrate(), we go through all the hassle of calling
pick_cpu, and then (if the runqueue happens to change) we end up picking
*yet another* random cpu within that runqueue.

But that's a fix for another time I think.  I agree with Dario's
suggestion here.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Reply via email to