On 29/09/15 22:40, Dario Faggioli wrote:
> On Tue, 2015-09-29 at 18:31 +0100, Andrew Cooper wrote:
>> On 29/09/15 17:55, Dario Faggioli wrote:
>>> The insert_vcpu() scheduler hook is called with an
>>> inconsistent locking strategy. In fact, it is sometimes
>>> invoked while holding the runqueue lock and sometimes
>>> when that is not the case.
>>>
> >>> In other words, some call sites seem to imply that
>>> locking should be handled in the callers, in schedule.c
>>> --e.g., in schedule_cpu_switch(), which acquires the
>>> runqueue lock before calling the hook; others that
>>> specific schedulers should be responsible for locking
>>> themselves --e.g., in sched_move_domain(), which does
>>> not acquire any lock for calling the hook.
>>>
> >>> The right thing to do seems to be to always defer
> >>> locking to the specific schedulers, as they are the
> >>> ones that know what, how and when it is best to lock
> >>> (as in: runqueue locks, vs. private scheduler locks,
> >>> vs. both, etc.)
>>>
>>> This patch, therefore:
>>>  - removes any locking around insert_vcpu() from
>>>    generic code (schedule.c);
> >>>  - adds the _proper_ locking in the hook implementations,
>>>    depending on the scheduler (for instance, credit2
>>>    does that already, credit1 and RTDS need to grab
>>>    the runqueue lock while manipulating runqueues).
>>>
> >>> In case of credit1, remove_vcpu() needs some
> >>> fixing too, i.e.:
>>>  - it manipulates runqueues, so the runqueue lock must
>>>    be acquired;
>>>  - *_lock_irq() is enough, there is no need to do
>>>    _irqsave()
>> Nothing in any of generic scheduling code should need interrupts
>> disabled at all.
>>
> That's a really, really, really interesting point.
>
> I think I see what you mean. However, currently, pretty much **all**
> scheduling related locks are acquired via _irq or _irqsave primitives,
> and that is true for schedule.c and for all the sched_*.c files.
>
>> One of the problem-areas identified by Jenny during the ticketlock
>> performance work was that the SCHEDULE_SOFTIRQ was a large consumer
>> of
>> time with interrupts disabled.
>>
> Right, and I am very much up for investigating whether this can
> improve. However, this seems to me the topic for a different series.
>
>> Is the use of _lock_irq() here to cover another issue expecting
>> interrupts to be disabled, or could it be replaced with a plain
>> spin_lock()?
>>
> As said, it is probably the case that spin_lock() would be ok, here as
> well as elsewhere. This is being done like this in this patch for
> consistency, as that is what happens **everywhere** else in scheduling
> code. In fact, I haven't tried, but it may well be the case that,
> converting only one (or a subset) of locks to non _irq* variants, we'd
> make check_lock() complain.
>
> So, can we just allow this patch to follow suit, and then overhaul and
> change/fix (if it proves feasible) all locking at once, in a dedicated
> series? This seems the best approach to me...

This seems reasonable to me.  I just wanted to check that you were not
using the _irq() variants "just because".  I did suspect that the _irq()
variants were in use because everything else uses _irq()/_irqsave().

This change doesn't make the matter worse, and fixing that can of worms
is probably going to be a whole series in itself, therefore no further
objections from me.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
