On 29/09/15 22:40, Dario Faggioli wrote:
> On Tue, 2015-09-29 at 18:31 +0100, Andrew Cooper wrote:
>> On 29/09/15 17:55, Dario Faggioli wrote:
>>> The insert_vcpu() scheduler hook is called with an
>>> inconsistent locking strategy. In fact, it is sometimes
>>> invoked while holding the runqueue lock and sometimes
>>> when that is not the case.
>>>
>>> In other words, some call sites seem to imply that
>>> locking should be handled by the callers, in schedule.c
>>> --e.g., schedule_cpu_switch(), which acquires the
>>> runqueue lock before calling the hook; others that the
>>> specific schedulers should be responsible for locking
>>> themselves --e.g., sched_move_domain(), which does
>>> not acquire any lock before calling the hook.
>>>
>>> The right thing to do seems to be to always defer locking
>>> to the specific schedulers, as they are the ones that know
>>> what, how and when it is best to lock (as in: runqueue
>>> locks, vs. private scheduler locks, vs. both, etc.).
>>>
>>> This patch, therefore:
>>>  - removes any locking around insert_vcpu() from
>>>    generic code (schedule.c);
>>>  - adds the _proper_ locking in the hook implementations,
>>>    depending on the scheduler (for instance, credit2
>>>    does that already, while credit1 and RTDS need to grab
>>>    the runqueue lock while manipulating runqueues).
>>>
>>> In the case of credit1, remove_vcpu() needs some fixing
>>> too, i.e.:
>>>  - it manipulates runqueues, so the runqueue lock must
>>>    be acquired;
>>>  - *_lock_irq() is enough; there is no need to use
>>>    _irqsave().
>> Nothing in any of the generic scheduling code should need
>> interrupts disabled at all.
>>
> That's a really, really, really interesting point.
>
> I think I see what you mean. However, currently, pretty much **all**
> scheduling related locks are acquired via the _irq or _irqsave
> primitives, and that is true for schedule.c and for all the
> sched_*.c files.
>
>> One of the problem-areas identified by Jenny during the ticketlock
>> performance work was that the SCHEDULE_SOFTIRQ was a large consumer
>> of time with interrupts disabled.
>>
> Right, and I am very much up for investigating whether this can be
> improved. However, that seems to me to be the topic for a different
> series.
>
>> Is the use of _lock_irq() here to cover another issue expecting
>> interrupts to be disabled, or could it be replaced with a plain
>> spin_lock()?
>>
> As said, it is probably the case that spin_lock() would be OK here,
> as well as elsewhere. It is done like this in this patch for
> consistency, as that is what happens **everywhere** else in the
> scheduling code. In fact, I haven't tried, but it may well be the
> case that converting only one (or a subset) of the locks to
> non-_irq* variants would make check_lock() complain.
>
> So, can we just allow this patch to follow suit, and then overhaul
> and change/fix (if it proves feasible) all the locking at once, in a
> dedicated series? That seems the best approach to me...
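(For readers of the archive: below is a minimal sketch of the two locking arrangements being contrasted above. The function names, and the way the SCHED_OP()/vcpu_schedule_lock_irq() style wrappers are used, are purely illustrative and are not lifted from the actual patch.)

    /* Sketch only: illustrative code, not the actual patch. */

    /* Before: generic code (schedule.c) sometimes took the runqueue
     * lock around the hook itself... */
    void generic_add_vcpu_old(const struct scheduler *ops, struct vcpu *v)
    {
        spinlock_t *lock = vcpu_schedule_lock_irq(v);  /* caller locks */
        SCHED_OP(ops, insert_vcpu, v);
        vcpu_schedule_unlock_irq(lock, v);             /* caller unlocks */
    }

    /* After: generic code calls the hook with no lock held, and each
     * scheduler's insert_vcpu takes whatever lock(s) it actually needs
     * (runqueue lock, private scheduler lock, or both). */
    void generic_add_vcpu_new(const struct scheduler *ops, struct vcpu *v)
    {
        SCHED_OP(ops, insert_vcpu, v);            /* hook locks itself */
    }

    /* ...e.g. a credit1-style implementation of the hook: */
    static void credit1_style_vcpu_insert(const struct scheduler *ops,
                                          struct vcpu *v)
    {
        spinlock_t *lock = vcpu_schedule_lock_irq(v);

        /* ... put v on the appropriate runqueue ... */

        vcpu_schedule_unlock_irq(lock, v);
    }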
This seems reasonable to me. I just wanted to check that you were not
using the _irq() variants "just because"; I did suspect that they were
in use simply because everything else uses _irq()/_irqsave().

This change doesn't make matters worse, and fixing that can of worms is
probably going to be a whole series in itself, so no further objections
from me.

~Andrew
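(Again purely as an illustration of the distinction discussed above, and not code from the patch: the _irqsave and _irq variants differ only in how the caller's interrupt state is handled, which is why _irq is sufficient whenever the hook is known to be entered with interrupts enabled, and why a plain spin_lock() would only be an option once no interrupt-context user of the lock remains.)

    /* Sketch with placeholder names; not taken from the Xen sources. */

    static void insert_with_irq(spinlock_t *lock)
    {
        /* Safe only if interrupts are known to be enabled on entry:
         * the unlock re-enables them unconditionally. */
        spin_lock_irq(lock);
        /* ... manipulate the runqueue ... */
        spin_unlock_irq(lock);
    }

    static void insert_with_irqsave(spinlock_t *lock)
    {
        unsigned long flags;

        /* Caller's interrupt state unknown: save it, and restore
         * exactly that state afterwards. */
        spin_lock_irqsave(lock, flags);
        /* ... manipulate the runqueue ... */
        spin_unlock_irqrestore(lock, flags);
    }

    static void insert_plain(spinlock_t *lock)
    {
        /* Does not touch interrupts at all; only viable once it is
         * established that the lock is never taken in interrupt
         * context. */
        spin_lock(lock);
        /* ... manipulate the runqueue ... */
        spin_unlock(lock);
    }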