Re: [Xen-devel] [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked

2018-01-12 Thread Roger Pau Monné
On Fri, Jan 12, 2018 at 12:16:47PM +0100, Dario Faggioli wrote:
> On Fri, 2018-01-12 at 10:45 +, Roger Pau Monné wrote:
> > On Fri, Jan 12, 2018 at 10:54:03AM +0100, Dario Faggioli wrote:
> >
> > > Err... yes. BTW, either there are a couple of typos in the above
> > > paragraph, or it's me that can't read it well. Anyway, just to be
> > > clear, if we have 4 pCPUs, and 6 VMs, with 1 vCPU each, this might
> > > be
> > > the situation:
> > > 
> > > CPU0 <-- d1v0
> > > CPU1 <-- d2v0
> > > CPU2 <-- d3v0
> > > CPU3 <-- d4v0
> > > 
> > > Waitqueue: d5v0,d6v0
> > > 
> > > Then, if d2 leaves/dies/etc, leaving CPU1 idle, d5v0 is picked up
> > > from
> > > the waitqueue and assigned to CPU1.
> > 
> > I think the above example is not representative of what happens
> > inside
> > of the shim, 
> >
> Indeed it's not. I was just trying to clarify, via an example, George's
> explanation of how null works in general.
> 
> > since there's only one domain that runs on the shim, so
> > the picture is something like:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d1v1
> > 
> > waitqueue: d1v2 (down), d1v3 (down)
> > 
> Right. So, how about we change this in such a way that d1v2 and d1v3,
> since they're offline, won't end up in the waitqueue?

Sounds fine. I have to admit this is the first time I've played with
the scheduler code, so it's quite likely that whatever you say will
seem OK to me :).

> > Then if the guest brings up another vCPU, let's assume it's vCPU#3:
> > pCPU#3 will be brought up from the shim PoV, and the null scheduler
> > will assign the first vCPU on the waitqueue:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d1v1
> > CPU3 <-- d1v2 (down)
> > NULL <-- d1v3 (up)
> > 
> > Hence d1v2 which is still down will get assigned to CPU#3, and d1v3
> > which is up won't get assigned to any pCPU, and hence won't run.
> > 
> Exactly. Whereas, if d1v2 and d1v3 were not in the waitqueue at all
> while offline, what would (should) happen is:
> 
> - CPU3 comes online ("in" the shim)
> - CPU3 stays idle, as there's nothing in the waitqueue
> - d1v3 comes online and is added to the shim's null scheduler
> - as CPU3 does not have any vCPU assigned, d1v3 is assigned to it

Yes, that's what I'm aiming for :).

> > So using the scenario from before:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d1v1
> > 
> > waitqueue: d1v2 (down), d1v3 (down)
> > 
> > The guest decides to hotplug vCPU#2, and hence the shim first hotplugs
> > CPU#2, but at the point CPU2 is added to the pool of CPUs, vCPU2 is
> > still not up, hence we get the following:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d1v1
> > CPU2 <-- NULL
> > 
> > waitqueue: d1v2 (down), d1v3 (down)
> > 
> > Then d1v2 is brought up, but since the null scheduler doesn't react
> > to wakeups, the picture stays the same:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d1v1
> > CPU2 <-- NULL
> > 
> > waitqueue: d1v2 (up), d1v3 (down)
> > 
> > And d1v2 doesn't get scheduled.
> > 
> > Hope this makes sense :)
> > 
> Yeah, and I see that it works.
> 
> What I'm saying is that I'd prefer, instead of having the null
> scheduler react to wakeups of vCPUs in the waitqueue, to avoid
> having offline vCPUs in the waitqueue altogether.
> 
> At which point, when d1v2 hotplug happens, there has to be a
> null_vcpu_insert() (or something equivalent), to which the null
> scheduler should react already.

That seems fine to me, I will try to take a look at implementing this.
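
Something along these lines, maybe (just an untested sketch: it assumes
the existing sched_null.c helpers null_priv(), null_vcpu(), pick_cpu()
and vcpu_assign(), plus is_vcpu_online() from sched.h, can be used as
shown, and the null_vcpu_online() hook name is made up for illustration;
the real onlining path still needs to be identified):

/* Sketch only (locking and corner cases omitted): never let an offline
 * vCPU reach the waitqueue; the per-CPU field name is assumed. */
static void null_vcpu_insert(const struct scheduler *ops, struct vcpu *v)
{
    struct null_private *prv = null_priv(ops);
    unsigned int cpu;

    if ( !is_vcpu_online(v) )
        return;             /* handled when the vCPU is brought online */

    cpu = pick_cpu(prv, v);
    if ( per_cpu(npc, cpu).vcpu == NULL )
        vcpu_assign(prv, v, cpu);         /* free pCPU: run it there */
    else
        list_add_tail(&null_vcpu(v)->waitq_elem, &prv->waitq);
}

/* Sketch only: react to the vCPU coming online (made-up hook name),
 * kicking just the pCPU it ended up on instead of all online CPUs. */
static void null_vcpu_online(const struct scheduler *ops, struct vcpu *v)
{
    null_vcpu_insert(ops, v);
    cpu_raise_softirq(v->processor, SCHEDULE_SOFTIRQ);
}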


Re: [Xen-devel] [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked

2018-01-12 Thread Dario Faggioli
On Fri, 2018-01-12 at 10:45 +, Roger Pau Monné wrote:
> On Fri, Jan 12, 2018 at 10:54:03AM +0100, Dario Faggioli wrote:
>
> > Err... yes. BTW, either there are a couple of typos in the above
> > paragraph, or it's me that can't read it well. Anyway, just to be
> > clear, if we have 4 pCPUs, and 6 VMs, with 1 vCPU each, this might
> > be
> > the situation:
> > 
> > CPU0 <-- d1v0
> > CPU1 <-- d2v0
> > CPU2 <-- d3v0
> > CPU3 <-- d4v0
> > 
> > Waitqueue: d5v0,d6v0
> > 
> > Then, if d2 leaves/dies/etc, leaving CPU1 idle, d5v0 is picked up
> > from
> > the waitqueue and assigned to CPU1.
> 
> I think the above example is not representative of what happens
> inside
> of the shim, 
>
Indeed it's not. I was just trying to clarify, via an example, George's
explanation of how null works in general.

> since there's only one domain that runs on the shim, so
> the picture is something like:
> 
> CPU0 <-- d1v0
> CPU1 <-- d1v1
> 
> waitqueue: d1v2 (down), d1v3 (down)
> 
Right. So, how about we change this in such a way that d1v2 and d1v3,
since they're offline, won't end up in the waitqueue?

> Then if the guest brings up another vCPU, let's assume it's vCPU#3:
> pCPU#3 will be brought up from the shim PoV, and the null scheduler
> will assign the first vCPU on the waitqueue:
> 
> CPU0 <-- d1v0
> CPU1 <-- d1v1
> CPU3 <-- d1v2 (down)
> NULL <-- d1v3 (up)
> 
> Hence d1v2 which is still down will get assigned to CPU#3, and d1v3
> which is up won't get assigned to any pCPU, and hence won't run.
> 
Exactly. Whereas, if d1v2 and d1v3 were not in the waitqueue at all
while offline, what would (should) happen is:

- CPU3 comes online ("in" the shim)
- CPU3 stays idle, as there's nothing in the waitqueue
- d1v3 comes online and is added to the shim's null scheduler
- as CPU3 does not have any vCPU assigned, d1v3 is assigned to it

> > Mmm, wait. In case of a domain which specifies both maxvcpus and
> > curvcpus, how many vCPUs does the domain in which the shim runs have?
> 
> Regardless of the values of maxvcpus and curvcpus, PV guests are
> always started with only the BSP online, and then the guest itself
> brings up other vCPUs.
> 
> In the shim case vCPU hotplug is tied to pCPU hotplug, so every time
> the guest hotplugs or unplugs a vCPU the shim does exactly the same
> with its CPUs.
> 
Sure, what I was asking was rather this: if the guest config file
has "maxvcpus=4;vcpus=1", at the end of domain creation, and before any
`xl vcpu-set' or anything else that would bring other guest vCPUs
online, what's the output of `xl vcpu-list'? :-)

Anyway, I think you've answered this below.

> > I'm not sure how an offline vCPU can end up there... but maybe I
> > need
> > to look at the code better, with the shim use case in mind.
> > 
> > Anyway, I'm fine with checks that prevent offline vCPUs from being
> > assigned to either pCPUs (like, the CPUs of L0 Xen) or shim's vCPUs
> > (so, the CPUs of L1 Xen). I'm less fine with rescheduling everyone
> > at every wakeup.
> 
> So using the scenario from before:
> 
> CPU0 <-- d1v0
> CPU1 <-- d1v1
> 
> waitqueue: d1v2 (down), d1v3 (down)
> 
> The guest decides to hotplug vCPU#2, and hence the shim first hotplugs
> CPU#2, but at the point CPU2 is added to the pool of CPUs, vCPU2 is
> still not up, hence we get the following:
> 
> CPU0 <-- d1v0
> CPU1 <-- d1v1
> CPU2 <-- NULL
> 
> waitqueue: d1v2 (down), d1v3 (down)
> 
> Then d1v2 is brought up, but since the null scheduler doesn't react
> to wakeups, the picture stays the same:
> 
> CPU0 <-- d1v0
> CPU1 <-- d1v1
> CPU2 <-- NULL
> 
> waitqueue: d1v2 (up), d1v3 (down)
> 
> And d1v2 doesn't get scheduled.
> 
> Hope this makes sense :)
> 
Yeah, and I see that it works.

What I'm saying is that I'd prefer, instead of having the null
scheduler react to wakeups of vCPUs in the waitqueue, to avoid
having offline vCPUs in the waitqueue altogether.

At which point, when d1v2 hotplug happens, there has to be a
null_vcpu_insert() (or something equivalent), to which the null
scheduler should react already.

Regards,
Dario
-- 
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


Re: [Xen-devel] [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked

2018-01-12 Thread Roger Pau Monné
On Fri, Jan 12, 2018 at 10:54:03AM +0100, Dario Faggioli wrote:
> Hi!
> 
> First of all, my filters somehow failed to highlight this for me, so
> sorry if I did not notice it earlier (and now, I need new filters
> anyway, as the email I'm using is different :-D).
> 
> I'll have a look at the patch ASAP.
> 
> On Mon, 2018-01-08 at 11:12 +, George Dunlap wrote:
> > On 01/08/2018 10:37 AM, Jan Beulich wrote:
> >
> > > I don't understand: Isn't the null scheduler not moving around
> > > vCPU-s at all? At least that's what the comment at the top of the
> > > file says, unless I'm mis-interpreting it. If so, how can "some CPU
> > > (...) pick this vCPU"?
> > 
> > There's no current way to prevent a user from adding more vcpus to a
> > pool than there are pcpus (if nothing else, by creating a new VM in a
> > given pool), or from taking pcpus from a pool in which #vcpus >=
> > #pcpus.
> > 
> Exactly. And something that checks for that is anything but easy to
> introduce (let's just avoid even mentioning enforcing!).
> 
> > The null scheduler deals with this by having a queue of "unassigned"
> > vcpus that are waiting for a free pcpu.  When a pcpu becomes
> > available,
> > it will do the assignment.  When a pcpu that has a vcpu is assigned
> > is
> > removed from the pool, that vcpu is assigned to a different pcpu if
> > one
> > is available; if not, it is put on the list.
> > 
> Err... yes. BTW, either there are a couple of typos in the above
> paragraph, or it's me that can't read it well. Anyway, just to be
> clear, if we have 4 pCPUs, and 6 VMs, with 1 vCPU each, this might be
> the situation:
> 
> CPU0 <-- d1v0
> CPU1 <-- d2v0
> CPU2 <-- d3v0
> CPU3 <-- d4v0
> 
> Waitqueue: d5v0,d6v0
> 
> Then, if d2 leaves/dies/etc, leaving CPU1 idle, d5v0 is picked up from
> the waitqueue and assigned to CPU1.

I think the above example is not representative of what happens inside
of the shim, since there's only one domain that runs on the shim, so
the picture is something like:

CPU0 <-- d1v0
CPU1 <-- d1v1

waitqueue: d1v2 (down), d1v3 (down)

Then if the guest brings up another vCPU, let's assume it's vCPU#3:
pCPU#3 will be brought up from the shim PoV, and the null scheduler will
assign the first vCPU on the waitqueue:

CPU0 <-- d1v0
CPU1 <-- d1v1
CPU3 <-- d1v2 (down)
NULL <-- d1v3 (up)

Hence d1v2 which is still down will get assigned to CPU#3, and d1v3
which is up won't get assigned to any pCPU, and hence won't run.

> > In the case of shim mode, this also seems to happen whenever curvcpus
> > <
> > maxvcpus: The L1 hypervisor (shim) only sees curvcpus cpus on which
> > to
> > schedule L2 vcpus, but the L2 guest has maxvcpus vcpus to schedule,
> > of
> > which (maxvcpus-curvcpus) are  marked 'down'.  
> >
> Mmm, wait. In case of a domain which specifies both maxvcpus and
> curvcpus, how many vCPUs does the domain in which the shim runs have?

Regardless of the values of maxvcpus and curvcpus, PV guests are always
started with only the BSP online, and then the guest itself brings up
other vCPUs.

In the shim case vCPU hotplug is tied to pCPU hotplug, so every time
the guest hotplugs or unplugs a vCPU the shim does exactly the same
with its CPUs.

> > In this case, it also
> > seems that the null scheduler sometimes schedules a "down" vcpu when
> > there are "up" vcpus on the list; meaning that the "up" vcpus are
> > never
> > scheduled.
> > 
> I'm not sure how an offline vCPU can end up there... but maybe I need
> to look at the code better, with the shim use case in mind.
> 
> Anyway, I'm fine with checks that prevent offline vCPUs from being
> assigned to either pCPUs (like, the CPUs of L0 Xen) or shim's vCPUs
> (so, the CPUs of L1 Xen). I'm less fine with rescheduling everyone at
> every wakeup.

So using the scenario from before:

CPU0 <-- d1v0
CPU1 <-- d1v1

waitqueue: d1v2 (down), d1v3 (down)

The guest decides to hotplug vCPU#2, and hence the shim first hotplugs
CPU#2, but at the point CPU2 is added to the pool of CPUs, vCPU2 is
still not up, hence we get the following:

CPU0 <-- d1v0
CPU1 <-- d1v1
CPU2 <-- NULL


waitqueue: d1v2 (down), d1v3 (down)

Then d1v2 is brought up, but since the null scheduler doesn't react to
wakeups, the picture stays the same:

CPU0 <-- d1v0
CPU1 <-- d1v1
CPU2 <-- NULL


waitqueue: d1v2 (up), d1v3 (down)

And d1v2 doesn't get scheduled.

Hope this makes sense :)

Thanks, Roger.


Re: [Xen-devel] [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked

2018-01-12 Thread Dario Faggioli
On Thu, 2018-01-04 at 13:05 +, Wei Liu wrote:
> From: Roger Pau Monne 
> 
> Avoid scheduling vCPUs that are blocked, there's no point in
> assigning
> them to a pCPU because they are not going to run anyway.
> 
> Since blocked vCPUs are not assigned to pCPUs after this change,
> force
> a rescheduling when a vCPU is brought up if it's on the waitqueue.
> Also when scheduling try to pick a vCPU from the runqueue if the pCPU
> is running idle.
> 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: George Dunlap 
> Cc: Dario Faggioli 
> ---
> Changes since v1:
>  - Force a rescheduling when a vCPU is brought up.
>  - Try to pick a vCPU from the runqueue if running the idle vCPU.
>
As noted by Jan already, there's a mixing of "blocked" and "down" (or
offline).

In the null scheduler, a vCPU that is assigned to a pCPU is free to
block and wake up as many times as it wants (quite obviously). And when
it blocks, the pCPU will just stay idle.

There's no such thing as pulling another vCPU onto the CPU, either from
the waitqueue or from anywhere else. That's the whole point of the
scheduler, actually.
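
Just to make that concrete, the scheduling decision boils down to
something like the following (a simplified sketch, not the literal
null_schedule() from sched_null.c, which also has to deal with
tasklets, the waitqueue, etc.; the per-CPU field name is assumed):

struct vcpu *v = per_cpu(npc, cpu).vcpu;   /* the one vCPU assigned here */
struct task_slice ret = { .time = -1 };    /* no time slicing in null */

if ( v != NULL && vcpu_runnable(v) )
    ret.task = v;                          /* always run "our" vCPU */
else
    ret.task = idle_vcpu[cpu];             /* it's blocked: just idle */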

Now, I'm not quite sure whether or not this can be a problem in the
"shim scenario". If it is, we have to think of a solution that does not
totally defeat the purpose of the scheduler when used on baremetal.

Or use another scheduler, perhaps configuring static 1:1 pinning. Null
seems a great fit for this use case to me, so, I'd say, let's try to
find a nice and cool way to use it. :-)

> ---
>  xen/common/sched_null.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
> index b4a24baf8e..bacfb31cb3 100644
> --- a/xen/common/sched_null.c
> +++ b/xen/common/sched_null.c
> @@ -574,6 +574,8 @@ static void null_vcpu_wake(const struct scheduler
> *ops, struct vcpu *v)
>  {
>  /* Not exactly "on runq", but close enough for reusing the
> counter */
>  SCHED_STAT_CRANK(vcpu_wake_onrunq);
> +/* Force a rescheduling in case some idle CPU can pick this vCPU */
> +cpumask_raise_softirq(&cpu_online_map, SCHEDULE_SOFTIRQ);
>  return;
>
This needs to become 'the cpus of vcpu->domain's cpupool'.

I appreciate that this is fine, when running as shim, where you
certainly don't use cpupools. But when this runs on baremetal, if we use
cpu_online_map, basically _all_ the online CPUs --even the ones that
are in another pool, under a different scheduler-- will be forced to
reschedule. And we don't want that.
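
I.e., something like this (assuming cpupool_domain_cpumask() is the
right accessor to use here):

/* Kick only the pCPUs of this vCPU's cpupool, not every online CPU. */
cpumask_raise_softirq(cpupool_domain_cpumask(v->domain), SCHEDULE_SOFTIRQ);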

I'm also not 100% convinced that this must/can live here. Basically,
you're saying that, if vcpu_wake() is called on a vCPU that happens to
be in the waitqueue, we should reschedule. And, AFAIUI, this is to
cover the case where a vCPU of the L2 guest comes online.

Well, it may even be technically fine. Still, if what we want to deal
with is vCPU onlining, I would prefer to at least try to find a place
which is more related to the onlining path than to the wakeup path.

If you confirm your intent, I can have a look at the code and try to
identify such a better place...

> @@ -761,9 +763,10 @@ static struct task_slice null_schedule(const
> struct scheduler *ops,
>  /*
>   * We may be new in the cpupool, or just coming back online. In
> which
>   * case, there may be vCPUs in the waitqueue that we can assign
> to us
> - * and run.
> + * and run. Also check whether this CPU is running idle, in
> which case try
> + * to pick a vCPU from the waitqueue.
>   */
> -if ( unlikely(ret.task == NULL) )
> +if ( unlikely(ret.task == NULL || ret.task == idle_vcpu[cpu]) )
>
I don't think I understand this. I may be a bit rusty, but are you sure
that, on an idle pCPU, ret.task is idle_vcpu at this point in this
function? I don't think it is.

Also, I'm quite sure this may mess things up for tasklets. In fact, one
case when ret.task is idle_vcpu here is when I have just forced it to
be so, in order to run a tasklet. But with this, we scan the waitqueue
instead, and may end up running something else.
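
If an idle check stays at all, I guess it would have to be gated on the
tasklet state, roughly like this (sketch only, to be double checked
against how ret.task is actually computed earlier in null_schedule()):

/* Only scan the waitqueue when genuinely idle, i.e., not when we are
 * idling on purpose in order to process tasklet work. */
if ( unlikely(ret.task == NULL ||
              (ret.task == idle_vcpu[cpu] && !tasklet_work_scheduled)) )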

> @@ -781,6 +784,10 @@ static struct task_slice null_schedule(const
> struct scheduler *ops,
>  {
>  list_for_each_entry( wvc, &prv->waitq, waitq_elem )
>  {
> +if ( test_bit(_VPF_down, &wvc->vcpu->pause_flags) )
> +/* Skip vCPUs that are down. */
> +continue;
> +
So, yes, I think things like this are what we want. As said above for
the wakeup case, though, I'd prefer to find a way to avoid offline
vCPUs ending up in the waitqueue, rather than having to skip them.

Side note, is_vcpu_online() can be used for the test.
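
I.e., the check in the hunk above could just read:

if ( !is_vcpu_online(wvc->vcpu) )
    /* Skip vCPUs that are down. */
    continue;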

Regards,
Dario
-- 
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


Re: [Xen-devel] [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked

2018-01-12 Thread Dario Faggioli
Hi!

First of all, my filters somehow failed to highlight this for me, so
sorry if I did not notice it earlier (and now, I need new filters
anyway, as the email I'm using is different :-D).

I'll have a look at the patch ASAP.

On Mon, 2018-01-08 at 11:12 +, George Dunlap wrote:
> On 01/08/2018 10:37 AM, Jan Beulich wrote:
>
> > I don't understand: Isn't the null scheduler not moving around
> > vCPU-s at all? At least that's what the comment at the top of the
> > file says, unless I'm mis-interpreting it. If so, how can "some CPU
> > (...) pick this vCPU"?
> 
> There's no current way to prevent a user from adding more vcpus to a
> pool than there are pcpus (if nothing else, by creating a new VM in a
> given pool), or from taking pcpus from a pool in which #vcpus >=
> #pcpus.
> 
Exactly. And something that checks for that is anything but easy to
introduce (let's just avoid even mentioning enforcing!).

> The null scheduler deals with this by having a queue of "unassigned"
> vcpus that are waiting for a free pcpu.  When a pcpu becomes
> available,
> it will do the assignment.  When a pcpu that has a vcpu is assigned
> is
> removed from the pool, that vcpu is assigned to a different pcpu if
> one
> is available; if not, it is put on the list.
> 
Err... yes. BTW, either there are a couple of typos in the above
paragraph, or it's me that can't read it well. Anyway, just to be
clear, if we have 4 pCPUs, and 6 VMs, with 1 vCPU each, this might be
the situation:

CPU0 <-- d1v0
CPU1 <-- d2v0
CPU2 <-- d3v0
CPU3 <-- d4v0

Waitqueue: d5v0,d6v0

Then, if d2 leaves/dies/etc, leaving CPU1 idle, d5v0 is picked up from
the waitqueue and assigned to CPU1.
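
In sched_null.c terms, that pick-up is roughly the following (a
simplified sketch for illustration: the soft/hard affinity balancing
loop is stripped down, and vcpu_assign() is the existing sched_null.c
helper):

/* A pCPU has just become free: take the first waiting vCPU that can
 * run here and make it this pCPU's (one and only) assigned vCPU. */
static void pick_from_waitq_sketch(struct null_private *prv, unsigned int cpu)
{
    struct null_vcpu *wvc;

    spin_lock(&prv->waitq_lock);
    list_for_each_entry ( wvc, &prv->waitq, waitq_elem )
    {
        if ( cpumask_test_cpu(cpu, wvc->vcpu->cpu_hard_affinity) )
        {
            list_del_init(&wvc->waitq_elem);
            vcpu_assign(prv, wvc->vcpu, cpu);
            break;
        }
    }
    spin_unlock(&prv->waitq_lock);
}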


> In the case of shim mode, this also seems to happen whenever curvcpus
> <
> maxvcpus: The L1 hypervisor (shim) only sees curvcpus cpus on which
> to
> schedule L2 vcpus, but the L2 guest has maxvcpus vcpus to schedule,
> of
> which (maxvcpus-curvcpus) are  marked 'down'.  
>
Mmm, wait. In case of a domain which specifies both maxvcpus and
curvcpus, how many vCPUs does the domain in which the shim runs have?

> In this case, it also
> seems that the null scheduler sometimes schedules a "down" vcpu when
> there are "up" vcpus on the list; meaning that the "up" vcpus are
> never
> scheduled.
> 
I'm not sure how an offline vCPU can end up there... but maybe I need
to look at the code better, with the shim use case in mind.

Anyway, I'm fine with checks that prevent offline vCPUs from being
assigned to either pCPUs (like, the CPUs of L0 Xen) or shim's vCPUs
(so, the CPUs of L1 Xen). I'm less fine with rescheduling everyone at
every wakeup.

Roger, Wei, if/when you want to talk a bit about this, to explain the
situation a bit better, so I'll be able to help, feel free to ping me
(email or IRC). :-)

Regards,
Dario
-- 
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


Re: [Xen-devel] [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked

2018-01-08 Thread George Dunlap
On 01/08/2018 10:37 AM, Jan Beulich wrote:
>>>> On 04.01.18 at 14:05,  wrote:
>> From: Roger Pau Monne 
>>
>> Avoid scheduling vCPUs that are blocked, there's no point in assigning
>> them to a pCPU because they are not going to run anyway.
>>
>> Since blocked vCPUs are not assigned to pCPUs after this change, force
>> a rescheduling when a vCPU is brought up if it's on the waitqueue.
>> Also when scheduling try to pick a vCPU from the runqueue if the pCPU
>> is running idle.
> 
> I don't think the description adequately describes the changes,
> perhaps (in part) because ...
> 
>> Changes since v1:
>>  - Force a rescheduling when a vCPU is brought up.
>>  - Try to pick a vCPU from the runqueue if running the idle vCPU.
> 
> ... it wasn't updated after making these adjustments.
> 
>> --- a/xen/common/sched_null.c
>> +++ b/xen/common/sched_null.c
>> @@ -574,6 +574,8 @@ static void null_vcpu_wake(const struct scheduler *ops, 
>> struct vcpu *v)
>>  {
>>  /* Not exactly "on runq", but close enough for reusing the counter 
>> */
>>  SCHED_STAT_CRANK(vcpu_wake_onrunq);
>> +/* Force a rescheduling in case some idle CPU can pick this vCPU */
>> +cpumask_raise_softirq(&cpu_online_map, SCHEDULE_SOFTIRQ);
>>  return;
>>  }
> 
> I don't understand: Isn't the null scheduler not moving around
> vCPU-s at all? At least that's what the comment at the top of the
> file says, unless I'm mis-interpreting it. If so, how can "some CPU
> (...) pick this vCPU"?

There's no current way to prevent a user from adding more vcpus to a
pool than there are pcpus (if nothing else, by creating a new VM in a
given pool), or from taking pcpus from a pool in which #vcpus >= #pcpus.

The null scheduler deals with this by having a queue of "unassigned"
vcpus that are waiting for a free pcpu.  When a pcpu becomes available,
it will do the assignment.  When a pcpu that has a vcpu is assigned is
removed from the pool, that vcpu is assigned to a different pcpu if one
is available; if not, it is put on the list.

In the case of shim mode, this also seems to happen whenever curvcpus <
maxvcpus: The L1 hypervisor (shim) only sees curvcpus cpus on which to
schedule L2 vcpus, but the L2 guest has maxvcpus vcpus to schedule, of
which (maxvcpus-curvcpus) are  marked 'down'.  In this case, it also
seems that the null scheduler sometimes schedules a "down" vcpu when
there are "up" vcpus on the list; meaning that the "up" vcpus are never
scheduled.

(This is just my understanding from conversations with Roger; I haven't
actually looked at the code to verify a number of the statements in the
previous paragraph.)

 -George


Re: [Xen-devel] [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked

2018-01-08 Thread Jan Beulich
>>> On 04.01.18 at 14:05,  wrote:
> From: Roger Pau Monne 
> 
> Avoid scheduling vCPUs that are blocked, there's no point in assigning
> them to a pCPU because they are not going to run anyway.
> 
> Since blocked vCPUs are not assigned to pCPUs after this change, force
> a rescheduling when a vCPU is brought up if it's on the waitqueue.
> Also when scheduling try to pick a vCPU from the runqueue if the pCPU
> is running idle.

I don't think the description adequately describes the changes,
perhaps (in part) because ...

> Changes since v1:
>  - Force a rescheduling when a vCPU is brought up.
>  - Try to pick a vCPU from the runqueue if running the idle vCPU.

... it wasn't updated after making these adjustments.

> --- a/xen/common/sched_null.c
> +++ b/xen/common/sched_null.c
> @@ -574,6 +574,8 @@ static void null_vcpu_wake(const struct scheduler *ops, 
> struct vcpu *v)
>  {
>  /* Not exactly "on runq", but close enough for reusing the counter */
>  SCHED_STAT_CRANK(vcpu_wake_onrunq);
> +/* Force a rescheduling in case some idle CPU can pick this vCPU */
> +cpumask_raise_softirq(&cpu_online_map, SCHEDULE_SOFTIRQ);
>  return;
>  }

I don't understand: Isn't the null scheduler not moving around
vCPU-s at all? At least that's what the comment at the top of the
file says, unless I'm mis-interpreting it. If so, how can "some CPU
(...) pick this vCPU"?

> @@ -781,6 +784,10 @@ static struct task_slice null_schedule(const struct 
> scheduler *ops,
>  {
>  list_for_each_entry( wvc, &prv->waitq, waitq_elem )
>  {
> +if ( test_bit(_VPF_down, &wvc->vcpu->pause_flags) )
> +/* Skip vCPUs that are down. */
> +continue;

"Down" != "blocked" (as per the description).

Overall it's not really being made clear what problem there is that
this patch is intended to solve.

Jan



[Xen-devel] [PATCH RFC v1 42/74] sched/null: skip vCPUs on the waitqueue that are blocked

2018-01-04 Thread Wei Liu
From: Roger Pau Monne 

Avoid scheduling vCPUs that are blocked, there's no point in assigning
them to a pCPU because they are not going to run anyway.

Since blocked vCPUs are not assigned to pCPUs after this change, force
a rescheduling when a vCPU is brought up if it's on the waitqueue.
Also when scheduling try to pick a vCPU from the runqueue if the pCPU
is running idle.

Signed-off-by: Roger Pau Monné 
---
Cc: George Dunlap 
Cc: Dario Faggioli 
---
Changes since v1:
 - Force a rescheduling when a vCPU is brought up.
 - Try to pick a vCPU from the runqueue if running the idle vCPU.
---
 xen/common/sched_null.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index b4a24baf8e..bacfb31cb3 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -574,6 +574,8 @@ static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
 {
 /* Not exactly "on runq", but close enough for reusing the counter */
 SCHED_STAT_CRANK(vcpu_wake_onrunq);
+/* Force a rescheduling in case some idle CPU can pick this vCPU */
+cpumask_raise_softirq(&cpu_online_map, SCHEDULE_SOFTIRQ);
 return;
 }
 
@@ -761,9 +763,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
 /*
  * We may be new in the cpupool, or just coming back online. In which
  * case, there may be vCPUs in the waitqueue that we can assign to us
- * and run.
+ * and run. Also check whether this CPU is running idle, in which case try
+ * to pick a vCPU from the waitqueue.
  */
-if ( unlikely(ret.task == NULL) )
+if ( unlikely(ret.task == NULL || ret.task == idle_vcpu[cpu]) )
 {
 spin_lock(&prv->waitq_lock);
 
@@ -781,6 +784,10 @@ static struct task_slice null_schedule(const struct scheduler *ops,
 {
 list_for_each_entry( wvc, &prv->waitq, waitq_elem )
 {
+if ( test_bit(_VPF_down, &wvc->vcpu->pause_flags) )
+/* Skip vCPUs that are down. */
+continue;
+
 if ( bs == BALANCE_SOFT_AFFINITY &&
      !has_soft_affinity(wvc->vcpu, wvc->vcpu->cpu_hard_affinity) )
 continue;
-- 
2.11.0

