Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2017-03-01 Thread Stefano Stabellini
CC'ing Julien.

On Wed, 1 Mar 2017, Dario Faggioli wrote:
> On Wed, 2017-03-01 at 02:05 +0200, Anastassios Nanos wrote:
> > On Wed, Dec 7, 2016 at 8:29 PM, Dario Faggioli
> >  wrote:
> > > 
> > > % Heterogeneous Multi Processing Support in Xen
> > > % Revision 1
> > > 
> > > [...]
> > Hi all,
> > 
> Hello,
> 
> > We are sending a branch[1] for comments on an initial implementation
> > of the above design document. Essentially it targets the ARM
> > big.LITTLE architecture. 
> >
> W00t ?!?! Just the fact that you did this, it is just great... thanks
> for that.

Yes, thank you for your work!


> > It would be great if you guys could comment
> > on the changes and provide some guidance for us to get it upstream.
> > 
> I'm sure up for that. I already know I won't have time to look at it
> until next week. But I'll make some space to look at the code then (I'm
> travelling, so I won't be furiously doing my own development anyway).
> 
> > We have tested it on an odroid xu4 [2] and we are able to boot guests
> > with mixed vcpu affinities (big and LITTLE).
> > 
> Great to hear this too.
> 
> > We are more than happy to submit patches once we address the issues
> > and come up with a review-able version of this implementation.
> > 
> Sure. So, from just a very quick glance, I can see a unique giant
> commit. This is ok for now, and I will look at it as it is.
> 
> But, for sure, the first step toward making things reviewable, is to
> split the big patch in a series of smaller patches, as you probably
> know yourself already. :-)
> 
> Since you're touching different components (as in, hypervisor,
> toolstack, build system, etc.), splitting at the component boundaries is
> quite often something we want and ask for.
> 
> Another criterion, orthogonal to the one cited above, is to separate
> patches that change architecture-specific code from patches that
> touch common areas.
> 
> But, in general, the principle to follow is to split the patches at the
> "logical boundary", as this tries to explain:
> https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches#Break_down_your_patches

It would also be nice if you could summarize the design, and the main
architectural choices, in your introductory 0/N patch.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2017-03-01 Thread Dario Faggioli
On Wed, 2017-03-01 at 02:05 +0200, Anastassios Nanos wrote:
> On Wed, Dec 7, 2016 at 8:29 PM, Dario Faggioli
>  wrote:
> > 
> > % Heterogeneous Multi Processing Support in Xen
> > % Revision 1
> > 
> > [...]
> Hi all,
> 
Hello,

> We are sending a branch[1] for comments on an initial implementation
> of the above design document. Essentially it targets the ARM
> big.LITTLE architecture. 
>
W00t ?!?! Just the fact that you did this, it is just great... thanks
for that.

> It would be great if you guys could comment
> on the changes and provide some guidance for us to get it upstream.
> 
I'm sure up for that. I already know I won't have time to look at it
until next week. But I'll make some space to look at the code then (I'm
travelling, so I won't be furiously doing my own development anyway).

> We have tested it on an odroid xu4 [2] and we are able to boot guests
> with mixed vcpu affinities (big and LITTLE).
> 
Great to hear this too.

> We are more than happy to submit patches once we address the issues
> and come up with a review-able version of this implementation.
> 
Sure. So, from just a very quick glance, I can see a unique giant
commit. This is ok for now, and I will look at it as it is.

But, for sure, the first step toward making things reviewable, is to
split the big patch in a series of smaller patches, as you probably
know yourself already. :-)

Since you're touching different components (as in, hypervisor,
toolstack, build system, etc.), splitting at the component boundaries is
quite often something we want and ask for.

Another criterion, orthogonal to the one cited above, is to separate
patches that change architecture-specific code from patches that
touch common areas.

But, in general, the principle to follow is to split the patches at the
"logical boundary", as this tries to explain:
https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches#Break_down_your_patches

It's a rather difficult call, especially for changes like this.
Therefore, as a first and fundamental step toward reviewability, I'd
suggest starting to think about how to do the split-up.

Anyway, I'll let you have my comments.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2017-02-28 Thread Anastassios Nanos
On Wed, Dec 7, 2016 at 8:29 PM, Dario Faggioli
 wrote:
> % Heterogeneous Multi Processing Support in Xen
> % Revision 1
>
> \clearpage
>
> # Basics
>
>  
>  Status: **Design Document**
>
> Architecture(s): x86, arm
>
>Component(s): Hypervisor and toolstack
>  
>
> # Overview
>
> HMP (Heterogeneous Multi Processing) and AMP (Asymmetric Multi Processing)
> refer to systems where physical CPUs are not exactly equal. It may be that
> they have different processing power, or capabilities, or that each is
> specifically designed to run a particular system component.
> Most of the time the CPUs have different Instruction Set Architectures (ISAs)
> or Application Binary Interfaces (ABIs). But they may *just* be different
> implementations of the same ISA, in which case they typically differ in
> speed, power efficiency, or handling of special things (e.g., errata).
>
> An example is ARM big.LITTLE, which in fact, is the use case that got the
> discussion about HMP started. This document, however, is generic, and does
> not target only big.LITTLE.
>
> What needs proper Xen support are systems and use cases where virtual CPUs
> cannot be seamlessly moved around all the physical CPUs. In fact, in these
> cases, there must be a way to:
> 
> * decide and specify on what (set of) physical CPU(s) each vCPU can execute;
> * enforce that a vCPU that can only run on a certain (set of) pCPUs is never
>   actually run anywhere else.
>
> **N.B.:** it is becoming common to also refer to systems which have various
> kinds of co-processors (from crypto engines to graphics hardware), integrated
> with the CPUs on the same chip, as AMP or HMP. This is not what this design
> document is about.
>
> # Classes of CPUs
>
> A *class of CPUs* is defined as follows:
>
> 1. each pCPU in the system belongs to a class;
> 2. a class can consist of one or more pCPUs;
> 3. each pCPU can only be in one class;
> 4. CPUs belonging to the same class are homogeneous enough that a virtual
>    CPU that blocks/is preempted while running on a pCPU of a class can,
>    **seamlessly**, unblock/be scheduled on any pCPU of that same class;
> 5. when a virtual CPU is associated with a (set of) class(es) of CPUs, it
>    means that the vCPU can run on all the pCPUs belonging to the said
>    class(es).
>
> So, for instance, in architecture Foobar two classes of CPUs exist, class
> foo and class bar. If a virtual CPU running on CPU 0, which is of class
> foo, blocks (or is preempted), it can, when it unblocks (or is selected by
> the scheduler to run again), run on CPU 3, still of class foo, but not on
> CPU 6, which is of class bar.
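[The class invariants just described can be sketched as follows. This is a hypothetical illustration (plain Python, not Xen code); the `pcpu_class` mapping and `can_migrate` helper are made-up names for the Foobar example, not anything from the hypervisor.]

```python
# Sketch of the "classes of CPUs" invariants from the design doc:
# every pCPU belongs to exactly one class, and a vCPU may only be
# scheduled on pCPUs of its associated class(es).

# Map each pCPU to its (single) class -- the architecture Foobar example:
# CPUs 0 and 3 are class "foo", CPU 6 is class "bar".
pcpu_class = {0: "foo", 3: "foo", 6: "bar"}

def can_migrate(vcpu_classes, src_pcpu, dst_pcpu):
    """A vCPU preempted on src_pcpu may resume on dst_pcpu only if
    dst_pcpu's class is one of the vCPU's allowed classes."""
    return pcpu_class[dst_pcpu] in vcpu_classes

# A vCPU restricted to class "foo", currently on CPU 0:
print(can_migrate({"foo"}, 0, 3))  # True: CPU 3 is also class "foo"
print(can_migrate({"foo"}, 0, 6))  # False: CPU 6 is class "bar"
```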
>
> ## Defining classes
>
> How a class is defined, i.e., what are the specific characteristics that
> determine what CPUs belong to which class, is highly architecture specific.
>
> ### x86
>
> There is no HMP platform of relevance, for now, in the x86 world. Therefore,
> only one class will exist, and all the CPUs will be set to belong to it.
> **TODO X86:** is this correct?
>
> ### ARM
>
> **TODO ARM:** I know nothing about what specifically should be used to
> form classes, so I'm deferring this to ARM people.
>
> So far, in the original thread the following ideas came up (well, there's
> more, but I don't know enough of ARM to judge what is really relevant about
> this topic):
>
> * [Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02153.html)
>   "I don't think an hardcoded list of processor in Xen is the right solution.
>   There are many existing processors and combinations for big.LITTLE so it
>   will nearly be impossible to keep updated."
> * [Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02256.html)
>   "Well, before trying to do something clever like that (i.e naming "big" and
>   "little"), we need to have upstreamed bindings available to acknowledge the
>   difference. AFAICT, it is not yet upstreamed for Device Tree and I don't
>   know any static ACPI tables providing the similar information."
> * [Peng](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02194.html)
>   "For how to differentiate cpus, I am looking the linaro eas cpu topology
>   code"
>
> # User details
>
> ## Classes of CPUs for the users
>
> It will be possible, in a VM config file, to specify the (set of) class(es)
> of each vCPU. This allows creating HMP VMs.
>
> E.g., on ARM, it will be possible to create big.LITTLE VMs which, if run on
> big.LITTLE hosts, could leverage the big.LITTLE support of the guest OS kernel
> and tools.
>
> For such purpose, a new option will be added to xl config file:
>
>     vcpus = "8"
>     vcpuclass = ["0-2:class0", "3,4:class1,class3", "5:class0, class2", "8:class4"]
>
> with the following meaning:
>
> * vCPUs 0, 1, 2 can only run on pcpus of class class0
> * vCPUs 3, 4 can run on pcpus of class class1 **and** on 
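[The proposed `vcpuclass` syntax lends itself to a straightforward parse. Below is a hypothetical sketch of how such an option could be turned into a per-vCPU class map; `parse_vcpuclass` and the exact split rules are assumptions inferred from the example, not actual xl parser code.]

```python
# Sketch: parse the proposed "vcpuclass" xl option into {vcpu: classes}.

def parse_vcpuclass(entries):
    """Turn e.g. ["0-2:class0", "3,4:class1,class3"] into a dict
    mapping each vCPU number to the set of classes it may run on."""
    mapping = {}
    for entry in entries:
        vcpus_part, classes_part = entry.split(":", 1)
        classes = {c.strip() for c in classes_part.split(",")}
        for token in vcpus_part.split(","):
            if "-" in token:  # a range like "0-2"
                lo, hi = (int(x) for x in token.split("-"))
                ids = range(lo, hi + 1)
            else:             # a single vCPU number
                ids = [int(token)]
            for v in ids:
                mapping[v] = classes
    return mapping

m = parse_vcpuclass(["0-2:class0", "3,4:class1,class3", "5:class0, class2"])
print(m[0] == {"class0"})                # True: vCPUs 0-2 only on class0
print(m[3] == {"class1", "class3"})      # True: vCPU 3 on class1 and class3
```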

Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-16 Thread George Dunlap

> On Dec 16, 2016, at 4:05 PM, George Dunlap  wrote:
> 
> 
>> On Dec 8, 2016, at 2:29 AM, Dario Faggioli  wrote:
> 
>> For the vCPUs for which no class is specified, default behavior applies.
>> 
>> **TODO:** note that I think it must be possible to associate more than
>> one class to a vCPU. This is expressed in the example above, and assumed
>> to be true throughout the document. It might be, though, that, at least at
>> early stages (see implementation phases below), we will enable only 1-to-1
>> mapping.
>> 
>> **TODO:** default can be, either:
>> 
>> 1. the vCPU can run on any CPU of any class,
>> 2. the vCPU can only run on a specific, arbitrary decided, class (and I'd say
>>  that should be class 0).
> 
> I thought that one of the issues was that sometimes there are instructions 
> available on one pcpu and not on another; in which case once the kernel 
> initializes a particular vcpu on a particular pcpu it needs to stay there.

Sorry, this should say, “once a kernel initializes a particular vcpu on a 
particular *class of* pcpu, it needs to stay *within that class*.”

 -G



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-16 Thread George Dunlap

> On Dec 8, 2016, at 2:29 AM, Dario Faggioli  wrote:

> For the vCPUs for which no class is specified, default behavior applies.
> 
> **TODO:** note that I think it must be possible to associate more than
> one class to a vCPU. This is expressed in the example above, and assumed
> to be true throughout the document. It might be, though, that, at least at
> early stages (see implementation phases below), we will enable only 1-to-1
> mapping.
> 
> **TODO:** default can be, either:
> 
> 1. the vCPU can run on any CPU of any class,
> 2. the vCPU can only run on a specific, arbitrary decided, class (and I'd say
>   that should be class 0).

I thought that one of the issues was that sometimes there are instructions 
available on one pcpu and not on another; in which case once the kernel 
initializes a particular vcpu on a particular pcpu it needs to stay there.

 -George




Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-16 Thread George Dunlap

> On Dec 9, 2016, at 5:09 PM, Jan Beulich  wrote:
> 
 On 09.12.16 at 09:29,  wrote:
>> On Fri, 2016-12-09 at 01:13 -0700, Jan Beulich wrote:
>> On 08.12.16 at 22:54,  wrote:
 Yeah, that was what was puzzling me too. Keeping them ordered has
 the
 nice property that if a user says the following in a config file:
 
 vcpuclass=["0-3:class0", "4-7:class1"]
 
 (assuming that class0 and class1 are the always available Xen
 names) it
>>> 
>>> This, btw, is another aspect I think has a basic problem: class0 and
>>> class1 say nothing about the properties of a class, and hence are
>>> tied to one particular host.
>>> 
>> The other way round, I'd say. I mean, since they say nothing, they're
>> _not_ host specific?
> 
> No, not really. Or perhaps we mean different things. The name
> itself of course can be anything, but what is relevant here is
> what it stands for. And "class0" may mean one thing on host 1
> and a completely different thing on host2. Yet we need a certain
> name to always mean the same thing (or else we'd need
> translation when moving VMs between hosts).
> 
>>> I think class names need to be descriptive
>>> and uniform across hosts. That would allow migration of such VMs as
>>> well as prevent starting them on a host not having suitable hardware.
>>> 
>> ...what George suggested (but please, George, when back, correct me if
>> I'm misrepresenting your ideas :-)) that:
>> - something generic, such as class0, class1 will always exist (well, 
>>   at least class0). They would basically constitute the Xen interface;
>> - toolstack will accept more specific names, such as 'big' and 
>>   'little', and also 'A57' and 'A43' (I'm making up the names), etc.
>> - a VM with vCPUs in class0 and class1 will always be created and run 
>>   on any 2 classes system;
> 
> How can that work, if you don't know what class1 represents?
> 
>> a VM with big and little vCPUs will only 
>>   run on an ARM big.LITTLE incarnation; a VM with A57 and A43 vCPUs 
>>   will only run on an host that has at least one A57 and one A43 
>>   pCPUs.
>> 
>> What's not clear to me is how to establish:
>> - the ordering among classes;
> 
> As said before - there's at best some partial ordering going to be
> possible.
> 
>> - the mapping between Xen's neuter names and the toolstack's (arch) 
>>   specific ones.
> 
> Perhaps it needs re-consideration whether class names make
> sense in the first place? What about, for example, making class
> names something entirely local to the domain config file, and
> besides specifying
> 
> vcpuclass=["0-3:class0", "4-7:class1"]
> 
> requiring for it to also specify the properties of the classes it
> uses:
> 
> class0=["..."]
> class1=["..."]
> 
> The specifiers then would be architecture specific, e.g.
> 
> class0=["arm64"]
> class1=["arm64.big"]
> 
> or on x86
> 
> class0=["x86-64"]
> class1=["x86.avx", "x86.avx2"]
> class2=["x86.XeonPhi"]
> 
> Of course this goes quite a bit in the direction of CPUID handling,
> so Andrew may have a word to say here.

So my goal when I made my suggestion was that:

1. People who knew exactly what they wanted and knew what their hardware would 
be could specify exactly what they wanted to happen.  This would probably 
include embedded chip vendors designing a custom system.

2. People who had a general preference but didn’t know the exact hardware could 
specify vague parameters (such as “large class” or “small class”) and get 
something approximating the vague parameters.  This might include people who 
were writing a generic piece of software to be run on a large class of 
potential devices (automotive, routers, ).

3. People who didn’t specify anything would get a default behavior which was 
sensible.

From what I remember of the last discussion, there is no “arm64.big”.  You 
might have an A15 core and an A7 core; and in that case the A15 core would be 
“big”.  But you also might have two A15 cores, one with a higher clock speed 
and/or more cache than the other. So “arm64.big” and “arm64.little” aren't 
actually any more precise than “class0” and “class1”.

So my idea was (to re-iterate):
1. Sort them into classes by power
2. Allow the user to either specify the class number (class0 > class1 > 
class2), *or* to make more specific requests (“arm64.A15”, ).

That way, people described by #3 can simply not specify anything, and the 
toolstack can decide whether to give them class 0/1 based on some heuristic 
and / or policy; people described by #2 can just say “class 0” or “class 1”, 
and people described by #1 can specify exactly what they want.
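[A toy sketch of this resolution scheme, with made-up class names and power numbers (nothing here comes from the toolstack): host classes are sorted by a "power" metric so generic names map onto an ordering, while specific names must match exactly.]

```python
# Sketch of George's idea: generic "classN" names resolve by power
# ordering (class0 is the biggest), specific names match exactly.

# A hypothetical big.LITTLE host; power values are made up.
host_classes = [
    {"name": "arm64.A7",  "power": 1.0},
    {"name": "arm64.A15", "power": 2.5},
]

def resolve(request):
    """Map a user's class request to a concrete host class name."""
    by_power = sorted(host_classes, key=lambda c: c["power"], reverse=True)
    if request.startswith("class"):       # generic: classN = Nth biggest
        return by_power[int(request[len("class"):])]["name"]
    for c in host_classes:                # specific: exact match required
        if c["name"] == request:
            return c["name"]
    raise ValueError("host has no class %r" % request)

print(resolve("class0"))     # arm64.A15 -- the "bigger" class
print(resolve("arm64.A7"))   # arm64.A7  -- exact request
```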

Now I understand that it may not always be clear which of two processors is 
“more powerful” — but to accomplish the above-stated goal, it turns out that’s 
not necessary.  If two processors are about equally powerful but in slightly 
different ways, then it doesn’t matter which one you get when you ask for “the 
bigger 

Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-15 Thread Juergen Gross
On 15/12/16 19:41, Dario Faggioli wrote:
> On Thu, 2016-12-08 at 11:38 +0100, Juergen Gross wrote:
>> So you really solved the following problem in credit2?
>>
>> You have three domains with 2 vcpus each and different weights. Run
>> them
>> on 3 physical cpus with following pinning:
>>
>> dom1: pcpu 1 and 2
>> dom2: pcpu 2 and 3
>> dom3: pcpu 1 and 3
>>
>> How do you decide which vcpu to run on which pcpu for how long?
>>
> Ok, back to this (sorry, a bit later than how I'd hoped). So, I tried
> to think a bit at the described scenario, but could not figure out what
> you are hinting at.
> 
> There are missing pieces of information, such as what the vcpus do, and
> what exactly the weights are (besides being different).
> 
> Therefore, I decided to put together a quick experiment. I've created
> the domains, set up all their vcpus to run cpu-hog tasks, picked a
> configuration of my choice for the weights, and ran them under both
> Credit1 and Credit2.
> 
> It's a very simple test, but it will hopefully be helpful in
> understanding the situation better.
> 
> Here's the result.
> 
> On Credit1, equal weigths, unpinned (i.e., plenty of pCPUs available):
>  NAME  CPU(%) [1]
>  vm1   199.9
>  vm2   199.9
>  vm3   199.9
> 
> Pinning as you suggest (i.e., to 3 pCPUs):
>  NAME  CPU(%) [2]
>  vm1   149.0
>  vm2    66.2
>  vm3    84.8
> 
> Changing the weights:
>  Name  ID  Weight   Cap [3]
>  vm1    8     256     0
>  vm2    9     512     0
>  vm3    6    1024     0
>  NAME  CPU(%)
>  vm1   100.0
>  vm2   100.0
>  vm3   100.0
> 
> So, here in Credit1, things are ok when there's no pinning in place [1]. As 
> soon as we pin, _even_without_ touching the weights [2], things become 
> *crazy*. In fact, there's absolutely no reason why CPU% numbers would look 
> the way they do in [2].
> 
> This does not surprise me much, though. Credit1's load balancer basically 
> moves vcpus around in a pseudo-random fashion, and having to enforce pinning 
> constraints makes things even more unpredictable.
> 
> Then comes the amusing part. At this point, I wonder if I haven't done 
> something wrong in setting up the experiments... Because things really look 
> too funny. :-O
> In fact, for some reason, changing the weights as shown [3] causes CPU% 
> numbers to fluctuate a bit (not visible above) and then stabilize at 100%. 
> That may look like an improvement, but certainly does not reflect the chosen 
> set of weights.
> 
> So, I'd say you were right. Or, actually, things are even worse than what you 
> said: in Credit1, it's not only that pinning and weights do not play well 
> together, it's that even pinning alone works pretty badly.

I'd say: With credit1 pinning should be rather explicit in one of the
following ways:

- a vcpu should be pinned to only 1 pcpu, or
- a group of vcpus should be pinned to a group of pcpus no other
  vcpu is allowed to run on (cpupools seem to be the better choice
  in this case)

> Now, on Credit2, equal weigths, unpinned (i.e., plenty of pCPUs
> available):
>  NAME  CPU(%) [4]
>  vm1   199.9
>  vm2   199.9
>  vm3   199.9
> 
> Pinning as you suggest (i.e., to 3 pCPUs):
>  NAME  CPU(%) [5]
>  vm1   100.0
>  vm2   100.1
>  vm3   100.0
> 
> Changing the weights:
>  Name  ID  Weight [6]
>  vm1    2     256
>  vm2    3     512
>  vm3    6    1024
>  NAME  CPU(%)
>  vm1    44.1
>  vm2    87.2
>  vm3   168.7
> 
> Which looks nearly *perfect* to me. :-)

_Really_ impressive!

> In fact, with no constraints [4], each VM gets the 200% share it's
> asking for.
> 
> When only 3 pCPUs can be used, by means of pinning [5], each VM gets
> its fair share of 100%.
> 
> When setting up weights in such a way that vm2 should get 2x the CPU time
> of vm1, and vm3 2x the CPU time of vm2 [6], things look,
> well, exactly like that! :-P
> 
> So, since I did not fully understand the problem, I'm not sure whether
> this really answers your question, but it looks to me like it actually
> could! :-D
> 
> For sure, it puts Credit2 in rather a good light :-P.

Absolutely!


Juergen



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-15 Thread Dario Faggioli
On Thu, 2016-12-08 at 11:38 +0100, Juergen Gross wrote:
> So you really solved the following problem in credit2?
> 
> You have three domains with 2 vcpus each and different weights. Run
> them
> on 3 physical cpus with following pinning:
> 
> dom1: pcpu 1 and 2
> dom2: pcpu 2 and 3
> dom3: pcpu 1 and 3
> 
> How do you decide which vcpu to run on which pcpu for how long?
> 
Ok, back to this (sorry, a bit later than how I'd hoped). So, I tried
to think a bit at the described scenario, but could not figure out what
you are hinting at.

There are missing pieces of information, such as what the vcpus do, and
what exactly the weights are (besides being different).

Therefore, I decided to put together a quick experiment. I've created
the domains, set up all their vcpus to run cpu-hog tasks, picked a
configuration of my choice for the weights, and ran them under both
Credit1 and Credit2.

It's a very simple test, but it will hopefully be helpful in
understanding the situation better.

Here's the result.

On Credit1, equal weigths, unpinned (i.e., plenty of pCPUs available):
 NAME  CPU(%) [1]
 vm1   199.9
 vm2   199.9
 vm3   199.9

Pinning as you suggest (i.e., to 3 pCPUs):
 NAME  CPU(%) [2]
 vm1   149.0
 vm2    66.2
 vm3    84.8

Changing the weights:
 Name  ID  Weight   Cap [3]
 vm1    8     256     0
 vm2    9     512     0
 vm3    6    1024     0
 NAME  CPU(%)
 vm1   100.0
 vm2   100.0
 vm3   100.0

So, here in Credit1, things are ok when there's no pinning in place [1]. As 
soon as we pin, _even_without_ touching the weights [2], things become *crazy*. 
In fact, there's absolutely no reason why CPU% numbers would look the way they 
do in [2].

This does not surprise me much, though. Credit1's load balancer basically moves 
vcpus around in a pseudo-random fashion, and having to enforce pinning 
constraints makes things even more unpredictable.

Then comes the amusing part. At this point, I wonder if I haven't done 
something wrong in setting up the experiments... Because things really look 
too funny. :-O
In fact, for some reason, changing the weights as shown [3] causes CPU% numbers 
to fluctuate a bit (not visible above) and then stabilize at 100%. That may 
look like an improvement, but certainly does not reflect the chosen set of 
weights.

So, I'd say you were right. Or, actually, things are even worse than what you 
said: in Credit1, it's not only that pinning and weights do not play well 
together, it's that even pinning alone works pretty badly.


Now, on Credit2, equal weigths, unpinned (i.e., plenty of pCPUs
available):
 NAME  CPU(%) [4]
 vm1   199.9
 vm2   199.9
 vm3   199.9

Pinning as you suggest (i.e., to 3 pCPUs):
 NAME  CPU(%) [5]
 vm1   100.0
 vm2   100.1
 vm3   100.0

Changing the weights:
 Name  ID  Weight [6]
 vm1    2     256
 vm2    3     512
 vm3    6    1024
 NAME  CPU(%)
 vm1    44.1
 vm2    87.2
 vm3   168.7

Which looks nearly *perfect* to me. :-)

In fact, with no constraints [4], each VM gets the 200% share it's
asking for.

When only 3 pCPUs can be used, by means of pinning [5], each VM gets
its fair share of 100%.

When setting up weights in such a way that vm2 should get 2x the CPU time
of vm1, and vm3 2x the CPU time of vm2 [6], things look,
well, exactly like that! :-P

So, since I did not fully understand the problem, I'm not sure whether
this really answers your question, but it looks to me like it actually
could! :-D

For sure, it puts Credit2 in rather a good light :-P.
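[A quick sanity check of the [6] numbers against the configured weights: with three fully-busy pCPUs (300% total) shared proportionally to weights 256:512:1024, the expected shares come out close to what xentop reported (44.1 / 87.2 / 168.7). The arithmetic below is just that check, not anything from the scheduler code.]

```python
# Proportional-share check for the Credit2 experiment in [6]:
# 300% of CPU (3 pCPUs) split in proportion to the per-domain weights.

weights = {"vm1": 256, "vm2": 512, "vm3": 1024}
total_cpu = 300.0                  # three fully-busy pCPUs
total_w = sum(weights.values())    # 1792

expected = {vm: total_cpu * w / total_w for vm, w in weights.items()}
for vm in sorted(expected):
    print("%s: %.1f%%" % (vm, expected[vm]))
# Ideal shares: vm1 42.9%, vm2 85.7%, vm3 171.4% -- within a few
# percent of the measured 44.1 / 87.2 / 168.7.
```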

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-09 Thread Stefano Stabellini
On Fri, 9 Dec 2016, Jan Beulich wrote:
> >>> On 09.12.16 at 09:29,  wrote:
> > On Fri, 2016-12-09 at 01:13 -0700, Jan Beulich wrote:
> >> > > > On 08.12.16 at 22:54,  wrote:
> >> > Yeah, that was what was puzzling me too. Keeping them ordered has
> >> > the
> >> > nice property that if a user says the following in a config file:
> >> > 
> >> >  vcpuclass=["0-3:class0", "4-7:class1"]
> >> > 
> >> > (assuming that class0 and class1 are the always available Xen
> >> > names) it
> >> 
> >> This, btw, is another aspect I think has a basic problem: class0 and
> >> class1 say nothing about the properties of a class, and hence are
> >> tied to one particular host.
> >>
> > The other way round, I'd say. I mean, since they say nothing, they're
> > _not_ host specific?
> 
> No, not really. Or perhaps we mean different things. The name
> itself of course can be anything, but what is relevant here is
> what it stands for. And "class0" may mean one thing on host 1
> and a completely different thing on host2. Yet we need a certain
> name to always mean the same thing (or else we'd need
> translation when moving VMs between hosts).
> 
> >>  I think class names need to be descriptive
> >> and uniform across hosts. That would allow migration of such VMs as
> >> well as prevent starting them on a host not having suitable hardware.
> >> 
> > ...what George suggested (but please, George, when back, correct me if
> > I'm misrepresenting your ideas :-)) that:
> >  - something generic, such as class0, class1 will always exist (well, 
> >at least class0). They would basically constitute the Xen interface;
> >  - toolstack will accept more specific names, such as 'big' and 
> >'little', and also 'A57' and 'A43' (I'm making up the names), etc.
> >  - a VM with vCPUs in class0 and class1 will always be created and run 
> >on any 2 classes system;
> 
> How can that work, if you don't know what class1 represents?
> 
> > a VM with big and little vCPUs will only 
> >run on an ARM big.LITTLE incarnation; a VM with A57 and A43 vCPUs 
> >will only run on an host that has at least one A57 and one A43 
> >pCPUs.
> > 
> > What's not clear to me is how to establish:
> >  - the ordering among classes;
> 
> As said before - there's at best some partial ordering going to be
> possible.
> 
> >  - the mapping between Xen's neuter names and the toolstack's (arch) 
> >specific ones.
> 
> Perhaps it needs re-consideration whether class names make
> sense in the first place? What about, for example, making class
> names something entirely local to the domain config file, and
> besides specifying
> 
> vcpuclass=["0-3:class0", "4-7:class1"]
> 
> requiring for it to also specify the properties of the classes it
> uses:
> 
> class0=["..."]
> class1=["..."]
> 
> The specifiers then would be architecture specific, e.g.
> 
> class0=["arm64"]
> class1=["arm64.big"]
> 
> or on x86
> 
> class0=["x86-64"]
> class1=["x86.avx", "x86.avx2"]
> class2=["x86.XeonPhi"]
> 
> Of course this goes quite a bit in the direction of CPUID handling,
> so Andrew may have a word to say here.

This is good, but given that we are not likely to support cross-arch
migration (i.e. ARM to x86), the xl parser can be smart enough to
accept the following syntax too, as an alias to the one you suggested:

vcpuclass=["0-3:arm64.big", "4-7:arm64.LITTLE"]

or even

vcpuclass=["0-3:big", "4-7:LITTLE"]

if the receiving end is not a big.LITTLE machine, it will be easy for it
to map "big" and "LITTLE" to two arbitrary classes, such as class0 and
class1.
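[The alias handling described here can be sketched in a few lines. This is a hypothetical illustration of the idea (the `ALIASES` table, `normalize` helper, and fallback map are made-up names, not actual xl code): the parser expands short names, and a receiving host without those classes maps them onto generic ones.]

```python
# Sketch: xl-style aliasing of arch-specific class names, with a
# fallback to generic classN names on hosts lacking those classes.

ALIASES = {"big": "arm64.big", "LITTLE": "arm64.LITTLE"}

def normalize(name, host_classes):
    """Expand a short alias; if the host does not expose the
    arch-specific class, fall back to an arbitrary generic class."""
    full = ALIASES.get(name, name)
    if full in host_classes:
        return full
    # Receiving end is not big.LITTLE: map onto generic classes.
    fallback = {"arm64.big": "class0", "arm64.LITTLE": "class1"}
    return fallback.get(full, full)

print(normalize("big", {"arm64.big", "arm64.LITTLE"}))  # arm64.big
print(normalize("big", {"class0", "class1"}))           # class0
```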



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-09 Thread Jan Beulich
>>> On 09.12.16 at 09:29,  wrote:
> On Fri, 2016-12-09 at 01:13 -0700, Jan Beulich wrote:
>> > > > On 08.12.16 at 22:54,  wrote:
>> > Yeah, that was what was puzzling me too. Keeping them ordered has
>> > the
>> > nice property that if a user says the following in a config file:
>> > 
>> >  vcpuclass=["0-3:class0", "4-7:class1"]
>> > 
>> > (assuming that class0 and class1 are the always available Xen
>> > names) it
>> 
>> This, btw, is another aspect I think has a basic problem: class0 and
>> class1 say nothing about the properties of a class, and hence are
>> tied to one particular host.
>>
> The other way round, I'd say. I mean, since they say nothing, they're
> _not_ host specific?

No, not really. Or perhaps we mean different things. The name
itself of course can be anything, but what is relevant here is
what it stands for. And "class0" may mean one thing on host 1
and a completely different thing on host2. Yet we need a certain
name to always mean the same thing (or else we'd need
translation when moving VMs between hosts).

>>  I think class names need to be descriptive
>> and uniform across hosts. That would allow migration of such VMs as
>> well as prevent starting them on a host not having suitable hardware.
>> 
> ...what George suggested (but please, George, when back, correct me if
> I'm misrepresenting your ideas :-)) that:
>  - something generic, such as class0, class1 will always exist (well, 
>at least class0). They would basically constitute the Xen interface;
>  - toolstack will accept more specific names, such as 'big' and 
>'little', and also 'A57' and 'A43' (I'm making up the names), etc.
>  - a VM with vCPUs in class0 and class1 will always be created and run 
>on any 2 classes system;

How can that work, if you don't know what class1 represents?

> a VM with big and little vCPUs will only 
>run on an ARM big.LITTLE incarnation; a VM with A57 and A43 vCPUs 
>will only run on an host that has at least one A57 and one A43 
>pCPUs.
> 
> What's not clear to me is how to establish:
>  - the ordering among classes;

As said before - there's at best some partial ordering going to be
possible.

>  - the mapping between Xen's neuter names and the toolstack's (arch) 
>specific ones.

Perhaps it needs re-consideration whether class names make
sense in the first place? What about, for example, making class
names something entirely local to the domain config file, and
besides specifying

vcpuclass=["0-3:class0", "4-7:class1"]

requiring for it to also specify the properties of the classes it
uses:

class0=["..."]
class1=["..."]

The specifiers then would be architecture specific, e.g.

class0=["arm64"]
class1=["arm64.big"]

or on x86

class0=["x86-64"]
class1=["x86.avx", "x86.avx2"]
class2=["x86.XeonPhi"]

Of course this goes quite a bit in the direction of CPUID handling,
so Andrew may have a word to say here.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-09 Thread Dario Faggioli
On Fri, 2016-12-09 at 01:13 -0700, Jan Beulich wrote:
> > > > On 08.12.16 at 22:54,  wrote:
> > Yeah, that was what was puzzling me too. Keeping them ordered has
> > the
> > nice property that if a user says the following in a config file:
> > 
> >  vcpuclass=["0-3:class0", "4-7:class1"]
> > 
> > (assuming that class0 and class1 are the always available Xen
> > names) it
> 
> This, btw, is another aspect I think has a basic problem: class0 and
> class1 say nothing about the properties of a class, and hence are
> tied to one particular host.
>
The other way round, I'd say. I mean, since they say nothing, they're
_not_ host specific?

Anyway, naming was another thing on which the debate was not at all
closed, but the point is exactly the one you're making here, in fact...

>  I think class names need to be descriptive
> and uniform across hosts. That would allow migration of such VMs as
> well as prevent starting them on a host not having suitable hardware.
> 
...what George suggested (but please, George, when back, correct me if
I'm misrepresenting your ideas :-)) that:
 - something generic, such as class0, class1 will always exist (well, 
   at least class0). They would basically constitute the Xen interface;
 - toolstack will accept more specific names, such as 'big' and 
   'little', and also 'A57' and 'A43' (I'm making up the names), etc.
 - a VM with vCPUs in class0 and class1 will always be created and run 
   on any 2 classes system; a VM with big and little vCPUs will only 
   run on an ARM big.LITTLE incarnation; a VM with A57 and A43 vCPUs 
   will only run on a host that has at least one A57 and one A43 
   pCPUs.

What's not clear to me is how to establish:
 - the ordering among classes;
 - the mapping between Xen's neutral names and the toolstack's (arch) 
   specific ones.

All this being said, yes, if one specifies more than one class and
there's only one, as well as if one specifies a class that does not
exist, we should abort domain creation. I shall add this to the specs
(it was covered in the thread, I just forgot).

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-09 Thread Jan Beulich
>>> On 08.12.16 at 22:54,  wrote:
> On Thu, 2016-12-08 at 03:14 -0700, Jan Beulich wrote:
>> > > > On 07.12.16 at 19:29,  wrote:
>> > The list of classes is kept ordered from the more powerful to the
>> > less
>> > powerful.
>> > **TODO:** this has been [proposed by 
>> > George](https://lists.xenproject.org/archives/html/xen-devel/2016-0 
>> > 9/msg02212.html).
>> > I like the idea, what do others think? If we agree on that, note
>> > that there
>> > has been no discussion on defining what "more powerful" means,
>> > neither on
>> > x86 (although, not really that interesting, for now, I'd say), nor
>> > on ARM.
>> 
>> Indeed I think there should be no assumption about the ability to
>> order things here: Even if for some initial set of hardware it may
>> be possible to clearly tell which one's more powerful and which
>> one's weaker, already the moment you extend this from
>> compute power to different ISA extensions you'll immediately end
>> up with the possibility of two CPUs each having a distinct extra feature
>> compared to one another (say one a crypto extension and the
>> other a wider vector compute engine).
>> 
> Yeah, that was what was puzzling me too. Keeping them ordered has the
> nice property that if a user says the following in a config file:
> 
>  vcpuclass=["0-3:class0", "4-7:class1"]
> 
> (assuming that class0 and class1 are the always available Xen names) it

This, btw, is another aspect I think has a basic problem: class0 and
class1 say nothing about the properties of a class, and hence are
tied to one particular host. I think class names need to be descriptive
and uniform across hosts. That would allow migration of such VMs as
well as prevent starting them on a host not having suitable hardware.

Jan




Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-08 Thread Dario Faggioli
On Thu, 2016-12-08 at 03:14 -0700, Jan Beulich wrote:
> > > > On 07.12.16 at 19:29,  wrote:
> > The list of classes is kept ordered from the more powerful to the
> > less
> > powerful.
> > **TODO:** this has been [proposed by 
> > George](https://lists.xenproject.org/archives/html/xen-devel/2016-0
> > 9/msg02212.html).
> > I like the idea, what do others think? If we agree on that, note
> > that there
> > has been no discussion on defining what "more powerful" means,
> > neither on
> > x86 (although, not really that interesting, for now, I'd say), nor
> > on ARM.
> 
> Indeed I think there should be no assumption about the ability to
> order things here: Even if for some initial set of hardware it may
> be possible to clearly tell which one's more powerful and which
> one's weaker, already the moment you extend this from
> compute power to different ISA extensions you'll immediately end
> up with the possibility of two CPUs each having a distinct extra feature
> compared to one another (say one a crypto extension and the
> other a wider vector compute engine).
> 
Yeah, that was what was puzzling me too. Keeping them ordered has the
nice property that if a user says the following in a config file:

 vcpuclass=["0-3:class0", "4-7:class1"]

(assuming that class0 and class1 are the always available Xen names) it
would always be true that vCPUs 0-3 are 'more powerful', no matter on
what host the VM runs (ARM or x86, now or in 5 years, etc.), which
would be really nice.

But I really am not sure whether that is possible.

Perhaps George, who thought about this first, has a clearer picture...

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-08 Thread Dario Faggioli
On Thu, 2016-12-08 at 11:38 +0100, Juergen Gross wrote:
> On 08/12/16 11:27, Dario Faggioli wrote:
> > On Thu, 2016-12-08 at 07:12 +0100, Juergen Gross wrote:
> > > Any idea how to avoid problems in the schedulers related to vcpus
> > > with
> > > different weights? 
> > > 
> > Sure: use Credit2! :-P
> > 
> > And I'm not joking (not entirely, at least), as the alternative is
> > to
> > re-engineer significantly the algorithm inside Credit, which I'm
> > not
> > sure is doable or worthwhile, especially considering we have
> > alternatives.
> 
> So you really solved the following problem in credit2?
> 
So, pinning will always _affect_ scheduling; that is actually its goal.
And in fact, it really should be used only when there is no alternative,
or when the scenario is understood well enough that its effects are known
(or at least known to be beneficial for the workload running on the
host).

In Credit2, weights are used to make a vCPU burn credits faster or
slower than the other vCPUs, while in Credit1 the algorithm is much more
complex. Also, in Credit2, everything is computed per-runqueue. Pinning
of course interferes, but it should really be less disruptive than in
Credit1.

All this being said, I was not yet around when you came up with the
idea that pinning was disturbing weighted fairness, so I'm not sure
what the original argument was... I'll go back and check the email
conversation in the archive. And again, whenever one can use
cpupools, that should be the preferred solution, but there are
situations where that's just not suitable, and we need pinning.

This case is a little bit borderline. Sure, using pinning is not ideal,
and in fact it only happens in the initial stages. When actually
modifying the scheduler, we will, in Credit2, do something like having
one runqueue per class (or more, but certainly no runqueues that
"cross" classes, as that would not work), which puts us in a pretty
decent situation, I think. For Credit, let's see, but I'm afraid we
won't be able to guarantee much more than technical correctness (i.e.,
not scheduling on forbidden classes).

> You have three domains with 2 vcpus each and different weights. Run
> them
> on 3 physical cpus with following pinning:
> 
> dom1: pcpu 1 and 2
> dom2: pcpu 2 and 3
> dom3: pcpu 1 and 3
> 
> How do you decide which vcpu to run on which pcpu for how long?
> 
Ok, it was a public holiday here today, so I did not really have time
to think about this example. And tomorrow I'm on PTO. I'll look at it
closely on Monday.

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-08 Thread Stefano Stabellini
On Thu, 8 Dec 2016, Jan Beulich wrote:
> >>> On 07.12.16 at 19:29,  wrote:
> > ### x86
> > 
> > There is no HMP platform of relevance, for now, in x86 world. Therefore,
> > only one class will exist, and all the CPUs will be set to belong to it.
> > **TODO X86:** is this correct?
> 
> What about the original Xeon Phi (on a PCIe card)?
> 
> > ## Hypervisor
> > 
> > The hypervisor needs to know within which class each of the present CPUs
> > falls. At boot (or, in general, CPU bringup) time, while identifying the
> > CPU, a list of classes is constructed, and the mapping between each CPU
> > and the class to which it is determined to belong is established.
> > 
> > The list of classes is kept ordered from the more powerful to the less
> > powerful.
> > **TODO:** this has been [proposed by 
> > George](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html).
> > I like the idea, what do others think? If we agree on that, note that there
> > has been no discussion on defining what "more powerful" means, neither on
> > x86 (although, not really that interesting, for now, I'd say), nor on ARM.
> 
> Indeed I think there should be no assumption about the ability to
> order things here: Even if for some initial set of hardware it may
> be possible to clearly tell which one's more powerful and which
> one's weaker, already the moment you extend this from
> compute power to different ISA extensions you'll immediately end
> up with the possibility of two CPUs each having a distinct extra feature
> compared to one another (say one a crypto extension and the
> other a wider vector compute engine).
> 
> It may be possible to establish partial ordering though, but it's
> not really clear to me what such ordering would be used for.

I think you are right in saying that there might not be a
straightforward ordering from powerful to weak.

Maybe it is better to say that the Xen architecture-specific code will
pick a default class (not necessarily class0). The default class can be
changed with a Xen command line parameter or a hypercall.

This way we can have Xen use big CPUs by default, but it can be changed
to LITTLE, for example, without implying that big or LITTLE is more
powerful, which actually is difficult to determine even on ARM.



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-08 Thread Jan Beulich
>>> On 08.12.16 at 11:23,  wrote:
> On Thu, 2016-12-08 at 03:14 -0700, Jan Beulich wrote:
>> > > > On 07.12.16 at 19:29,  wrote:
>> > ### x86
>> > 
>> > There is no HMP platform of relevance, for now, in x86 world.
>> > Therefore,
>> > only one class will exist, and all the CPUs will be set to belong
>> > to it.
>> > **TODO X86:** is this correct?
>> 
>> What about the original Xeon Phi (on a PCIe card)?
>> 
> Well, what I'd say about it is that I did not know about its existence.
> :-)
> 
> Anyway, if we have HMP on x86 already, and we want to support them,
> we'll have to define criteria for building classes there too. Once that
> is done, the rest of this document should be general enough (or at
> least that was the intent).
> 
> About defining those criteria, I'd appreciate whatever input you x86
> experts will be able to share. :-)

Well, the obvious part of the classification would be differences
in CPUID output - vendor, family, model, stepping, feature flags.
I'm not currently aware of ways to identify differing performance,
but I'm also unaware of systems built with CPUs varying in e.g.
clock speeds.

Jan




Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-08 Thread Juergen Gross
On 08/12/16 11:27, Dario Faggioli wrote:
> On Thu, 2016-12-08 at 07:12 +0100, Juergen Gross wrote:
>> On 07/12/16 19:29, Dario Faggioli wrote:
>>> ### Phase 2
>>>
>>> Inside Xen, the various schedulers will be modified to deal
>>> internally with
>>> the fact that vCPUs can only run on pCPUs from the class(es) they
>>> are
>>> associated with. This allows for more efficient implementation, and
>>> paves
>>> the way for enabling more intelligent logic (e.g., for minimizing
>>> power
>>> consumption) in *phase 3*.
>>>
>> Any idea how to avoid problems in the schedulers related to vcpus
>> with
>> different weights? 
>>
> Sure: use Credit2! :-P
> 
> And I'm not joking (not entirely, at least), as the alternative is to
> re-engineer significantly the algorithm inside Credit, which I'm not
> sure is doable or worthwhile, especially considering we have
> alternatives.

So you really solved the following problem in credit2?

You have three domains with 2 vcpus each and different weights. Run them
on 3 physical cpus with following pinning:

dom1: pcpu 1 and 2
dom2: pcpu 2 and 3
dom3: pcpu 1 and 3

How do you decide which vcpu to run on which pcpu for how long?


Juergen

> 
>> Remember, weights and pinning don't go well together,
>> that was the main reason for inventing cpupools. You should at least
>> name that problem. 
>>
> Yes, that's true. I will add a paragraph about it.
> 
> Thanks and Regards,
> Dario
> -- 
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-08 Thread Dario Faggioli
On Thu, 2016-12-08 at 07:12 +0100, Juergen Gross wrote:
> On 07/12/16 19:29, Dario Faggioli wrote:
> > 
> > Setting and getting the CPU class of a vCPU will happen via two new
> > hypercalls:
> > 
> > * `XEN_DOMCTL_setvcpuclass`
> > * `XEN_DOMCTL_setvcpuclass`
> 
> XEN_DOMCTL_getvcpuclass
> 
Oops, thanks.

> > ### Phase 2
> > 
> > Inside Xen, the various schedulers will be modified to deal
> > internally with
> > the fact that vCPUs can only run on pCPUs from the class(es) they
> > are
> > associated with. This allows for more efficient implementation, and
> > paves
> > the way for enabling more intelligent logic (e.g., for minimizing
> > power
> > consumption) in *phase 3*.
> > 
> Any idea how to avoid problems in the schedulers related to vcpus
> with
> different weights? 
>
Sure: use Credit2! :-P

And I'm not joking (not entirely, at least), as the alternative is to
re-engineer significantly the algorithm inside Credit, which I'm not
sure is doable or worthwhile, especially considering we have
alternatives.

> Remember, weights and pinning don't go well together,
> that was the main reason for inventing cpupools. You should at least
> name that problem. 
>
Yes, that's true. I will add a paragraph about it.

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-08 Thread Dario Faggioli
On Thu, 2016-12-08 at 03:14 -0700, Jan Beulich wrote:
> > > > On 07.12.16 at 19:29,  wrote:
> > ### x86
> > 
> > There is no HMP platform of relevance, for now, in x86 world.
> > Therefore,
> > only one class will exist, and all the CPUs will be set to belong
> > to it.
> > **TODO X86:** is this correct?
> 
> What about the original Xeon Phi (on a PCIe card)?
> 
Well, what I'd say about it is that I did not know about its existence.
:-)

Anyway, if we have HMP on x86 already, and we want to support them,
we'll have to define criteria for building classes there too. Once that
is done, the rest of this document should be general enough (or at
least that was the intent).

About defining those criteria, I'd appreciate whatever input you x86
experts will be able to share. :-)

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-08 Thread Jan Beulich
>>> On 07.12.16 at 19:29,  wrote:
> ### x86
> 
> There is no HMP platform of relevance, for now, in x86 world. Therefore,
> only one class will exist, and all the CPUs will be set to belong to it.
> **TODO X86:** is this correct?

What about the original Xeon Phi (on a PCIe card)?

> ## Hypervisor
> 
> The hypervisor needs to know within which class each of the present CPUs
> falls. At boot (or, in general, CPU bringup) time, while identifying the CPU,
> a list of classes is constructed, and the mapping between each CPU and the
> class to which it is determined to belong is established.
> 
> The list of classes is kept ordered from the more powerful to the less
> powerful.
> **TODO:** this has been [proposed by 
> George](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html).
> I like the idea, what do others think? If we agree on that, note that there
> has been no discussion on defining what "more powerful" means, neither on
> x86 (although, not really that interesting, for now, I'd say), nor on ARM.

Indeed I think there should be no assumption about the ability to
order things here: Even if for some initial set of hardware it may
be possible to clearly tell which one's more powerful and which
one's weaker, already the moment you extend this from
compute power to different ISA extensions you'll immediately end
up with the possibility of two CPUs each having a distinct extra feature
compared to one another (say one a crypto extension and the
other a wider vector compute engine).

It may be possible to establish partial ordering though, but it's
not really clear to me what such ordering would be used for.

Jan




Re: [Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-07 Thread Juergen Gross
On 07/12/16 19:29, Dario Faggioli wrote:
> Setting and getting the CPU class of a vCPU will happen via two new
> hypercalls:
> 
> * `XEN_DOMCTL_setvcpuclass`
> * `XEN_DOMCTL_setvcpuclass`

XEN_DOMCTL_getvcpuclass

> ### Phase 2
> 
> Inside Xen, the various schedulers will be modified to deal internally with
> the fact that vCPUs can only run on pCPUs from the class(es) they are
> associated with. This allows for more efficient implementation, and paves
> the way for enabling more intelligent logic (e.g., for minimizing power
> consumption) in *phase 3*.
> 
> Calling `libxl_set_vcpuaffinity()` from `xl` / libxl is therefore no longer
> necessary and will be avoided (i.e., only `libxl_set_vcpuclass()` will be
> called).

Any idea how to avoid problems in the schedulers related to vcpus with
different weights? Remember, weights and pinning don't go well together;
that was the main reason for inventing cpupools. You should at least
name that problem. In case vcpus are capable of running on pcpus of
more than one class, this problem might surface again.


Juergen




[Xen-devel] [DOC RFC] Heterogeneous Multi Processing Support in Xen

2016-12-07 Thread Dario Faggioli
% Heterogeneous Multi Processing Support in Xen
% Revision 1

\clearpage

# Basics

 
         Status: **Design Document**

Architecture(s): x86, arm

   Component(s): Hypervisor and toolstack
 

# Overview

HMP (Heterogeneous Multi Processing) and AMP (Asymmetric Multi Processing)
refer to systems where physical CPUs are not exactly equal. It may be that
they have different processing power, or capabilities, or that each is
specifically designed to run a particular system component.
Most of the time, the CPUs have different Instruction Set Architectures (ISAs)
or Application Binary Interfaces (ABIs). But they may *just* be different
implementations of the same ISA, in which case they typically differ in
speed, power efficiency or handling of special situations (e.g., errata).

An example is ARM big.LITTLE, which in fact, is the use case that got the
discussion about HMP started. This document, however, is generic, and does
not target only big.LITTLE.

What needs proper Xen support are systems and use cases where virtual CPUs
cannot be seamlessly moved among all the physical CPUs. In fact, in these
cases, there must be a way to:

* decide and specify on what (set of) physical CPU(s), each vCPU can execute on;
* enforce that a vCPU that can only run on a certain (set of) pCPUs, is never
  actually run anywhere else.

**N.B.:** it is becoming common to refer as AMP or HMP also to systems which
have various kinds of co-processors (from crypto engines to graphics hardware)
integrated with the CPUs on the same chip. This is not what this design
document is about.

# Classes of CPUs

A *class of CPUs* is defined as follows:

1. each pCPU in the system belongs to a class;
2. a class can consist of one or more pCPUs;
3. each pCPU can only be in one class;
4. CPUs belonging to the same class are homogeneous enough that a virtual
   CPU that blocks/is preempted while running on a pCPU of a class can,
   **seamlessly**, unblock/be scheduled on any pCPU of that same class;
5. when a virtual CPU is associated with a (set of) class(es) of CPUs, it
   means that the vCPU can run on all the pCPUs belonging to the said
   class(es).

So, for instance, in architecture Foobar two classes of CPUs exist, class
foo and class bar. If a virtual CPU running on CPU 0, which is of class
foo, blocks (or is preempted), it can, when it unblocks (or is selected by
the scheduler to run again), run on CPU 3, still of class foo, but not on
CPU 6, which is of class bar.
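
The rules above boil down to a very small check, sketched here in plain
Python (this is illustrative only, not Xen code; the class names follow
the Foobar example):

```python
# Illustrative encoding of the class rules: each pCPU belongs to
# exactly one class (rules 1-3); a vCPU associated with a set of
# classes may run on any pCPU of those classes (rule 5).

pcpu_class = {0: "foo", 3: "foo", 6: "bar"}  # partial Foobar topology

def may_run_on(vcpu_classes, pcpu):
    """A blocked/preempted vCPU may resume on pcpu only if that pCPU's
    class is among the classes the vCPU is associated with."""
    return pcpu_class[pcpu] in vcpu_classes

# The example in the text: a vCPU of class foo preempted on CPU 0
print(may_run_on({"foo"}, 3))  # True: CPU 3 is also class foo
print(may_run_on({"foo"}, 6))  # False: CPU 6 is class bar
```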

## Defining classes

How a class is defined, i.e., which specific characteristics determine
what CPUs belong to which class, is highly architecture specific.

### x86

There is no HMP platform of relevance, for now, in x86 world. Therefore,
only one class will exist, and all the CPUs will be set to belong to it.
**TODO X86:** is this correct?

### ARM

**TODO ARM:** I know nothing about what specifically should be used to
form classes, so I'm deferring this to ARM people.

So far, in the original thread, the following ideas came up (well, there are
more, but I don't know enough about ARM to judge what is really relevant to
this topic):

* 
[Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02153.html)
  "I don't think an hardcoded list of processor in Xen is the right solution.
   There are many existing processors and combinations for big.LITTLE so it
   will nearly be impossible to keep updated."
* 
[Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02256.html)
  "Well, before trying to do something clever like that (i.e naming "big" and
  "little"), we need to have upstreamed bindings available to acknowledge the
  difference. AFAICT, it is not yet upstreamed for Device Tree and I don't
  know any static ACPI tables providing the similar information."
* 
[Peng](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02194.html)
  "For how to differentiate cpus, I am looking the linaro eas cpu topology code"

# User details

## Classes of CPUs for the users

It will be possible, in a VM config file, to specify the (set of) class(es)
of each vCPU. This allows creating HMP VMs.

E.g., on ARM, it will be possible to create big.LITTLE VMs which, if run on
big.LITTLE hosts, could leverage the big.LITTLE support of the guest OS kernel
and tools.

For such purpose, a new option will be added to xl config file:

    vcpus = "8"
    vcpuclass = ["0-2:class0", "3,4:class1,class3", "5:class0, class2", "8:class4"]

with the following meaning:

* vCPUs 0, 1, 2 can only run on pCPUs of class class0
* vCPUs 3, 4 can run on pCPUs of class class1 **and** on pCPUs of class class3
* vCPU 5 can run on pCPUs of class class0 **and** on pCPUs of class class2
* for vCPUs 6 and 7, since they're not mentioned, default applies
* vCPU 8 can only run on pCPUs of class class4

For the vCPUs for which no class is specified, default behavior applies.
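
As an illustration of the proposed syntax, the option could be parsed along
these lines (a hedged sketch, not the actual xl/libxl parser; the function
name is made up):

```python
# Sketch only: turn the proposed vcpuclass entries ("range:classes")
# into a per-vCPU mapping of allowed class names.

def parse_vcpuclass(entries):
    """Map each vCPU number to the set of class names it may run on."""
    mapping = {}
    for entry in entries:
        vcpus_part, classes_part = entry.split(":", 1)
        classes = {c.strip() for c in classes_part.split(",")}
        for piece in vcpus_part.split(","):
            if "-" in piece:  # a range like "0-2"
                lo, hi = (int(x) for x in piece.split("-"))
                vcpu_ids = range(lo, hi + 1)
            else:             # a single vCPU like "8"
                vcpu_ids = [int(piece)]
            for v in vcpu_ids:
                mapping[v] = classes
    return mapping

conf = ["0-2:class0", "3,4:class1,class3", "5:class0, class2", "8:class4"]
print(sorted(parse_vcpuclass(conf)[5]))  # ['class0', 'class2']
```

vCPUs absent from the resulting mapping (6 and 7 here) would then get the
default behavior described above.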

**TODO:** note