On Mon, Feb 16, 2015 at 04:51:43PM +0000, Dario Faggioli wrote:
> On Mon, 2015-02-16 at 16:11 +0000, Wei Liu wrote:
> > On Mon, Feb 16, 2015 at 03:56:21PM +0000, Dario Faggioli wrote:
> > > On Mon, 2015-02-16 at 15:17 +0000, Wei Liu wrote:
> 
> > > > And there is no way to
> > > > specify priority among the group of nodes you specify with a single
> > > > bitmap.
> > > > 
> > > Why do we need such a thing as a 'priority'? What I'm talking about is
> > > making it possible, for each vnode, to specify vnode-to-pnode mapping as
> > > a bitmap of pnode. What we'd do, in presence of a bitmap, would be
> > > allocating the memory by striping it across _all_ the pnodes present in
> > > the bitmap.
> > > 
> > 
> > Should we enforce memory being equally striped across all nodes? If so,
> > this should be stated explicitly in the interface comment.
> >
> I don't think we should enforce anything... I was rather describing
> what happens *right* *now* in that scenario, whether documented or not.
> 
> > I can't see
> > that in your original description. I asked about "priority" because I
> > interpreted it as something else (which is one of many possible
> > interpretations, I think).
> > 
> So, if you're saying that, if we use a bitmap, we should document
> how libxl would use it, I certainly agree. How much detail we should
> go into at that point, I'm not sure. As a user, I think I'd be fine
> finding it written that "the memory of the vnode will be
> allocated out of the pnodes specified in the bitmap", without much
> further detail, especially considering the use case for the feature.
> 

This is of course OK. And the simplest implementation of this
strategy is to pass the node information on to Xen and let Xen decide
which of the specified nodes to allocate from. This would
be trivial.
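
For comparison, the libxl-side alternative discussed earlier in the thread,
striping a vnode's memory across every pnode set in its bitmap, could look
roughly like the following. It is only an illustration under stated
assumptions: the 64-bit `pnode_mask`, `MAX_PNODES` and the function names are
hypothetical, not part of the libxl API, and "striping" is assumed to mean an
equal split, with any remainder handed out 1 KiB at a time to the
lowest-numbered pnodes.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PNODES 64 /* hypothetical limit: one bit per pnode in a uint64_t */

/* Number of set bits in the mask, i.e. the "weight" of the bitmap. */
static unsigned mask_weight(uint64_t mask)
{
    unsigned w = 0;

    for (; mask; mask &= mask - 1) /* clear the lowest set bit each pass */
        w++;
    return w;
}

/*
 * Hypothetical: split memkb equally across every pnode set in pnode_mask.
 * On return, share_kb[i] is the amount to allocate from pnode i (0 if
 * pnode i is not in the mask). Returns the number of pnodes used, or 0
 * for an empty mask.
 */
static unsigned stripe_vnode(uint64_t memkb, uint64_t pnode_mask,
                             uint64_t share_kb[MAX_PNODES])
{
    unsigned weight = mask_weight(pnode_mask);
    uint64_t base, rem;

    for (unsigned i = 0; i < MAX_PNODES; i++)
        share_kb[i] = 0;
    if (weight == 0)
        return 0;

    base = memkb / weight;
    rem = memkb % weight; /* rem < weight: one extra KiB per node suffices */
    for (unsigned i = 0; i < MAX_PNODES; i++) {
        if (!(pnode_mask & (UINT64_C(1) << i)))
            continue;
        share_kb[i] = base;
        if (rem > 0) {
            share_kb[i]++;
            rem--;
        }
    }
    assert(rem == 0); /* every KiB has been assigned to some pnode */
    return weight;
}
```

With such a policy, a 2GB vnode mapped to pnodes #4 and #6 always gets 1GB
from each, regardless of how much memory either node has free.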

I think being able to map a vnode to several pnodes would be
good. I'm just trying to figure out whether a single bitmap is enough to
cover all the sensible use cases, or what we should say about that
interface.

> > If it's up to libxl to make a dynamic choice, we should also say that.
> > But this is not very useful to users, because libxl's algorithm can
> > change, can't it? How are users expected to know that across versions
> > of Xen?
> > 
> Why does he need to? This would enable a bit more flexibility, if one
> wants it, or somewhat less bad performance in some specific situations,
> and all this pretty much independently of the algorithm used inside
> libxl, I think.
> 
> As I said, if there is only 1GB free on each pnode, the user will be
> allowed to specify a set of pnodes for the vnodes, instead of not being
> able to use vNUMA at all, no matter how libxl (or whoever else) will
> actually split the memory, in this, previous, or future versions of
> Xen... This is the scenario I'm talking about, and in such a scenario,
> knowing how the split happens does not really help much; it is just the
> _possibility_ of splitting that helps...
> 

I don't see this problem that way though.

Basically you're saying that a user wants to use vNUMA, then at some
point he/she finds out there is not enough memory in each specified pnode
to accommodate his/her requirement, and then he/she changes the
configuration on the fly.

In reality, with a mass deployment you probably won't do that. You
might just want to use the same configuration all the time. Now this
configuration performs differently on different versions of Xen
because the algorithm is not fixed (which is not necessarily a
bad thing, though, because a more sensible algorithm can
improve performance).

> > > If we allow the user (or the automatic placement algorithm) to specify a
> > > bitmap of pnodes for each vnode, he could put, say, vnode #1 on pnodes #0
> > > and #2, which may be really close (in terms of NUMA distances) to
> > > each other, and vnode #2 on pnodes #5 and #6 (close to each other too).
> > > This would give worse performance than having each vnode on just one
> > > pnode, but, most likely, better performance than the scenario described
> > > right above.
> > > 
> > 
> > I get what you mean. So by writing the above paragraphs, you sort of
> > confirm that there are still too many implications in the algorithms,
> > right? A user cannot just tell from the interface what the behaviour is
> > going to be.
> >
> A user can tell that, if he wants a 2GB vnode and there is no
> pnode with 2GB free, but the sum of free memory in pnodes #4 and #6 is
> >= 2GB, he can still use vNUMA by paying the price (small or high,
> depending on other factors) of having that vnode split in two (or more!).
> 
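
The condition described above (no single pnode has enough free memory, but
the pnodes in the bitmap do in total) amounts to a simple sum. A minimal
sketch, assuming a hypothetical `free_kb[]` array of per-pnode free memory in
KiB and a 64-bit mask with one bit per pnode; neither is an existing libxl
interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: can a vnode of memkb KiB be backed by the pnodes set
 * in pnode_mask? free_kb[i] is the free memory on pnode i, in KiB. Only the
 * total over the selected pnodes matters; no single pnode needs to fit it.
 */
static bool vnode_fits(uint64_t memkb, uint64_t pnode_mask,
                       const uint64_t *free_kb, unsigned nr_pnodes)
{
    uint64_t total = 0;

    assert(nr_pnodes <= 64); /* one bit per pnode in a uint64_t mask */
    for (unsigned i = 0; i < nr_pnodes; i++)
        if (pnode_mask & (UINT64_C(1) << i))
            total += free_kb[i];
    return total >= memkb;
}
```

This only checks the total. A policy that splits the memory equally would
additionally need every selected pnode to hold its own share, so the two can
disagree (e.g. 1.5GB free on one pnode and 0.5GB on the other), which is
exactly why the semantics of the bitmap need to be written down.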

What if #4 and #6 each have > 2GB of RAM? What will the behaviour be then?
Does it imply better or worse performance? Again, I'm thinking about
migrating the same configuration to another version of Xen, or even just
another host that has enough memory.

I guess the best we can say (at this point, if we're to use a bitmap)
is that memory will be allocated from the specified nodes and users should
not expect any specific behaviour, which is basically telling users not to
specify multiple nodes...

> I think there would be room for some increased user satisfaction in
> this, even without knowing much about, or being in control of, how exactly
> the split happens, as there is a chance for performance to be (if the
> feature is used properly) better than in the no-vNUMA case, which is what
> we're after.
> 
> > You can of course say the algorithm is fixed but I don't
> > think we want to do that?
> > 
> I don't want to, but I don't think it's needed.
> 
> Anyway, I'm more than OK with deferring the discussion until after
> this series is in. It will require a further change in the interface,
> but I don't think it would be a terrible price to pay, if we decide the
> feature is worth it.
> 
> Or, and that was the other thing I was suggesting, we can have the
> bitmap in vnode_info right now, but only accept ints in xl config
> parsing and enforce the weight of the bitmap to be 1 (perhaps printing
> a warning) for now. This would not require changing the API in the
> future; it'd just be a matter of changing the xl config file parsing.
> The "problem" would still stand for libxl callers other than xl,
> though, I know.
> 
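
The interim proposal above could be sketched on the xl side as follows: the
config still takes a plain integer, which is stored as a one-bit mask, and any
mask whose weight is not 1 is rejected with a warning. The 64-bit mask and
all function names here are hypothetical stand-ins for a bitmap in
vnode_info, not existing libxl or xl code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: parse a plain integer pnode from the config into a mask. */
static int pnode_int_to_mask(const char *s, unsigned nr_pnodes, uint64_t *mask)
{
    char *end;
    unsigned long pnode;

    assert(nr_pnodes <= 64); /* one bit per pnode in a uint64_t mask */
    errno = 0;
    pnode = strtoul(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || pnode >= nr_pnodes)
        return -1; /* not a number, trailing junk, or no such pnode */
    *mask = UINT64_C(1) << pnode;
    return 0;
}

/* Number of set bits in the mask, i.e. the "weight" of the bitmap. */
static unsigned mask_weight(uint64_t mask)
{
    unsigned w = 0;

    for (; mask; mask &= mask - 1) /* clear the lowest set bit each pass */
        w++;
    return w;
}

/* Hypothetical interim rule: a vnode may be backed by exactly one pnode. */
static int check_single_pnode(uint64_t mask)
{
    unsigned w = mask_weight(mask);

    if (w != 1) {
        fprintf(stderr,
                "warning: vnode maps to %u pnodes, only 1 is supported for now\n",
                w);
        return -1;
    }
    return 0;
}
```

Lifting the restriction later would then only mean relaxing the xl-side
checks, since the bitmap would already be part of the interface.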

Note that the uint32_t mapping has very rigid semantics. As long as
you give me well-defined semantics for that bitmap, I'm fine with this.
Otherwise, I feel more comfortable with the interface as it is.

Wei.

> Regards,
> Dario
> 
> > Wei.
> > 
> > > Hope I made myself clear enough :-)
> > > 
> > > Regards,
> > > Dario
> > 
> > 
> 



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
