On Mon, Feb 16, 2015 at 04:51:43PM +0000, Dario Faggioli wrote:
> On Mon, 2015-02-16 at 16:11 +0000, Wei Liu wrote:
> > On Mon, Feb 16, 2015 at 03:56:21PM +0000, Dario Faggioli wrote:
> > > On Mon, 2015-02-16 at 15:17 +0000, Wei Liu wrote:
> > > > > And there is no way to
> > > > specify priority among the group of nodes you specify with a single
> > > > bitmap.
> > > >
> > > Why do we need such a thing as a 'priority'? What I'm talking about is
> > > making it possible, for each vnode, to specify vnode-to-pnode mapping as
> > > a bitmap of pnode. What we'd do, in presence of a bitmap, would be
> > > allocating the memory by striping it across _all_ the pnodes present in
> > > the bitmap.
> > >
> > Should we enforce memory equally stripped across all nodes? If so this
> > should be stated explicitly in the comment of interface.
> >
> I don't think we should enforce anything... I was much rather describing
> what happens *right* *now* in that scenario, it being documented or not.
>
> > I can't see
> > that in your original description. I ask "priority" because I
> > interpreted as something else (which is one of many ways to interpret
> > I think).
> >
> So, if you're saying that, if we use a bitmap, we should write somewhere
> how libxl would use it, I certainly agree. Up to what level of details
> we, at that point, should do that, I'm not sure. I think I'd be fine, as
> a user, if finding it written that "the memory of the vnode will be
> allocated out of the pnodes specified in the bitmap", with no much
> further detail, especially considering the use case for the feature.
>

This is of course OK. The simplest implementation of this strategy is
to pass the node information on to Xen and let Xen decide which of the
specified nodes to allocate from. That would be trivial.

I think having a vnode able to map to several pnodes would be good. I'm
just trying to figure out whether a single bitmap is enough to cover all
the sensible use cases, and what we should say about that interface.

> > If it's up to libxl to make dynamic choice, we should also say that. But
> > this is not very useful to user because libxl's algorithm can change
> > isn't it? How do users expect to know that across versions of Xen?
> >
> Why does he need to? This would be something enabling a bit more of
> flexibility, if one wants it, or a bit less worse performance, in some
> specific situations, and all this pretty much independently from the
> algorithm used inside libxl, I think.
>
> As I said, if there is only 1GB free on all pnodes, the user will be
> allowed to specify a set of pnodes for the vnodes, instead of not being
> able to use vnuma at all, no matter how libxl (or whoever else) will
> actually split the memory, in this, previous or future version of Xen...
> This is the scenario I'm talking about, and in such a scenario, knowing
> how the split happens, does not really help much, it is just the
> _possibility_ of splitting, that helps...
>

I don't see this problem that way though. Basically you're saying that a
user wants to use vNUMA, then at some point he / she finds out there is
not enough memory in each specified pnode to accommodate his / her
requirement, and then he / she changes the configuration on the fly.

In reality, if you have a mass deployment you probably won't do that. You
might just want to use the same configuration all the time.
Now this configuration has different performance on different versions
of Xen, because the algorithm is not fixed (which is not necessarily a
bad thing, because you can have a more sensible algorithm to improve
performance).

> > > If we allow the user (or the automatic placement algorithm) to specify a
> > > bitmap of pnode for each vnode, he could put, say, vnode #1 on pnode #0
> > > and #2, which maybe are really close (in terms of NUMA distances) to
> > > each other, and vnode #2 to pnode #5 and #6 (close to each others too).
> > > This would give worst performance than having each vnode on just one
> > > pnode, but, most likely, better performance than the scenario described
> > > right above.
> > >
> > I get what you mean. So by writing the above paragraphs, you sort of
> > confirm that there still are too many implications in the algorithms,
> > right? A user cannot just tell from the interface what the behaviour is
> > going to be.
> >
> An user can tell that, if he wants a vnode 2GB wide, and there is no
> pnode with 2GB free, but the sum of free memory in pnode #4 and #6 is >=
> 2GB, he can still use vNUMA, by paying the (small or high will depends
> on more factors) price of having that vnode split in two (or more!).
>

What if #4 and #6 each have more than 2GB of RAM? What will the
behaviour be? Does it imply better or worse performance? Again, I'm
thinking about migrating the same configuration to another version of
Xen, or even just to another host that has enough memory.

I guess the best we can say (at this point, if we're to use a bitmap) is
that memory will be allocated from the nodes specified, and that the
user should not expect any specific behaviour -- which basically tells
the user not to specify multiple nodes...
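
To make the scenario above concrete, here is a minimal sketch in Python.
It is purely illustrative: the function names and the free-memory table
are made up for this example and are not part of libxl or xl. It shows
the only thing a bitmap interface would let a user reason about, namely
whether the pnodes in the bitmap have enough free memory in total for the
vnode. It deliberately says nothing about how the memory would be split
among those pnodes, which is the part that has no defined semantics yet.

```python
def pnodes_in_bitmap(bitmap):
    """Return the pnode indices whose bits are set (hypothetical helper)."""
    if bitmap < 0:
        raise ValueError("bitmap must be non-negative")
    nodes = []
    idx = 0
    while bitmap:
        if bitmap & 1:
            nodes.append(idx)
        bitmap >>= 1
        idx += 1
    return nodes


def bitmap_weight(bitmap):
    """Number of pnodes selected; weight 1 is the single-pnode case."""
    return len(pnodes_in_bitmap(bitmap))


def vnode_fits(vnode_mb, bitmap, free_mb):
    """True if the summed free memory of the selected pnodes covers the vnode.

    free_mb is an assumed mapping of pnode index -> free memory in MB.
    No statement is made about how the allocation would be distributed.
    """
    total = sum(free_mb.get(p, 0) for p in pnodes_in_bitmap(bitmap))
    return total >= vnode_mb


# Example numbers (assumed): neither pnode alone can hold a 2GB vnode.
free_mb = {4: 1024, 6: 1536}
bitmap = (1 << 4) | (1 << 6)

print(pnodes_in_bitmap(bitmap))
print(vnode_fits(2048, bitmap, free_mb))
print(vnode_fits(2048, 1 << 4, free_mb))
```

With these assumed numbers the two-node bitmap is feasible while pnode #4
alone is not. If both pnodes had more than 2GB free, the check would
still pass, and the question of where the memory ends up would remain
open.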

> I think there would be room for some increased user satisfaction in
> this, even without knowing much and/or being in control on how exactly
> the split happens, as there are chances for performance to be (if the
> thing is used properly) better than in the no-vNUMA case, which is what
> we're after.
>
> > You can of course say the algorithm is fixed but I don't
> > think we want to do that?
> >
> I don't want to, but I don't think it's needed.
>
> Anyway, I'm more than ok if we want to defer the discussion to after
> this series is in. It will require a further change in the interface,
> but I don't think it would be a terrible price to pay, if we decide the
> feature is worth.
>
> Or, and that was the other thing I was suggesting, we can have the
> bitmap in vnode_info since now, but then only accept ints in xl config
> parsing, and enforce the weight of the bitmap to be 1 (and perhaps print
> a warning) for now. This would not require changing the API in future,
> it'd just be a matter of changing the xl config file parsing. The
> "problem" would still stand for libxl callers different than xl, though,
> I know.
>

Note that the uint32_t mapping has very rigid semantics. As long as you
give me well-defined semantics for that bitmap, I'm fine with this.
Otherwise I feel more comfortable with the interface as it is.

Wei.

> Regards,
> Dario
>
> > Wei.
> >
> > > Hope I made myself clear enough :-)
> > >
> > > Regards,
> > > Dario
> > >
> >

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel