Re: [Xen-devel] Xen ballooning interface
On Tue, Aug 21, 2018 at 10:58:18AM +0100, Wei Liu wrote:
> On Mon, Aug 13, 2018 at 03:06:10PM +0200, Juergen Gross wrote:
> > Today's interface of Xen for memory ballooning is quite a mess. There
> > are some shortcomings which should be addressed somehow. After a
> > discussion on IRC there was consensus we should try to design a new
> > interface addressing the current and probably future needs.
> >
> > [...]
> >
> > The main problem with this interface is the guest doesn't know in all
> > cases which memory is included in the values (e.g. memory allocated by
> > Xen tools for the firmware of a HVM guest is included in the Xenstore
> > and hypercall information, but not in the memory map).
>
> Somewhat related: who has the canonical source of all the information?
> I think Xen should have that, but it is unclear to me how the toolstack
> can get such information from Xen. ISTR it is currently possible to get
> the current number of pages and the maximum number of pages, both of
> which include pages for firmware which are visible to the guest (E820 /
> EFI reserved).
>
> Without that fixed, the new interface won't be of much use because the
> information the toolstack puts in the new nodes is still potentially
> wrong. Currently the toolstack applies some constant fudge numbers,
> which is a bit unpleasant.
>
> It would be at least useful to break down the accounting inside the
> hypervisor a bit more:
>
> * max_pages : maximum number of pages a domain can use for whatever
>   purpose (ram + firmware + others)
> * curr_pages : current number of pages a domain is using (ram + ...)
> * max_ram_pages : maximum number of pages a domain can use for ram
> * curr_ram_pages : ...

The problem here is that new hypercalls would have to be added, because
firmware running inside the guest picks RAM regions and changes them to
reserved, for example, and the firmware would need a way to tell Xen
about those changes.

We could even have something like an expanded memory map with more
types, in order to describe MMIO regions trapped inside the hypervisor,
firmware regions, RAM, etc., that could be modified by both the
toolstack and Xen.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
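Roger's "expanded memory map with more types" could be sketched roughly as below. The type names and the accounting helper are hypothetical illustrations of the idea, not an existing Xen interface; only RAM/reserved-style E820 types exist today.

```python
from enum import IntEnum

# Hypothetical region types for the expanded map Roger suggests; the
# FIRMWARE and MMIO_TRAP names are invented for illustration.
class XenMemType(IntEnum):
    RAM = 1
    RESERVED = 2
    FIRMWARE = 3   # pages allocated by the toolstack for guest firmware
    MMIO_TRAP = 4  # MMIO regions trapped/emulated inside the hypervisor

def account_ram(memmap):
    """Sum RAM pages, so toolstack and Xen agree on the 'ram' figure."""
    return sum(end - start for start, end, t in memmap
               if t == XenMemType.RAM)

# Firmware converting part of a RAM region to reserved would be reported
# to Xen, shrinking the RAM total without touching the overall maximum:
memmap = [(0x0, 0xa0, XenMemType.RAM), (0xa0, 0x100, XenMemType.FIRMWARE)]
assert account_ram(memmap) == 0xa0
memmap[0] = (0x0, 0x9f, XenMemType.RAM)
memmap.append((0x9f, 0xa0, XenMemType.RESERVED))
assert account_ram(memmap) == 0x9f
```

With a shared map like this, both the toolstack and Xen would derive RAM accounting from the same canonical source instead of applying constant fudge numbers.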
Re: [Xen-devel] Xen ballooning interface
On Mon, Aug 13, 2018 at 03:06:10PM +0200, Juergen Gross wrote:
> Today's interface of Xen for memory ballooning is quite a mess. There
> are some shortcomings which should be addressed somehow. After a
> discussion on IRC there was consensus we should try to design a new
> interface addressing the current and probably future needs.
>
> [...]
>
> The main problem with this interface is the guest doesn't know in all
> cases which memory is included in the values (e.g. memory allocated by
> Xen tools for the firmware of a HVM guest is included in the Xenstore
> and hypercall information, but not in the memory map).

Somewhat related: who has the canonical source of all the information?
I think Xen should have that, but it is unclear to me how the toolstack
can get such information from Xen. ISTR it is currently possible to get
the current number of pages and the maximum number of pages, both of
which include pages for firmware which are visible to the guest (E820 /
EFI reserved).

Without that fixed, the new interface won't be of much use because the
information the toolstack puts in the new nodes is still potentially
wrong. Currently the toolstack applies some constant fudge numbers,
which is a bit unpleasant.

It would be at least useful to break down the accounting inside the
hypervisor a bit more:

* max_pages : maximum number of pages a domain can use for whatever
  purpose (ram + firmware + others)
* curr_pages : current number of pages a domain is using (ram + ...)
* max_ram_pages : maximum number of pages a domain can use for ram
* curr_ram_pages : ... etc etc

IIRC there are currently two schools of thought which disagree with each
other about what the "maximum number of pages" in the hypervisor means.

This is not saying we can't design new interfaces, it is just that it
wouldn't be very useful IMHO.

Wei.
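Wei's proposed accounting breakdown can be sketched as a small structure with its implied invariants. The field names follow his mail; the invariants are a hedged reading of what "ram + firmware + others" would imply, not a settled definition.

```python
from dataclasses import dataclass

# Sketch of the per-domain accounting breakdown proposed above.
@dataclass
class DomainPages:
    max_pages: int       # upper bound for everything (ram + firmware + others)
    curr_pages: int      # currently allocated pages of any kind
    max_ram_pages: int   # upper bound for ram alone
    curr_ram_pages: int  # currently allocated ram pages

    def consistent(self):
        # RAM is a subset of the overall allocation, and current values
        # never exceed their maxima.
        return (self.curr_pages <= self.max_pages
                and self.curr_ram_pages <= self.max_ram_pages
                and self.max_ram_pages <= self.max_pages
                and self.curr_ram_pages <= self.curr_pages)

d = DomainPages(max_pages=1024, curr_pages=900,
                max_ram_pages=1000, curr_ram_pages=880)
assert d.consistent()
# The gap between the totals is exactly the non-RAM (e.g. firmware) pages:
assert d.curr_pages - d.curr_ram_pages == 20
```

Separating the RAM counters from the overall counters is precisely what would remove the "constant fudge numbers" the toolstack applies today.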
Re: [Xen-devel] Xen ballooning interface
On 14/08/18 09:34, Jan Beulich wrote:
> On 14.08.18 at 09:19, wrote:
>> On 14/08/18 09:02, Jan Beulich wrote:
>>> [...]
>>> Can it? If so, I must be overlooking some accounting done
>>> somewhere. I'm only aware of a global maximum.
>>
>> The tools set the vnuma information for the guest. How do they do this
>> without knowing the memory size per vnuma node?
>
> That's the current (initial) size, not the maximum.

Which is the same in the current implementation:
libxl__vnuma_config_check() will fail if the memory of all vnuma nodes
doesn't sum up to the max memory of the domain.

Juergen
Re: [Xen-devel] Xen ballooning interface
>>> On 14.08.18 at 09:19, wrote:
> On 14/08/18 09:02, Jan Beulich wrote:
>> [...]
>> Can it? If so, I must be overlooking some accounting done
>> somewhere. I'm only aware of a global maximum.
>
> The tools set the vnuma information for the guest. How do they do this
> without knowing the memory size per vnuma node?

That's the current (initial) size, not the maximum.

Jan
Re: [Xen-devel] Xen ballooning interface
On 14/08/18 09:02, Jan Beulich wrote:
> On 13.08.18 at 17:44, wrote:
>> [...]
>> With vNUMA there is a current value of memory per node supplied by the
>> tools and a maximum per node can be calculated the same way.
>
> Can it? If so, I must be overlooking some accounting done
> somewhere. I'm only aware of a global maximum.

The tools set the vnuma information for the guest. How do they do this
without knowing the memory size per vnuma node?

>> This results in a balloon size per node.
>>
>> There is still the option to let the guest adjust the per node balloon
>> sizes after reaching the final memory size or maybe during the process
>> of ballooning at a certain rate.
>
> I'm probably increasingly confused: Shouldn't, for whichever value
> in xenstore, there be a firm determination of which single party is
> supposed to modify a value? Aiui the intention is for the (target)
> balloon size to be set by the tools.

Sorry if I wasn't clear enough here: the guest shouldn't rewrite the
target balloon size, but e.g. memory/vnode/balloon-size.

>>> [...]
>>> Yes, but I think we shouldn't design a new interface without
>>> considering all current shortcomings.
>>
>> I don't think the suggested interface would make it harder to add a way
>> to request special pages to be preferred in the ballooning process.
>
> Address and (virtual) node may conflict with one another. But I
> think we've meanwhile settled on the node value to only be a hint
> in a request.

I think so, yes.

Juergen
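The writer split agreed here (tools own the per-vnode target, the guest only reports its actual balloon size back) can be sketched as a simple policy check. The per-node path layout (e.g. "vnode0") and the leaf names are assumptions taken from the discussion, not a finished interface.

```python
# Hedged sketch of "which single party is supposed to modify a value":
# the toolstack writes target-balloon-size, the guest writes balloon-size.
TOOLS_WRITABLE = "target-balloon-size"
GUEST_WRITABLE = "balloon-size"

def allowed_write(party, path):
    """Return True if `party` ('tools' or 'guest') may write this node."""
    leaf = path.rsplit("/", 1)[-1]
    if party == "tools":
        return leaf == TOOLS_WRITABLE
    if party == "guest":
        return leaf == GUEST_WRITABLE
    return False

assert allowed_write("tools", "memory/vnode0/target-balloon-size")
assert not allowed_write("guest", "memory/vnode0/target-balloon-size")
assert allowed_write("guest", "memory/vnode0/balloon-size")
```

Keeping each node single-writer avoids the races that arise when tools and guest both rewrite the same xenstore value.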
Re: [Xen-devel] Xen ballooning interface
On 13/08/18 17:29, Jan Beulich wrote:
> On 13.08.18 at 16:20, wrote:
>> [...]
>> Instead of specifying an absolute value to reach you'd specify how much
>> memory the guest should stay below its maximum. I think this is a valid
>> approach.
>
> But with your vNUMA model there's no single such value, and nothing
> like a "maximum" (which would need to be per virtual node afaics).

With vNUMA there is a current value of memory per node supplied by the
tools, and a maximum per node can be calculated the same way. This
results in a balloon size per node.

There is still the option to let the guest adjust the per node balloon
sizes after reaching the final memory size, or maybe during the process
of ballooning at a certain rate.

>>> [...]
>>> Hence I wonder whether a dedicated "balloon out this page if you
>>> can" mechanism would be something to consider.
>>
>> Isn't this a problem orthogonal to the one we are discussing here?
>
> Yes, but I think we shouldn't design a new interface without
> considering all current shortcomings.

I don't think the suggested interface would make it harder to add a way
to request special pages to be preferred in the ballooning process.

>> I'd rather do a localhost guest migration to free specific pages a
>> guest is owning and tell the Xen memory allocator not to hand them
>> out to the new guest created by the migration.
>
> There may not be enough memory to do a localhost migration.
> Ballooning, after all, may be done just because of a memory
> shortage.

True. Still I believe adding the tooling to identify domains owning
needed memory pages and demanding that they balloon those out, in order
to make use of those pages for the creation of a special domain, is
nothing which is going to happen soon. So as long as we are confident
that the new interface wouldn't block such a usage, I think we are fine.

Juergen
Re: [Xen-devel] Xen ballooning interface
>>> On 13.08.18 at 16:20, wrote:
> On 13/08/18 15:54, Jan Beulich wrote:
>> [...]
>> But isn't this backwards? The balloon size is a piece of information
>> internal to the guest. Why should the outside world know or care?
>
> Instead of specifying an absolute value to reach you'd specify how much
> memory the guest should stay below its maximum. I think this is a valid
> approach.

But with your vNUMA model there's no single such value, and nothing
like a "maximum" (which would need to be per virtual node afaics).

>> Hence I wonder whether a dedicated "balloon out this page if you
>> can" mechanism would be something to consider.
>
> Isn't this a problem orthogonal to the one we are discussing here?

Yes, but I think we shouldn't design a new interface without
considering all current shortcomings.

> I'd rather do a localhost guest migration to free specific pages a
> guest is owning and tell the Xen memory allocator not to hand them
> out to the new guest created by the migration.

There may not be enough memory to do a localhost migration.
Ballooning, after all, may be done just because of a memory
shortage.

Jan
Re: [Xen-devel] Xen ballooning interface
On Mon, Aug 13, 2018 at 04:27:06PM +0200, Juergen Gross wrote:
> On 13/08/18 16:12, Roger Pau Monné wrote:
>> [...]
>> But the toolstack interface (xl) still uses mem-set which is an
>> absolute value. How is the toolstack going to accurately calculate the
>> balloon size without knowing the extra memory used by the firmware?
>
> mem-set will make use of the current allocation the tools know about
> and add/subtract the difference to the new value to/from the target
> balloon size. I don't think firmware will eat away memory when the
> guest OS is already running. :-)
>
> The main difference to today's situation is that the same component
> which did the initial calculation of how much memory should be
> allocated is doing the math in case of ballooning now. So no guesswork
> any longer.

Right, it doesn't matter how much memory is used by the firmware,
because the guest is going to balloon down an exact amount given by the
toolstack, so that at the end of the ballooning the used memory is going
to match the toolstack expectations.

Roger.
Re: [Xen-devel] Xen ballooning interface
On 13/08/18 16:12, Roger Pau Monné wrote:
> On Mon, Aug 13, 2018 at 03:06:10PM +0200, Juergen Gross wrote:
>> Today's interface of Xen for memory ballooning is quite a mess. There
>> are some shortcomings which should be addressed somehow. After a
>> discussion on IRC there was consensus we should try to design a new
>> interface addressing the current and probably future needs.
>
> Thanks for doing this! Memory accounting is quite messy at the moment
> :(.
>
> [...]
>> Should we specify whether the guest is free to balloon another vnode
>> than specified?
>
> What if the guest simply doesn't support NUMA and doesn't know
> anything about nodes?

Okay, that's a rather good answer to this question. :-)

>> Should memory hotplug (at least for PV domains) use the vnode specific
>> Xenstore paths, too, if supported by the guest?
>
> Is extra memory hotplug going to set:
>
> memory/vnode/target-balloon-size = -1000
>
> In order to tell the guest it can hotplug past the boot time amount of
> memory?

Interesting idea.

>> Any further thoughts on this?
>
> Isn't this just moving the memory accounting problem to another piece
> of software?
>
> Currently as you say there's a difference between the xenstore target
> and the guest memory map, because some memory is used by the firmware.
> In order to solve this the toolstack won't provide an absolute memory
> target but instead a relative one to the guest that contains the
> balloon size.
>
> But the toolstack interface (xl) still uses mem-set which is an
> absolute value. How is the toolstack going to accurately calculate the
> balloon size without knowing the extra memory used by the firmware?

mem-set will make use of the current allocation the tools know about and
add/subtract the difference to the new value to/from the target balloon
size. I don't think firmware will eat away memory when the guest OS is
already running. :-)

The main difference to today's situation is that the same component
which did the initial calculation of how much memory should be allocated
is doing the math in case of ballooning now. So no guesswork any longer.

Juergen
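The toolstack-side arithmetic Juergen describes can be sketched as follows: mem-set stays absolute for the user, but the guest only ever sees the resulting change to its balloon target, so firmware overhead never enters the guest-visible numbers. Function name and units (MiB) are illustrative.

```python
# Hedged sketch of the relative mem-set computation described above.
def mem_set(current_allocation, current_balloon_target, new_allocation):
    """Return the new balloon target after an `xl mem-set`-style request.

    The tools know the current allocation; the difference between the old
    and new allocation is added to (or subtracted from) the existing
    per-domain balloon target.
    """
    delta = current_allocation - new_allocation
    new_target = current_balloon_target + delta
    if new_target < 0:
        raise ValueError("request exceeds the domain's maximum")
    return new_target

# Domain has 4096 MiB allocated, balloon currently empty; shrinking to
# 3072 MiB asks the guest to balloon out 1024 MiB:
assert mem_set(4096, 0, 3072) == 1024
# Growing back by 512 MiB shrinks the balloon target accordingly:
assert mem_set(3072, 1024, 3584) == 512
```

Because the same component that sized the initial allocation performs this subtraction, no firmware "fudge constant" is needed at ballooning time.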
Re: [Xen-devel] Xen ballooning interface
On 13/08/18 15:54, Jan Beulich wrote:
> On 13.08.18 at 15:06, wrote:
>> Suggested new interface
>> -----------------------
>> Hypercalls, memory map(s) and ACPI tables should stay the same (for
>> compatibility reasons or because they are architectural interfaces).
>>
>> As the main confusion in the current interface is related to the
>> specification of the target memory size this part of the interface
>> should be changed: specifying the size of the ballooned area instead
>> is much clearer and will be the same for all guest types (no firmware
>> memory or magic additions involved).
>
> But isn't this backwards? The balloon size is a piece of information
> internal to the guest. Why should the outside world know or care?

Instead of specifying an absolute value to reach you'd specify how much
memory the guest should stay below its maximum. I think this is a valid
approach.

> What if the guest internals don't even allow the balloon to be the
> size requested?

Same as today: what if the guest internals don't even allow it to reach
the requested target size?

>> [...]
>> Should we specify whether the guest is free to balloon another vnode
>> than specified?
>
> Ballooning out _some_ memory is always going to be better than
> ballooning out none at all. I think the node can only serve as a hint
> here.

I agree. I just wanted to point out we need to define the possible
reactions to such a situation.

> The other problem we've always had was that address information
> could not be conveyed to the driver. The worst example in the past
> was that 32-bit PV domains can't run on arbitrarily high underlying
> physical addresses, but of course there are other cases where
> memory below a certain boundary may be needed. The obvious
> problem with directly exposing address information through the
> interface is that for HVM guests machine addresses are meaningless.
> Hence I wonder whether a dedicated "balloon out this page if you
> can" mechanism would be something to consider.

Isn't this a problem orthogonal to the one we are discussing here?

I'd rather do a localhost guest migration to free specific pages a
guest is owning and tell the Xen memory allocator not to hand them
out to the new guest created by the migration.

Juergen
Re: [Xen-devel] Xen ballooning interface
On Mon, Aug 13, 2018 at 03:06:10PM +0200, Juergen Gross wrote:
> Today's interface of Xen for memory ballooning is quite a mess. There
> are some shortcomings which should be addressed somehow. After a
> discussion on IRC there was consensus we should try to design a new
> interface addressing the current and probably future needs.

Thanks for doing this! Memory accounting is quite messy at the moment
:(.

[...]
> Open questions
> --------------
> Should we add memory size information to the memory/vnode nodes?
>
> Should the guest add information about its current balloon sizes to the
> memory/vnode nodes (i.e. after ballooning, or every x seconds while
> ballooning)?
>
> Should we specify whether the guest is free to balloon another vnode
> than specified?

What if the guest simply doesn't support NUMA and doesn't know
anything about nodes?

> Should memory hotplug (at least for PV domains) use the vnode specific
> Xenstore paths, too, if supported by the guest?

Is extra memory hotplug going to set:

memory/vnode/target-balloon-size = -1000

In order to tell the guest it can hotplug past the boot time amount of
memory?

> Any further thoughts on this?

Isn't this just moving the memory accounting problem to another piece
of software?

Currently as you say there's a difference between the xenstore target
and the guest memory map, because some memory is used by the firmware.
In order to solve this the toolstack won't provide an absolute memory
target but instead a relative one to the guest that contains the
balloon size.

But the toolstack interface (xl) still uses mem-set which is an
absolute value. How is the toolstack going to accurately calculate the
balloon size without knowing the extra memory used by the firmware?

Thanks, Roger.
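Roger's hotplug question amounts to overloading the sign of the per-vnode value. A sketch of that reading is below; this semantic is an open question in the thread, not a defined interface, and the function name is invented for illustration.

```python
# Hedged sketch: a non-negative target means "keep this much ballooned
# out", a negative one would mean "hotplug this much past the boot-time
# amount" - per Roger's question, not a settled meaning.
def interpret_target(target_balloon_size):
    if target_balloon_size >= 0:
        return ("balloon", target_balloon_size)
    return ("hotplug", -target_balloon_size)

assert interpret_target(2048) == ("balloon", 2048)
assert interpret_target(-1000) == ("hotplug", 1000)
```

A guest without memory hotplug support could simply clamp negative values to zero, which is one possible answer to the open question.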
Re: [Xen-devel] Xen ballooning interface
>>> On 13.08.18 at 15:06, wrote:
> Suggested new interface
> -----------------------
> Hypercalls, memory map(s) and ACPI tables should stay the same (for
> compatibility reasons or because they are architectural interfaces).
>
> As the main confusion in the current interface is related to the
> specification of the target memory size this part of the interface
> should be changed: specifying the size of the ballooned area instead
> is much clearer and will be the same for all guest types (no firmware
> memory or magic additions involved).

But isn't this backwards? The balloon size is a piece of information
internal to the guest. Why should the outside world know or care?
What if the guest internals don't even allow the balloon to be the
size requested?

> Open questions
> --------------
> Should we add memory size information to the memory/vnode nodes?
>
> Should the guest add information about its current balloon sizes to the
> memory/vnode nodes (i.e. after ballooning, or every x seconds while
> ballooning)?
>
> Should we specify whether the guest is free to balloon another vnode
> than specified?

Ballooning out _some_ memory is always going to be better than
ballooning out none at all. I think the node can only serve as a hint
here.

> Should memory hotplug (at least for PV domains) use the vnode specific
> Xenstore paths, too, if supported by the guest?
>
> Any further thoughts on this?

The other problem we've always had was that address information
could not be conveyed to the driver. The worst example in the past
was that 32-bit PV domains can't run on arbitrarily high underlying
physical addresses, but of course there are other cases where
memory below a certain boundary may be needed. The obvious
problem with directly exposing address information through the
interface is that for HVM guests machine addresses are meaningless.
Hence I wonder whether a dedicated "balloon out this page if you
can" mechanism would be something to consider.

Jan
[Xen-devel] Xen ballooning interface
Today's interface of Xen for memory ballooning is quite a mess. There
are some shortcomings which should be addressed somehow. After a
discussion on IRC there was consensus we should try to design a new
interface addressing the current and probably future needs.

Current interface
-----------------
A guest has access to the following memory related information (all for
x86):

- the memory map (E820 or EFI)
- ACPI tables for HVM/PVH guests
- actual maximum size via XENMEM_maximum_reservation hypercall (the
  hypervisor will deny attempts of the guest to allocate more)
- current size via XENMEM_current_reservation hypercall
- Xenstore entry "memory/static-max" for the upper bound of memory size
  (information for the guest which memory size might be reached without
  hotplugging memory)
- Xenstore entry "memory/target" for the current target size (used for
  ballooning: Xen tools set the size the guest should try to reach by
  allocating or releasing memory)

The main problem with this interface is that the guest doesn't know in
all cases which memory is included in the values (e.g. memory allocated
by Xen tools for the firmware of a HVM guest is included in the Xenstore
and hypercall information, but not in the memory map).

So without tweaking the available information, a HVM guest booted with a
certain amount of memory will believe it has to balloon up, as the
target value in Xenstore will be larger than the memory the guest
assumes to have available according to the memory map. An additional
complexity is added by Xen tools, which add a magic size constant
depending on guest type to the Xenstore values.

The current interface has no way to specify (virtual) NUMA nodes for
ballooning. In case vNUMA is being added to Xen the ballooning interface
needs an extension, too.

Suggested new interface
-----------------------
Hypercalls, memory map(s) and ACPI tables should stay the same (for
compatibility reasons or because they are architectural interfaces).

As the main confusion in the current interface is related to the
specification of the target memory size, this part of the interface
should be changed: specifying the size of the ballooned area instead is
much clearer and will be the same for all guest types (no firmware
memory or magic additions involved). In order to support vNUMA the
balloon size should be per vNUMA node.

With the new interface in use, Xen tools will calculate the balloon size
per vnode and write the related values to Xenstore:

memory/vnode/target-balloon-size

The guest will have set up a watch on those entries, so it can react on
a modification as today.

The guest will indicate support of the new ballooning interface by
writing the value "1" into the Xenstore entry
control/feature-balloon-vnode. In case Xen supports the new interface
and the guest does so, too, only the new interface should be used. Xen
tools will remove the (old) node memory/target in this case.

Open questions
--------------
Should we add memory size information to the memory/vnode nodes?

Should the guest add information about its current balloon sizes to the
memory/vnode nodes (i.e. after ballooning, or every x seconds while
ballooning)?

Should we specify whether the guest is free to balloon another vnode
than specified?

Should memory hotplug (at least for PV domains) use the vnode specific
Xenstore paths, too, if supported by the guest?

Any further thoughts on this?

Juergen
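The negotiation and guest reaction proposed above can be sketched as follows, using a plain dict in place of Xenstore (no real xenbus bindings involved). The per-node path layout ("vnode0") is an assumption about how the `memory/vnode` entries would be laid out per node.

```python
# Hedged sketch of the proposed feature negotiation and watch handling.
def negotiate(xs, xen_supports_new_interface):
    """Guest advertises the per-vnode interface; tools drop the old node."""
    xs["control/feature-balloon-vnode"] = "1"
    if xen_supports_new_interface:
        xs.pop("memory/target", None)  # old absolute target goes away
    return xs

def on_watch_fired(xs, path, balloon):
    """React to a per-vnode target change, as a guest watch handler would."""
    node = path.split("/")[1]          # e.g. "vnode0" (assumed layout)
    balloon[node] = int(xs[path])
    return balloon

xs = {"memory/target": "4096"}
negotiate(xs, xen_supports_new_interface=True)
assert "memory/target" not in xs

balloon = {}
xs["memory/vnode0/target-balloon-size"] = "512"
on_watch_fired(xs, "memory/vnode0/target-balloon-size", balloon)
assert balloon == {"vnode0": 512}
```

Keeping both interfaces mutually exclusive after negotiation avoids the guest having to reconcile an absolute target with per-node balloon sizes.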