Re: [Xen-devel] Xen ballooning interface

2018-08-21 Thread Roger Pau Monné
On Tue, Aug 21, 2018 at 10:58:18AM +0100, Wei Liu wrote:
> On Mon, Aug 13, 2018 at 03:06:10PM +0200, Juergen Gross wrote:
> > Today's interface of Xen for memory ballooning is quite a mess. There
> > are some shortcomings which should be addressed somehow. After a
> > discussion on IRC there was consensus we should try to design a new
> > interface addressing the current and probably future needs.
> > 
> > Current interface
> > -----------------
> > A guest has access to the following memory related information (all for
> > x86):
> > 
> > - the memory map (E820 or EFI)
> > - ACPI tables for HVM/PVH guests
> > - actual maximum size via XENMEM_maximum_reservation hypercall (the
> >   hypervisor will deny attempts of the guest to allocate more)
> > - current size via XENMEM_current_reservation hypercall
> > - Xenstore entry "memory/static-max" for the upper bound of memory size
> >   (information for the guest which memory size might be reached without
> >   hotplugging memory)
> > - Xenstore entry "memory/target" for current target size (used for
> >   ballooning: Xen tools set the size the guest should try to reach by
> >   allocating or releasing memory)
> > 
> > The main problem with this interface is the guest doesn't know in all
> > cases which memory is included in the values (e.g. memory allocated by
> > Xen tools for the firmware of a HVM guest is included in the Xenstore
> > and hypercall information, but not in the memory map).
> > 
> 
> Somewhat related: who has the canonical source of all the information?
> I think Xen should have that, but it is unclear to me how the toolstack
> can get such information from Xen. ISTR currently it is possible to get
> the current number of pages and the maximum number of pages, both of
> which contain pages for firmware which are visible to guests (E820 / EFI
> reserved).
> 
> Without that fixed, the new interface won't be of much use because the
> information the toolstack puts in the new nodes is still potentially
> wrong. Currently the toolstack applies some constant fudge numbers,
> which is a bit unpleasant.
> 
> It would be at least useful to break down the accounting inside the
> hypervisor a bit more:
> 
> * max_pages : maximum number of pages a domain can use for whatever
>   purpose (ram + firmware + others)
> * curr_pages : current number of pages a domain is using (ram + ...)
> * max_ram_pages : maximum number of pages a domain can use for ram
> * curr_ram_pages : ...

The problem here is that new hypercalls would have to be added, because,
for example, firmware running inside the guest picks RAM regions and
changes them to reserved, and the firmware would need a way to tell Xen
about those changes.

We could even have something like an expanded memory map with more
types in order to describe MMIO regions trapped inside of the
hypervisor, firmware regions, ram, etc... that could be modified by
both the toolstack and Xen.
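
As a very rough sketch of what such an expanded map entry could look
like (all names below are made up for illustration; this is not an
existing Xen interface):

#include <stdint.h>

/* Hypothetical extended memory map entry; none of these names exist
 * in Xen today. */
#define XENMEM_EXT_TYPE_RAM        1
#define XENMEM_EXT_TYPE_FIRMWARE   2  /* allocated by the toolstack for firmware */
#define XENMEM_EXT_TYPE_MMIO_TRAP  3  /* MMIO region trapped/emulated by Xen */
#define XENMEM_EXT_TYPE_RESERVED   4

struct xenmem_ext_map_entry {
    uint64_t start;  /* guest physical start address */
    uint64_t size;   /* size in bytes */
    uint32_t type;   /* one of the XENMEM_EXT_TYPE_* values above */
    uint32_t owner;  /* which party last modified the entry: Xen or toolstack */
};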

Roger.


Re: [Xen-devel] Xen ballooning interface

2018-08-21 Thread Wei Liu
On Mon, Aug 13, 2018 at 03:06:10PM +0200, Juergen Gross wrote:
> Today's interface of Xen for memory ballooning is quite a mess. There
> are some shortcomings which should be addressed somehow. After a
> discussion on IRC there was consensus we should try to design a new
> interface addressing the current and probably future needs.
> 
> Current interface
> -----------------
> A guest has access to the following memory related information (all for
> x86):
> 
> - the memory map (E820 or EFI)
> - ACPI tables for HVM/PVH guests
> - actual maximum size via XENMEM_maximum_reservation hypercall (the
>   hypervisor will deny attempts of the guest to allocate more)
> - current size via XENMEM_current_reservation hypercall
> - Xenstore entry "memory/static-max" for the upper bound of memory size
>   (information for the guest which memory size might be reached without
>   hotplugging memory)
> - Xenstore entry "memory/target" for current target size (used for
>   ballooning: Xen tools set the size the guest should try to reach by
>   allocating or releasing memory)
> 
> The main problem with this interface is the guest doesn't know in all
> cases which memory is included in the values (e.g. memory allocated by
> Xen tools for the firmware of a HVM guest is included in the Xenstore
> and hypercall information, but not in the memory map).
> 

Somewhat related: who has the canonical source of all the information?
I think Xen should have that, but it is unclear to me how the toolstack
can get such information from Xen. ISTR currently it is possible to get
the current number of pages and the maximum number of pages, both of
which contain pages for firmware which are visible to guests (E820 / EFI
reserved).

Without that fixed, the new interface won't be of much use because the
information the toolstack puts in the new nodes is still potentially
wrong. Currently the toolstack applies some constant fudge numbers,
which is a bit unpleasant.

It would be at least useful to break down the accounting inside the
hypervisor a bit more:

* max_pages : maximum number of pages a domain can use for whatever
  purpose (ram + firmware + others)
* curr_pages : current number of pages a domain is using (ram + ...)
* max_ram_pages : maximum number of pages a domain can use for ram
* curr_ram_pages : ...
etc etc
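
Purely as a sketch of that break-down (these fields and names don't
exist in the hypervisor today; they are just illustration):

#include <stdint.h>

/* Hypothetical per-domain accounting; field names are invented for
 * illustration only. */
struct domain_mem_accounting {
    uint64_t max_pages;       /* hard cap: ram + firmware + others */
    uint64_t curr_pages;      /* currently allocated pages, all purposes */
    uint64_t max_ram_pages;   /* cap on pages usable as guest RAM */
    uint64_t curr_ram_pages;  /* pages currently used as guest RAM */
    uint64_t firmware_pages;  /* pages allocated by the toolstack for firmware */
};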

IIRC there are currently two schools of thought which disagree with each
other about what the "maximum number of pages" in the hypervisor means.

This is not to say we can't design new interfaces, it is just that it
wouldn't be very useful IMHO.

Wei.


Re: [Xen-devel] Xen ballooning interface

2018-08-14 Thread Juergen Gross
On 14/08/18 09:34, Jan Beulich wrote:
 On 14.08.18 at 09:19,  wrote:
>> On 14/08/18 09:02, Jan Beulich wrote:
>> On 13.08.18 at 17:44,  wrote:
 On 13/08/18 17:29, Jan Beulich wrote:
 On 13.08.18 at 16:20,  wrote:
>> On 13/08/18 15:54, Jan Beulich wrote:
>> On 13.08.18 at 15:06,  wrote:
 Suggested new interface
 -----------------------
 Hypercalls, memory map(s) and ACPI tables should stay the same (for
 compatibility reasons or because they are architectural interfaces).

 As the main confusion in the current interface is related to the
 specification of the target memory size this part of the interface
 should be changed: specifying the size of the ballooned area instead
 is much clearer and will be the same for all guest types (no firmware
 memory or magic additions involved).
>>>
>>> But isn't this backwards? The balloon size is a piece of information
>>> internal to the guest. Why should the outside world know or care?
>>
>> Instead of specifying an absolute value to reach you'd specify how much
>> memory the guest should stay below its maximum. I think this is a valid
>> approach.
>
> But with your vNUMA model there's no single such value, and nothing
> like a "maximum" (which would need to be per virtual node afaics).

 With vNUMA there is a current value of memory per node supplied by the
 tools and a maximum per node can be calculated the same way.
>>>
>>> Can it? If so, I must be overlooking some accounting done
>>> somewhere. I'm only aware of a global maximum.
>>
>> The tools set the vnuma information for the guest. How do they do this
>> without knowing the memory size per vnuma node?
> 
> That's the current (initial) size, not the maximum.

Which is the same in the current implementation:
libxl__vnuma_config_check() will fail if the memory of all vnuma nodes
doesn't sum up to the max memory of the domain.


Juergen


Re: [Xen-devel] Xen ballooning interface

2018-08-14 Thread Jan Beulich
>>> On 14.08.18 at 09:19,  wrote:
> On 14/08/18 09:02, Jan Beulich wrote:
> On 13.08.18 at 17:44,  wrote:
>>> On 13/08/18 17:29, Jan Beulich wrote:
>>> On 13.08.18 at 16:20,  wrote:
> On 13/08/18 15:54, Jan Beulich wrote:
> On 13.08.18 at 15:06,  wrote:
>>> Suggested new interface
>>> -----------------------
>>> Hypercalls, memory map(s) and ACPI tables should stay the same (for
>>> compatibility reasons or because they are architectural interfaces).
>>>
>>> As the main confusion in the current interface is related to the
>>> specification of the target memory size this part of the interface
>>> should be changed: specifying the size of the ballooned area instead
>>> is much clearer and will be the same for all guest types (no firmware
>>> memory or magic additions involved).
>>
>> But isn't this backwards? The balloon size is a piece of information
>> internal to the guest. Why should the outside world know or care?
>
> Instead of specifying an absolute value to reach you'd specify how much
> memory the guest should stay below its maximum. I think this is a valid
> approach.

 But with your vNUMA model there's no single such value, and nothing
 like a "maximum" (which would need to be per virtual node afaics).
>>>
>>> With vNUMA there is a current value of memory per node supplied by the
>>> tools and a maximum per node can be calculated the same way.
>> 
>> Can it? If so, I must be overlooking some accounting done
>> somewhere. I'm only aware of a global maximum.
> 
> The tools set the vnuma information for the guest. How do they do this
> without knowing the memory size per vnuma node?

That's the current (initial) size, not the maximum.

Jan




Re: [Xen-devel] Xen ballooning interface

2018-08-14 Thread Juergen Gross
On 14/08/18 09:02, Jan Beulich wrote:
 On 13.08.18 at 17:44,  wrote:
>> On 13/08/18 17:29, Jan Beulich wrote:
>> On 13.08.18 at 16:20,  wrote:
 On 13/08/18 15:54, Jan Beulich wrote:
 On 13.08.18 at 15:06,  wrote:
>> Suggested new interface
>> -----------------------
>> Hypercalls, memory map(s) and ACPI tables should stay the same (for
>> compatibility reasons or because they are architectural interfaces).
>>
>> As the main confusion in the current interface is related to the
>> specification of the target memory size this part of the interface
>> should be changed: specifying the size of the ballooned area instead
>> is much clearer and will be the same for all guest types (no firmware
>> memory or magic additions involved).
>
> But isn't this backwards? The balloon size is a piece of information
> internal to the guest. Why should the outside world know or care?

 Instead of specifying an absolute value to reach you'd specify how much
 memory the guest should stay below its maximum. I think this is a valid
 approach.
>>>
>>> But with your vNUMA model there's no single such value, and nothing
>>> like a "maximum" (which would need to be per virtual node afaics).
>>
>> With vNUMA there is a current value of memory per node supplied by the
>> tools and a maximum per node can be calculated the same way.
> 
> Can it? If so, I must be overlooking some accounting done
> somewhere. I'm only aware of a global maximum.

The tools set the vnuma information for the guest. How do they do this
without knowing the memory size per vnuma node?

> 
>> This results in a balloon size per node.
>>
>> There is still the option to let the guest adjust the per node balloon
>> sizes after reaching the final memory size or maybe during the process
>> of ballooning at a certain rate.
> 
> I'm probably increasingly confused: Shouldn't, for whichever value
> in xenstore, there be a firm determination of which single party is
> supposed to modify a value? Aiui the intention is for the (target)
> balloon size to be set by the tools.

Sorry if I wasn't clear enough here: the guest shouldn't rewrite the
target balloon size, but e.g. memory/vnode/balloon-size.

> 
>> Any further thoughts on this?
>
> The other problem we've always had was that address information
> could not be conveyed to the driver. The worst example in the past
> was that 32-bit PV domains can't run on arbitrarily high underlying
> physical addresses, but of course there are other cases where
> memory below a certain boundary may be needed. The obvious
> problem with directly exposing address information through the
> interface is that for HVM guests machine addresses are meaningless.
> Hence I wonder whether a dedicated "balloon out this page if you
> can" mechanism would be something to consider.

 Isn't this a problem orthogonal to the one we are discussing here?
>>>
>>> Yes, but I think we shouldn't design a new interface without
>>> considering all current shortcomings.
>>
>> I don't think the suggested interface would make it harder to add a way
>> to request special pages to be preferred in the ballooning process.
> 
> Address and (virtual) node may conflict with one another. But I
> think we've meanwhile settled on the node value to only be a hint
> in a request.

I think so, yes.


Juergen



Re: [Xen-devel] Xen ballooning interface

2018-08-13 Thread Juergen Gross
On 13/08/18 17:29, Jan Beulich wrote:
 On 13.08.18 at 16:20,  wrote:
>> On 13/08/18 15:54, Jan Beulich wrote:
>> On 13.08.18 at 15:06,  wrote:
 Suggested new interface
 -----------------------
 Hypercalls, memory map(s) and ACPI tables should stay the same (for
 compatibility reasons or because they are architectural interfaces).

 As the main confusion in the current interface is related to the
 specification of the target memory size this part of the interface
 should be changed: specifying the size of the ballooned area instead
 is much clearer and will be the same for all guest types (no firmware
 memory or magic additions involved).
>>>
>>> But isn't this backwards? The balloon size is a piece of information
>>> internal to the guest. Why should the outside world know or care?
>>
>> Instead of specifying an absolute value to reach you'd specify how much
>> memory the guest should stay below its maximum. I think this is a valid
>> approach.
> 
> But with your vNUMA model there's no single such value, and nothing
> like a "maximum" (which would need to be per virtual node afaics).

With vNUMA there is a current value of memory per node supplied by the
tools and a maximum per node can be calculated the same way. This
results in a balloon size per node.

There is still the option to let the guest adjust the per node balloon
sizes after reaching the final memory size or maybe during the process
of ballooning at a certain rate.

> 
 Any further thoughts on this?
>>>
>>> The other problem we've always had was that address information
>>> could not be conveyed to the driver. The worst example in the past
>>> was that 32-bit PV domains can't run on arbitrarily high underlying
>>> physical addresses, but of course there are other cases where
>>> memory below a certain boundary may be needed. The obvious
>>> problem with directly exposing address information through the
>>> interface is that for HVM guests machine addresses are meaningless.
>>> Hence I wonder whether a dedicated "balloon out this page if you
>>> can" mechanism would be something to consider.
>>
>> Isn't this a problem orthogonal to the one we are discussing here?
> 
> Yes, but I think we shouldn't design a new interface without
> considering all current shortcomings.

I don't think the suggested interface would make it harder to add a way
to request special pages to be preferred in the ballooning process.

> 
>> I'd rather do a localhost guest migration to free specific pages a
>> guest is owning and tell the Xen memory allocator not to hand them
>> out to the new guest created by the migration.
> 
> There may not be enough memory to do a localhost migration.
> Ballooning, after all, may be done just because of a memory
> shortage.

True.

Still, I believe the tooling to identify domains owning the needed
memory pages and to ask them to balloon those pages out, so they can be
used for the creation of a special domain, is nothing that is going to
happen soon.

So as long as we are confident that the new interface wouldn't block
such a usage I think we are fine.


Juergen



Re: [Xen-devel] Xen ballooning interface

2018-08-13 Thread Jan Beulich
>>> On 13.08.18 at 16:20,  wrote:
> On 13/08/18 15:54, Jan Beulich wrote:
> On 13.08.18 at 15:06,  wrote:
>>> Suggested new interface
>>> -----------------------
>>> Hypercalls, memory map(s) and ACPI tables should stay the same (for
>>> compatibility reasons or because they are architectural interfaces).
>>>
>>> As the main confusion in the current interface is related to the
>>> specification of the target memory size this part of the interface
>>> should be changed: specifying the size of the ballooned area instead
>>> is much clearer and will be the same for all guest types (no firmware
>>> memory or magic additions involved).
>> 
>> But isn't this backwards? The balloon size is a piece of information
>> internal to the guest. Why should the outside world know or care?
> 
> Instead of specifying an absolute value to reach you'd specify how much
> memory the guest should stay below its maximum. I think this is a valid
> approach.

But with your vNUMA model there's no single such value, and nothing
like a "maximum" (which would need to be per virtual node afaics).

>>> Any further thoughts on this?
>> 
>> The other problem we've always had was that address information
>> could not be conveyed to the driver. The worst example in the past
>> was that 32-bit PV domains can't run on arbitrarily high underlying
>> physical addresses, but of course there are other cases where
>> memory below a certain boundary may be needed. The obvious
>> problem with directly exposing address information through the
>> interface is that for HVM guests machine addresses are meaningless.
>> Hence I wonder whether a dedicated "balloon out this page if you
>> can" mechanism would be something to consider.
> 
> Isn't this a problem orthogonal to the one we are discussing here?

Yes, but I think we shouldn't design a new interface without
considering all current shortcomings.

> I'd rather do a localhost guest migration to free specific pages a
> guest is owning and tell the Xen memory allocator not to hand them
> out to the new guest created by the migration.

There may not be enough memory to do a localhost migration.
Ballooning, after all, may be done just because of a memory
shortage.

Jan




Re: [Xen-devel] Xen ballooning interface

2018-08-13 Thread Roger Pau Monné
On Mon, Aug 13, 2018 at 04:27:06PM +0200, Juergen Gross wrote:
> On 13/08/18 16:12, Roger Pau Monné wrote:
> > On Mon, Aug 13, 2018 at 03:06:10PM +0200, Juergen Gross wrote:
> > Currently as you say there's a difference between the xenstore target
> > and the guest memory map, because some memory is used by the firmware.
> > In order to solve this the toolstack won't provide an absolute memory
> > target but instead a relative one to the guest that contains the
> > balloon size.
> > 
> > But the toolstack interface (xl) still uses mem-set which is an
> > absolute value. How is the toolstack going to accurately calculate the
> > balloon size without knowing the extra memory used by the firmware?
> 
> mem-set will take the current allocation the tools know about, compute
> the difference to the new value, and add it to or subtract it from the
> target balloon size. I don't think firmware will eat away memory when
> the guest OS is already running. :-)
> 
> The main difference to today's situation is that the same component
> which did the initial calculation of how much memory should be
> allocated is now doing the math in case of ballooning. So no guesswork
> any longer.

Right, it doesn't matter how much memory is used by the firmware
because the guest is going to balloon down an exact amount given by
the toolstack, so that at the end of the ballooning the used memory is
going to match the toolstack expectations.

Roger.


Re: [Xen-devel] Xen ballooning interface

2018-08-13 Thread Juergen Gross
On 13/08/18 16:12, Roger Pau Monné wrote:
> On Mon, Aug 13, 2018 at 03:06:10PM +0200, Juergen Gross wrote:
>> Today's interface of Xen for memory ballooning is quite a mess. There
>> are some shortcomings which should be addressed somehow. After a
>> discussion on IRC there was consensus we should try to design a new
>> interface addressing the current and probably future needs.
> 
> Thanks for doing this! Memory accounting is quite messy at the moment
> :(.
> 
> [...]
>> Open questions
>> --------------
>> Should we add memory size information to the memory/vnode nodes?
>>
>> Should the guest add information about its current balloon sizes to the
>> memory/vnode nodes (i.e. after ballooning, or every x seconds while
>> ballooning)?
>>
>> Should we specify whether the guest is free to balloon another vnode
>> than specified?
> 
> What if the guest simply doesn't support NUMA and doesn't know
> anything about nodes?

Okay, that's a rather good answer to this question. :-)

>> Should memory hotplug (at least for PV domains) use the vnode specific
>> Xenstore paths, too, if supported by the guest?
> 
> Is extra memory hotplug going to set:
> 
> memory/vnode/target-balloon-size = -1000
> 
> In order to tell the guest it can hotplug past the boot time amount of
> memory?

Interesting idea.

> 
>> Any further thoughts on this?
> 
> Isn't this just moving the memory accounting problem to another piece
> of software?
> 
> Currently as you say there's a difference between the xenstore target
> and the guest memory map, because some memory is used by the firmware.
> In order to solve this the toolstack won't provide an absolute memory
> target but instead a relative one to the guest that contains the
> balloon size.
> 
> But the toolstack interface (xl) still uses mem-set which is an
> absolute value. How is the toolstack going to accurately calculate the
> balloon size without knowing the extra memory used by the firmware?

mem-set will take the current allocation the tools know about, compute
the difference to the new value, and add it to or subtract it from the
target balloon size. I don't think firmware will eat away memory when
the guest OS is already running. :-)

The main difference to today's situation is that the same component
which did the initial calculation of how much memory should be allocated
is now doing the math in case of ballooning. So no guesswork any longer.
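
As a sketch of that math (variable names are purely illustrative):

#include <stdint.h>

/* Illustrative only: how the toolstack could turn an absolute
 * "xl mem-set" value into the relative balloon target, without having
 * to know anything about firmware overhead. Clamping to >= 0 omitted. */
static int64_t new_balloon_kb(int64_t cur_target_kb,  /* last absolute target set by the tools */
                              int64_t cur_balloon_kb, /* balloon size currently in Xenstore */
                              int64_t memset_kb)      /* new absolute value from mem-set */
{
    /* Shrinking the target grows the balloon by the difference,
     * growing the target shrinks it. */
    return cur_balloon_kb + (cur_target_kb - memset_kb);
}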


Juergen



Re: [Xen-devel] Xen ballooning interface

2018-08-13 Thread Juergen Gross
On 13/08/18 15:54, Jan Beulich wrote:
 On 13.08.18 at 15:06,  wrote:
>> Suggested new interface
>> -----------------------
>> Hypercalls, memory map(s) and ACPI tables should stay the same (for
>> compatibility reasons or because they are architectural interfaces).
>>
>> As the main confusion in the current interface is related to the
>> specification of the target memory size this part of the interface
>> should be changed: specifying the size of the ballooned area instead
>> is much clearer and will be the same for all guest types (no firmware
>> memory or magic additions involved).
> 
> But isn't this backwards? The balloon size is a piece of information
> internal to the guest. Why should the outside world know or care?

Instead of specifying an absolute value to reach you'd specify how much
memory the guest should stay below its maximum. I think this is a valid
approach.

> What if the guest internals don't even allow the balloon to be the
> size requested?

Same as today: what if the guest internals don't even allow reaching the
requested target size?

> 
>> Open questions
>> --------------
>> Should we add memory size information to the memory/vnode nodes?
>>
>> Should the guest add information about its current balloon sizes to the
>> memory/vnode nodes (i.e. after ballooning, or every x seconds while
>> ballooning)?
>>
>> Should we specify whether the guest is free to balloon another vnode
>> than specified?
> 
> Ballooning out _some_ memory is always going to be better than
> ballooning out none at all. I think the node can only serve as a hint
> here.

I agree. I just wanted to point out we need to define the possible
reactions to such a situation.

> 
>> Should memory hotplug (at least for PV domains) use the vnode specific
>> Xenstore paths, too, if supported by the guest?
>>
>>
>> Any further thoughts on this?
> 
> The other problem we've always had was that address information
> could not be conveyed to the driver. The worst example in the past
> was that 32-bit PV domains can't run on arbitrarily high underlying
> physical addresses, but of course there are other cases where
> memory below a certain boundary may be needed. The obvious
> problem with directly exposing address information through the
> interface is that for HVM guests machine addresses are meaningless.
> Hence I wonder whether a dedicated "balloon out this page if you
> can" mechanism would be something to consider.

Isn't this a problem orthogonal to the one we are discussing here?
I'd rather do a localhost guest migration to free specific pages a
guest is owning and tell the Xen memory allocator not to hand them
out to the new guest created by the migration.


Juergen


Re: [Xen-devel] Xen ballooning interface

2018-08-13 Thread Roger Pau Monné
On Mon, Aug 13, 2018 at 03:06:10PM +0200, Juergen Gross wrote:
> Today's interface of Xen for memory ballooning is quite a mess. There
> are some shortcomings which should be addressed somehow. After a
> discussion on IRC there was consensus we should try to design a new
> interface addressing the current and probably future needs.

Thanks for doing this! Memory accounting is quite messy at the moment
:(.

[...]
> Open questions
> --------------
> Should we add memory size information to the memory/vnode nodes?
> 
> Should the guest add information about its current balloon sizes to the
> memory/vnode nodes (i.e. after ballooning, or every x seconds while
> ballooning)?
> 
> Should we specify whether the guest is free to balloon another vnode
> than specified?

What if the guest simply doesn't support NUMA and doesn't know
anything about nodes?

> Should memory hotplug (at least for PV domains) use the vnode specific
> Xenstore paths, too, if supported by the guest?

Is extra memory hotplug going to set:

memory/vnode/target-balloon-size = -1000

In order to tell the guest it can hotplug past the boot time amount of
memory?

> Any further thoughts on this?

Isn't this just moving the memory accounting problem to another piece
of software?

Currently as you say there's a difference between the xenstore target
and the guest memory map, because some memory is used by the firmware.
In order to solve this the toolstack won't provide an absolute memory
target but instead a relative one to the guest that contains the
balloon size.

But the toolstack interface (xl) still uses mem-set which is an
absolute value. How is the toolstack going to accurately calculate the
balloon size without knowing the extra memory used by the firmware?

Thanks, Roger.


Re: [Xen-devel] Xen ballooning interface

2018-08-13 Thread Jan Beulich
>>> On 13.08.18 at 15:06,  wrote:
> Suggested new interface
> -----------------------
> Hypercalls, memory map(s) and ACPI tables should stay the same (for
> compatibility reasons or because they are architectural interfaces).
> 
> As the main confusion in the current interface is related to the
> specification of the target memory size this part of the interface
> should be changed: specifying the size of the ballooned area instead
> is much clearer and will be the same for all guest types (no firmware
> memory or magic additions involved).

But isn't this backwards? The balloon size is a piece of information
internal to the guest. Why should the outside world know or care?
What if the guest internals don't even allow the balloon to be the
size requested?

> Open questions
> --------------
> Should we add memory size information to the memory/vnode nodes?
> 
> Should the guest add information about its current balloon sizes to the
> memory/vnode nodes (i.e. after ballooning, or every x seconds while
> ballooning)?
> 
> Should we specify whether the guest is free to balloon another vnode
> than specified?

Ballooning out _some_ memory is always going to be better than
ballooning out none at all. I think the node can only serve as a hint
here.

> Should memory hotplug (at least for PV domains) use the vnode specific
> Xenstore paths, too, if supported by the guest?
> 
> 
> Any further thoughts on this?

The other problem we've always had was that address information
could not be conveyed to the driver. The worst example in the past
was that 32-bit PV domains can't run on arbitrarily high underlying
physical addresses, but of course there are other cases where
memory below a certain boundary may be needed. The obvious
problem with directly exposing address information through the
interface is that for HVM guests machine addresses are meaningless.
Hence I wonder whether a dedicated "balloon out this page if you
can" mechanism would be something to consider.

Jan




[Xen-devel] Xen ballooning interface

2018-08-13 Thread Juergen Gross
Today's interface of Xen for memory ballooning is quite a mess. There
are some shortcomings which should be addressed somehow. After a
discussion on IRC there was consensus we should try to design a new
interface addressing the current and probably future needs.

Current interface
-----------------
A guest has access to the following memory related information (all for
x86):

- the memory map (E820 or EFI)
- ACPI tables for HVM/PVH guests
- actual maximum size via XENMEM_maximum_reservation hypercall (the
  hypervisor will deny attempts of the guest to allocate more)
- current size via XENMEM_current_reservation hypercall
- Xenstore entry "memory/static-max" for the upper bound of memory size
  (information for the guest which memory size might be reached without
  hotplugging memory)
- Xenstore entry "memory/target" for current target size (used for
  ballooning: Xen tools set the size the guest should try to reach by
  allocating or releasing memory)

The main problem with this interface is the guest doesn't know in all
cases which memory is included in the values (e.g. memory allocated by
Xen tools for the firmware of a HVM guest is included in the Xenstore
and hypercall information, but not in the memory map).

So without tweaking the available information a HVM guest booted with
a certain amount of memory will believe it has to balloon up, as the
target value in Xenstore will be larger than the memory the guest
assumes to have available according to the memory map.
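
For illustration, this is roughly how a guest obtains those values today
(a sketch along the lines of the Linux guest helpers; error handling is
omitted and exact signatures vary between OSes):

#include <linux/printk.h>
#include <asm/xen/hypercall.h>      /* HYPERVISOR_memory_op() */
#include <xen/interface/memory.h>   /* XENMEM_*_reservation */
#include <xen/xenbus.h>             /* xenbus_scanf() */

static void dump_memory_view(void)
{
    domid_t domid = DOMID_SELF;
    long cur_pages, max_pages;
    unsigned long long target_kb = 0;

    /* Hypercall view: these counts include pages the guest cannot see
     * in its memory map (e.g. firmware pages of a HVM guest). */
    cur_pages = HYPERVISOR_memory_op(XENMEM_current_reservation, &domid);
    max_pages = HYPERVISOR_memory_op(XENMEM_maximum_reservation, &domid);

    /* Xenstore view: target size in KiB, written by the Xen tools. */
    xenbus_scanf(XBT_NIL, "memory", "target", "%llu", &target_kb);

    pr_info("cur %ld pages, max %ld pages, xenstore target %llu KiB\n",
            cur_pages, max_pages, target_kb);
}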

An additional complexity is added by the Xen tools, which add a magic
size constant (depending on the guest type) to the Xenstore values.

The current interface has no way to specify (virtual) NUMA nodes for
ballooning. In case vNUMA is being added to Xen the ballooning interface
needs an extension, too.


Suggested new interface
-----------------------
Hypercalls, memory map(s) and ACPI tables should stay the same (for
compatibility reasons or because they are architectural interfaces).

As the main confusion in the current interface is related to the
specification of the target memory size this part of the interface
should be changed: specifying the size of the ballooned area instead
is much clearer and will be the same for all guest types (no firmware
memory or magic additions involved).

In order to support vNUMA the balloon size should be per vNUMA node.

With the new interface in use Xen tools will calculate the balloon
size per vnode and write the related values to Xenstore:

memory/vnode/target-balloon-size

The guest will have set up a watch on those entries, so it can react to
a modification as it does today.

The guest will indicate support for the new ballooning interface by
writing the value "1" into the Xenstore entry
control/feature-balloon-vnode. In case Xen supports the new interface
and the guest does so, too, only the new interface should be used. Xen
tools will remove the (old) node memory/target in this case.
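
A minimal sketch of the guest side, assuming the per-vnode entries end
up under paths like memory/vnode<N>/target-balloon-size and using
Linux-style xenbus helpers (the path layout and the
balloon_set_node_target() helper are assumptions for illustration):

#include <linux/kernel.h>
#include <xen/xenbus.h>

/* Hypothetical guest helper: inflate/deflate the balloon on the given
 * virtual node until it holds balloon_kb KiB. Not an existing API. */
extern void balloon_set_node_target(unsigned int vnode,
                                    unsigned long long balloon_kb);

/* Sketch of a watch callback for the proposed per-vnode node. */
static void vnode_balloon_changed(struct xenbus_watch *watch,
                                  const char *path, const char *token)
{
    unsigned int vnode;
    unsigned long long balloon_kb;

    /* Pull the vnode number out of the path that fired the watch. */
    if (sscanf(path, "memory/vnode%u/target-balloon-size", &vnode) != 1)
        return;

    /* Read the new balloon size (KiB assumed) written by the tools. */
    if (xenbus_scanf(XBT_NIL, path, "", "%llu", &balloon_kb) != 1)
        return;

    balloon_set_node_target(vnode, balloon_kb);
}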

Open questions
--------------
Should we add memory size information to the memory/vnode nodes?

Should the guest add information about its current balloon sizes to the
memory/vnode nodes (i.e. after ballooning, or every x seconds while
ballooning)?

Should we specify whether the guest is free to balloon another vnode
than specified?

Should memory hotplug (at least for PV domains) use the vnode specific
Xenstore paths, too, if supported by the guest?


Any further thoughts on this?


Juergen
