Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-05 Thread Wei Liu
On Thu, Jun 04, 2015 at 04:28:00PM +0100, Wei Liu wrote:
[...]
> 2. Libxl records that in the toolstack save path.
> 3. The remote end calls xc_domain_setmaxmem in the toolstack restore path.

Unfortunately toolstack restore is called after
libxl__xc_domain_restore, so it cannot be used to solve our problem.

Wei.

> 
> Wei.
> 
> > Wei.
> > 
> > > ~Andrew



Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-04 Thread Ian Campbell
On Thu, 2015-06-04 at 16:28 +0100, Wei Liu wrote:
> An alternative we came up with during our IRL discussion.
> 
> 1. QEMU writes the size of additional memory in xenstore.

The above is orthogonal to the below, I think. Your existing patch could be
reimplemented in terms of the below, while the above is potentially a
separate piece of work.

> 2. Libxl records that in the toolstack save path.
> 3. The remote end calls xc_domain_setmaxmem in the toolstack restore path.
> 
> Wei.
> 
> > Wei.
> > 
> > > ~Andrew





Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-04 Thread Wei Liu
On Tue, Jun 02, 2015 at 06:40:56PM +0100, Wei Liu wrote:
> On Tue, Jun 02, 2015 at 05:11:02PM +0100, Andrew Cooper wrote:
> > On 02/06/15 16:49, Ian Campbell wrote:
> > > On Tue, 2015-06-02 at 15:08 +0100, Wei Liu wrote:
> > > [...]
> > >>> So here is a proof-of-concept patch to record and honour that value
> > >>> during migration.  A new field is added in the IDL. Note that we don't
> > >>> provide an xl-level config option for it, and we mandate it to be the
> > >>> default value during domain creation. This is to prevent libxl users
> > >>> from using it, so as to avoid unforeseen repercussions.
> > >> [...]
> > >>> This field is mandated to be the default value during guest creation to
> > >>> avoid unforeseen repercussions. It's only honoured when restoring a guest.
> > > IMHO this means that the libxl API/IDL is the wrong place for this
> > > value. Only user and/or application serviceable parts belong in the API.
> > >
> > > So while I agree that this value needs to be communicated across a
> > > migration, the JSON blob is not the right mechanism for doing so. IOW if
> > > you go down this general path I think you need a new
> > > field/record/whatever in the migration protocol at some layer or other
> > > (if not libxc then at the libxl layer).
> > >
> > > To my mind this "actual state" vs "user configured state" is more akin
> > > to the sorts of things which are in the hypervisor save blob or something
> > > like that (nb: This is not a suggestion that it should go there).
> > >
> > > IIRC Don also outlined another case, which is
> > > xl create -p
> > > xl migrate
> > > xl unpause
> > >
> > > Which might need more thought if any bumping can happen after the
> > > migrate i.e. on unpause?
> > >
> > >
> > 
> > The problem is qemu using set_max_mem.  It should never have done so.
> > 
> > Nothing other than libxl should be using such hypercalls, at which point
> > libxl's idea of guest memory is accurate and the bug ceases to exist.
> > 
> 
> That would be an ideal solution, but it's not going to materialise
> any time soon because:
> 
> 1. Libxl cannot know how much more memory the guest needs prior to QEMU
>start-up.
> 2. QEMU cannot communicate back the value to libxl because
>  2.1 there is no communication channel;
>  2.2 there is no libxld to update the config (maybe we can work around
>  this by waiting for QEMU during guest creation).
> 
> While we work towards that end, a short term solution might be:
> 
> 1. Make QEMU not call xc_domain_setmaxmem.
> 2. Provide a field in the IDL and a config option in xl to indicate how
>    much more memory is needed. The user can fill in that option as he/she
>    sees fit.
> 
> After we get to the ideal solution we can then deprecate that field /
> option.
> 

An alternative we came up with during our IRL discussion.

1. QEMU writes the size of additional memory in xenstore.
2. Libxl records that in the toolstack save path.
3. The remote end calls xc_domain_setmaxmem in the toolstack restore path.

Wei.
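
A minimal C sketch of the three steps above (illustrative only: the xenstore
key name below is invented, error handling is elided, and, as the 2015-06-05
follow-up notes, the existing toolstack restore hook runs too late for step 3
to work as-is):

#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <xenctrl.h>
#include <xenstore.h>

/* Step 1 (QEMU side): publish how many extra KiB were added on top of the
 * configured maximum.  The key name is hypothetical. */
static bool publish_extra_memkb(struct xs_handle *xs, int domid,
                                uint64_t extra_memkb)
{
    char path[64], val[32];

    snprintf(path, sizeof(path), "/local/domain/%d/qemu-extra-memkb", domid);
    snprintf(val, sizeof(val), "%" PRIu64, extra_memkb);
    return xs_write(xs, XBT_NULL, path, val, strlen(val));
}

/* Step 3 (remote end): once the recorded value has been carried across in
 * the toolstack data, raise the hypervisor limit before guest pages are
 * written. */
static int raise_limit(xc_interface *xch, uint32_t domid,
                       uint64_t configured_memkb, uint64_t extra_memkb)
{
    return xc_domain_setmaxmem(xch, domid, configured_memkb + extra_memkb);
}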

> Wei.
> 
> > ~Andrew



Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-04 Thread Ian Campbell
On Thu, 2015-06-04 at 10:32 +0100, Andrew Cooper wrote:
> On 04/06/15 10:26, Ian Campbell wrote:
> > On Thu, 2015-06-04 at 10:14 +0100, Wei Liu wrote:
> >> The main objection is that we shouldn't call xc_domain_setmaxmem in the
> >> middle of a migration stream.
> > In the middle of an _xc_ migration stream.
> >
> > This seems like the sort of thing it would be OK to have in a (to be
> > introduced) libxl stream (which would itself contain the xc stream as a
> > data item).
> >
> > I think we are expecting such a thing to be introduced as part of the
> > libxl side of migration v2, aren't we?
> 
> No.  libxl migration v2 will not be affecting the behaviour here.

By "such a thing" I was referring to "a libxl stream format", not this
piece of data specifically.

This stream format however will be extensible (I hope) and therefore
could accommodate state information which differs from the domain
configuration.

> The only reasonable way I see this being fixed is for the libxl json
> blob

For _a_ libxl json blob, not necessarily the existing one, which is the
user's domain configuration information.

IOW it doesn't have to be the existing libxl_domain_config blob. (Nor
does it really need to be json, but whatever)

>  to contain the correct size of the VM, and for
> libxl_domain_create() to make an appropriately sized VM in the first place.
> 
> One extension which libxl migration v2 will bring is the ability to send
> a new json blob later in the stream, but such a fix still requires the
> json blob to have the correct size in it.
> 
> ~Andrew





Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-04 Thread Andrew Cooper
On 04/06/15 10:26, Ian Campbell wrote:
> On Thu, 2015-06-04 at 10:14 +0100, Wei Liu wrote:
>> The main objection is that we shouldn't call xc_domain_setmaxmem in the
>> middle of a migration stream.
> In the middle of an _xc_ migration stream.
>
> This seems like the sort of thing it would be OK to have in a (to be
> introduced) libxl stream (which would itself contain the xc stream as a
> data item).
>
> I think we are expecting such a thing to be introduced as part of the
> libxl side of migration v2, aren't we?

No.  libxl migration v2 will not be affecting the behaviour here.

The only reasonable way I see this being fixed is for the libxl json
blob to contain the correct size of the VM, and for
libxl_domain_create() to make an appropriately sized VM in the first place.
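
As an illustration only of that direction (not the actual implementation, and
whether b_info.max_memkb is the right field to overwrite is exactly what is
being debated here), a sketch that folds the hypervisor's enforced limit into
the config that gets serialised, so the far side creates an appropriately
sized VM:

#include <libxl.h>

/* Sketch: before serialising the domain config for migration, fold in
 * whatever limit the hypervisor is actually enforcing (including any bump
 * QEMU made), so the receiving libxl_domain_create*() sizes the VM
 * correctly from the start. */
static int fold_actual_maxmem(libxl_ctx *ctx, uint32_t domid,
                              libxl_domain_config *d_config)
{
    libxl_dominfo info;
    int rc;

    libxl_dominfo_init(&info);
    rc = libxl_domain_info(ctx, &info, domid);
    if (!rc && info.max_memkb > d_config->b_info.max_memkb)
        d_config->b_info.max_memkb = info.max_memkb;
    libxl_dominfo_dispose(&info);
    return rc;
}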

One extension which libxl migration v2 will bring is the ability to send
a new json blob later in the stream, but such a fix still requires the
json blob to have the correct size in it.

~Andrew



Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-04 Thread Ian Campbell
On Thu, 2015-06-04 at 10:14 +0100, Wei Liu wrote:
> The main objection is that we shouldn't call xc_domain_setmaxmem in the
> middle of a migration stream.

In the middle of an _xc_ migration stream.

This seems like the sort of thing it would be OK to have in a (to be
introduced) libxl stream (which would itself contain the xc stream as a
data item).

I think we are expecting such a thing to be introduced as part of the
libxl side of migration v2, aren't we?

Ian.




Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-04 Thread Wei Liu
On Wed, Jun 03, 2015 at 02:22:25PM +0100, George Dunlap wrote:
> On Tue, Jun 2, 2015 at 3:05 PM, Wei Liu  wrote:
> > Previous discussion at [0].
> >
> > For the benefit of discussion, we refer to max_memkb inside hypervisor
> > as hv_max_memkb (name subject to improvement). That's the maximum amount
> > of memory a domain can use.
> 
> Why don't we try to use "memory" for virtual RAM that we report to the
> guest, and "pages" for what exists inside the hypervisor?  "Pages" is
> the term the hypervisor itself uses internally (i.e., set_max_mem()
> actually changes a domain's max_pages value).
> 
> So in this case both guest memory and option roms are created using
> hypervisor pages.
> 
> > Libxl doesn't know what hv_max_memkb a domain needs prior to QEMU start-up
> > because of optional ROMs etc.
> 
> So a translation of this using "memory/pages" terminology would be:
> 
> QEMU may need extra pages from Xen to implement option ROMS, and so at
> the moment it calls set_max_mem() to increase max_pages so that it can
> allocate more pages to the guest.  libxl doesn't know what max_pages a
> domain needs prior to qemu start-up.
> 
> > Libxl doesn't know the hv_max_memkb even after QEMU start-up, because
> > there is no mechanism to communicate between QEMU and libxl. This is an
> > area that needs improvement, we've encountered problems in this area
> > before.
> 
> [translating]
> Libxl doesn't know max_pages  even after qemu start-up, because there
> is no mechanism to communicate between qemu and libxl.
> 
> > QEMU calls xc_domain_setmaxmem to increase hv_max_memkb by N pages. Those
> > pages are only accounted for in the hypervisor. During migration, libxl
> > (currently) doesn't extract that value from the hypervisor.
> 
> [translating]
> qemu calls xc_domain_setmaxmem to increase max_pages by N pages.
> Those pages are only accounted for in the hypervisor.  libxl
> (currently) does not extract that value from the hypervisor.
> 
> > So now the problem is on the remote end:
> >
> > 1. Libxl indicates domain needs X pages.
> > 2. Domain actually needs X + N pages.
> > 3. Remote end tries to write N more pages and fails.
> >
> > This behaviour currently doesn't affect normal migration (that you
> > transfer libxl JSON to remote, construct a domain, then start QEMU)
> > because QEMU won't bump hv_max_memkb again. This is by design and
> > reflected in QEMU code.
> 
> I don't understand this paragraph -- does the remote domain actually
> need X+N pages or not?  If it does, in what way does this behavior
> "not affect normal migration"?
> 

I was wrong. I don't recollect how I came to that conclusion. It does
affect normal migration.

> > This behaviour affects COLO and becomes a bug in that case, because
> > secondary VM's QEMU doesn't go through the same start-of-day
> > initialisation (Hongyang, correct me if I'm wrong), i.e. no bumping
> > hv_max_memkb inside QEMU.
> >
> > Andrew plans to embed JSON inside migration v2 and COLO is based on
> > migration v2. The bug is fixed if JSON is correct in the first place.
> >
> > As COLO is not yet upstream, this bug is not a blocker for 4.6. But
> > it should be fixed for the benefit of COLO.
> >
> > So here is a proof-of-concept patch to record and honour that value
> > during migration.  A new field is added in the IDL. Note that we don't
> > provide an xl-level config option for it, and we mandate it to be the
> > default value during domain creation. This is to prevent libxl users
> > from using it, so as to avoid unforeseen repercussions.
> >
> > This patch is compile-tested only. If we agree this is the way to go I
> > will test and submit a proper patch.
> 
> Reading max_pages from Xen and setting it on the far side seems like a
> reasonable option.  Is there a reason we can't add a magic XC_SAVE_ID

Yes. That's the correct behaviour we want to have. The question is where
should we put that value and when to set it.

> for v1, like we do for other parameters?
> 

The main objection is that we shouldn't call xc_domain_setmaxmem in the
middle of a migration stream.

Wei.

>  -George



Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-03 Thread Don Slutz
On 06/03/15 09:53, George Dunlap wrote:
> On 06/03/2015 02:32 PM, Andrew Cooper wrote:
>> On 03/06/15 14:22, George Dunlap wrote:
>>> On Tue, Jun 2, 2015 at 3:05 PM, Wei Liu  wrote:
>>>> Previous discussion at [0].
>>>>
>>>> For the benefit of discussion, we refer to max_memkb inside hypervisor
>>>> as hv_max_memkb (name subject to improvement). That's the maximum amount
>>>> of memory a domain can use.
>>> Why don't we try to use "memory" for virtual RAM that we report to the
>>> guest, and "pages" for what exists inside the hypervisor?  "Pages" is
>>> the term the hypervisor itself uses internally (i.e., set_max_mem()
>>> actually changes a domain's max_pages value).
>>>
>>> So in this case both guest memory and option roms are created using
>>> hypervisor pages.
>>>
>>>> Libxl doesn't know what hv_max_memkb a domain needs prior to QEMU start-up
>>>> because of optional ROMs etc.
>>> So a translation of this using "memory/pages" terminology would be:
>>>
>>> QEMU may need extra pages from Xen to implement option ROMS, and so at
>>> the moment it calls set_max_mem() to increase max_pages so that it can
>>> allocate more pages to the guest.  libxl doesn't know what max_pages a
>>> domain needs prior to qemu start-up.
>>>
>>>> Libxl doesn't know the hv_max_memkb even after QEMU start-up, because
>>>> there is no mechanism to communicate between QEMU and libxl. This is an
>>>> area that needs improvement; we've encountered problems in this area
>>>> before.
>>> [translating]
>>> Libxl doesn't know max_pages  even after qemu start-up, because there
>>> is no mechanism to communicate between qemu and libxl.

This is not 100% true.  libxl and QEMU do communicate.  Currently QEMU
does not record its changes, but I have not searched all the info you can
get.  Also, libxl "knows" what it set maxmem to before QEMU starts, and it
can read (via xc) what it is after QEMU has made all its changes.  QEMU
only responds to commands from libxl at this time.
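
For example, a rough libxc-level sketch of that read-back (not existing libxl
code; error handling minimal):

#include <inttypes.h>
#include <xenctrl.h>

/* Rough sketch of Don's point: libxl set the limit itself before starting
 * QEMU, so reading it back afterwards gives the amount QEMU added. */
static int64_t qemu_added_memkb(xc_interface *xch, uint32_t domid,
                                uint64_t memkb_set_by_libxl)
{
    xc_dominfo_t info;

    if (xc_domain_getinfo(xch, domid, 1, &info) != 1 || info.domid != domid)
        return -1; /* lookup failed or domain vanished */

    return (int64_t)info.max_memkb - (int64_t)memkb_set_by_libxl;
}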

>>>> QEMU calls xc_domain_setmaxmem to increase hv_max_memkb by N pages. Those
>>>> pages are only accounted for in the hypervisor. During migration, libxl
>>>> (currently) doesn't extract that value from the hypervisor.
>>> [translating]
>>> qemu calls xc_domain_setmaxmem to increase max_pages by N pages.
>>> Those pages are only accounted for in the hypervisor.  libxl
>>> (currently) does not extract that value from the hypervisor.

Yes.

>>>> So now the problem is on the remote end:
>>>>
>>>> 1. Libxl indicates domain needs X pages.
>>>> 2. Domain actually needs X + N pages.
>>>> 3. Remote end tries to write N more pages and fails.
>>>>
>>>> This behaviour currently doesn't affect normal migration (that you
>>>> transfer libxl JSON to remote, construct a domain, then start QEMU)
>>>> because QEMU won't bump hv_max_memkb again. This is by design and
>>>> reflected in QEMU code.
>>> I don't understand this paragraph -- does the remote domain actually
>>> need X+N pages or not?  If it does, in what way does this behavior
>>> "not affect normal migration"?

As far as I know it does.  And it affects "xl save" and "xl restore".

However, you need more than 4 e1000 NICs (for example) to see this.

>>>> This behaviour affects COLO and becomes a bug in that case, because
>>>> secondary VM's QEMU doesn't go through the same start-of-day
>>>> initialisation (Hongyang, correct me if I'm wrong), i.e. no bumping
>>>> hv_max_memkb inside QEMU.
>>>>
>>>> Andrew plans to embed JSON inside migration v2 and COLO is based on
>>>> migration v2. The bug is fixed if JSON is correct in the first place.
>>>>
>>>> As COLO is not yet upstream, this bug is not a blocker for 4.6. But
>>>> it should be fixed for the benefit of COLO.
>>>>
>>>> So here is a proof-of-concept patch to record and honour that value
>>>> during migration.  A new field is added in the IDL. Note that we don't
>>>> provide an xl-level config option for it, and we mandate it to be the
>>>> default value during domain creation. This is to prevent libxl users
>>>> from using it, so as to avoid unforeseen repercussions.
>>>>
>>>> This patch is compile-tested only. If we agree this is the way to go I
>>>> will test and submit a proper patch.
>>> Reading max_pages from Xen and setting it on the far side seems like a
>>> reasonable option.
>> It is the wrong level to fix the bug.  Yes - it will (and does) fix one
>> manifestation of the bug, but does not solve the problem.
>>
>>>   Is there a reason we can't add a magic XC_SAVE_ID
>>> for v1, like we do for other parameters?

Wei Liu failed to refer to:

Date: Tue, 14 Apr 2015 18:06:26 -0400
Subject: [PATCH v2 1/1] tools: Handle xc_maxmem adjustments
Thread-Topic: [PATCH v2 1/1] tools: Handle xc_maxmem adjustments
Thread-Index: AdB2/0pJoAZieMWqTomVEBNbV7iiEA==
Message-ID: <1429049186-27343-1-git-send-email-dsl...@verizon.com>
References: <20150414175437.gb11...@zion.uk.xensource.com>
In-Reply-To: <20150414175437.gb11...@zion.uk.xensource.com>

Which had this coded up.

>> Amongst other things, playing with xc_domain_setmaxmem() is liable to
>> cause a PoD domain to be shot by Xen because the PoD cache was not
>> adjusted at the same time that maxmem was.

Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-03 Thread George Dunlap
On 06/03/2015 02:32 PM, Andrew Cooper wrote:
> On 03/06/15 14:22, George Dunlap wrote:
>> On Tue, Jun 2, 2015 at 3:05 PM, Wei Liu  wrote:
>>> Previous discussion at [0].
>>>
>>> For the benefit of discussion, we refer to max_memkb inside hypervisor
>>> as hv_max_memkb (name subject to improvement). That's the maximum amount
>>> of memory a domain can use.
>> Why don't we try to use "memory" for virtual RAM that we report to the
>> guest, and "pages" for what exists inside the hypervisor?  "Pages" is
>> the term the hypervisor itself uses internally (i.e., set_max_mem()
>> actually changes a domain's max_pages value).
>>
>> So in this case both guest memory and option roms are created using
>> hypervisor pages.
>>
>>> Libxl doesn't know what hv_max_memkb a domain needs prior to QEMU start-up
>>> because of optional ROMs etc.
>> So a translation of this using "memory/pages" terminology would be:
>>
>> QEMU may need extra pages from Xen to implement option ROMS, and so at
>> the moment it calls set_max_mem() to increase max_pages so that it can
>> allocate more pages to the guest.  libxl doesn't know what max_pages a
>> domain needs prior to qemu start-up.
>>
>>> Libxl doesn't know the hv_max_memkb even after QEMU start-up, because
>>> there is no mechanism to communicate between QEMU and libxl. This is an
>>> area that needs improvement, we've encountered problems in this area
>>> before.
>> [translating]
>> Libxl doesn't know max_pages  even after qemu start-up, because there
>> is no mechanism to communicate between qemu and libxl.
>>
>>> QEMU calls xc_domain_setmaxmem to increase hv_max_memkb by N pages. Those
>>> pages are only accounted for in the hypervisor. During migration, libxl
>>> (currently) doesn't extract that value from the hypervisor.
>> [translating]
>> qemu calls xc_domain_setmaxmem to increase max_pages by N pages.
>> Those pages are only accounted for in the hypervisor.  libxl
>> (currently) does not extract that value from the hypervisor.
>>
>>> So now the problem is on the remote end:
>>>
>>> 1. Libxl indicates domain needs X pages.
>>> 2. Domain actually needs X + N pages.
>>> 3. Remote end tries to write N more pages and fails.
>>>
>>> This behaviour currently doesn't affect normal migration (that you
>>> transfer libxl JSON to remote, construct a domain, then start QEMU)
>>> because QEMU won't bump hv_max_memkb again. This is by design and
>>> reflected in QEMU code.
>> I don't understand this paragraph -- does the remote domain actually
>> need X+N pages or not?  If it does, in what way does this behavior
>> "not affect normal migration"?
>>
>>> This behaviour affects COLO and becomes a bug in that case, because
>>> secondary VM's QEMU doesn't go through the same start-of-day
>>> initialisation (Hongyang, correct me if I'm wrong), i.e. no bumping
>>> hv_max_memkb inside QEMU.
>>>
>>> Andrew plans to embed JSON inside migration v2 and COLO is based on
>>> migration v2. The bug is fixed if JSON is correct in the first place.
>>>
>>> As COLO is not yet upstream, this bug is not a blocker for 4.6. But
>>> it should be fixed for the benefit of COLO.
>>>
>>> So here is a proof-of-concept patch to record and honour that value
>>> during migration.  A new field is added in the IDL. Note that we don't
>>> provide an xl-level config option for it, and we mandate it to be the
>>> default value during domain creation. This is to prevent libxl users
>>> from using it, so as to avoid unforeseen repercussions.
>>>
>>> This patch is compile-tested only. If we agree this is the way to go I
>>> will test and submit a proper patch.
>> Reading max_pages from Xen and setting it on the far side seems like a
>> reasonable option.
> 
> It is the wrong level to fix the bug.  Yes - it will (and does) fix one
> manifestation of the bug, but does not solve the problem.
> 
>>   Is there a reason we can't add a magic XC_SAVE_ID
>> for v1, like we do for other parameters?
> 
> Amongst other things, playing with xc_domain_setmaxmem() is liable to
> cause a PoD domain to be shot by Xen because the PoD cache was not
> adjusted at the same time that maxmem was.

As far as I can tell that's completely unrelated.

The PoD target needs to be updated when you change the *balloon driver*
target in xenstore.  libxl_domain_setmaxmem() for instance won't update
the PoD target either.
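
For reference, a sketch (not libxl code; 4 KiB pages assumed, and the usual
xc_domain_set_pod_target signature assumed) of the mechanics under
discussion: raising max_memkb and moving the PoD target are separate
hypercalls, and a caller that only does the former, as QEMU's bump does,
leaves the PoD target alone:

#include <xenctrl.h>

/* Sketch only: a PoD-aware adjustment would pair the two calls; QEMU's
 * current bump issues only the first one. */
static int adjust_maxmem_pod_aware(xc_interface *xch, uint32_t domid,
                                   uint64_t new_max_memkb,
                                   uint64_t new_target_memkb)
{
    int rc = xc_domain_setmaxmem(xch, domid, new_max_memkb);
    if (rc)
        return rc;

    /* Only meaningful for PoD (HVM, maxmem > target) domains. */
    return xc_domain_set_pod_target(xch, domid, new_target_memkb / 4,
                                    NULL, NULL, NULL);
}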

> Only libxl is in a position to safely adjust domain memory.

I was initially of the same mind as you in this matter.  But I think if
you stop thinking about this as "the domain's memory", and start instead
thinking about it as qemu modifying the state of the machine (by, say,
adding option roms), then I think it can be less strict.  The *memory*
(i.e., what is seen by the guest as virtual RAM) is controlled by libxl,
but the *pages* may be manipulated by others (as they are by the balloon
driver, for instance).

Although actually -- I guess that exposes another issue with this: what
if someone calls setmaxmem in libxl?  libxl doesn't know about the extra
N pages that qemu wanted

Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-03 Thread Andrew Cooper
On 03/06/15 14:22, George Dunlap wrote:
> On Tue, Jun 2, 2015 at 3:05 PM, Wei Liu  wrote:
>> Previous discussion at [0].
>>
>> For the benefit of discussion, we refer to max_memkb inside hypervisor
>> as hv_max_memkb (name subject to improvement). That's the maximum amount
>> of memory a domain can use.
> Why don't we try to use "memory" for virtual RAM that we report to the
> guest, and "pages" for what exists inside the hypervisor?  "Pages" is
> the term the hypervisor itself uses internally (i.e., set_max_mem()
> actually changes a domain's max_pages value).
>
> So in this case both guest memory and option roms are created using
> hypervisor pages.
>
>> Libxl doesn't know what hv_max_memkb a domain needs prior to QEMU start-up
>> because of optional ROMs etc.
> So a translation of this using "memory/pages" terminology would be:
>
> QEMU may need extra pages from Xen to implement option ROMS, and so at
> the moment it calls set_max_mem() to increase max_pages so that it can
> allocate more pages to the guest.  libxl doesn't know what max_pages a
> domain needs prior to qemu start-up.
>
>> Libxl doesn't know the hv_max_memkb even after QEMU start-up, because
>> there is no mechanism to communicate between QEMU and libxl. This is an
>> area that needs improvement, we've encountered problems in this area
>> before.
> [translating]
> Libxl doesn't know max_pages  even after qemu start-up, because there
> is no mechanism to communicate between qemu and libxl.
>
>> QEMU calls xc_domain_setmaxmem to increase hv_max_memkb by N pages. Those
>> pages are only accounted for in the hypervisor. During migration, libxl
>> (currently) doesn't extract that value from the hypervisor.
> [translating]
> qemu calls xc_domain_setmaxmem to increase max_pages by N pages.
> Those pages are only accounted for in the hypervisor.  libxl
> (currently) does not extract that value from the hypervisor.
>
>> So now the problem is on the remote end:
>>
>> 1. Libxl indicates domain needs X pages.
>> 2. Domain actually needs X + N pages.
>> 3. Remote end tries to write N more pages and fails.
>>
>> This behaviour currently doesn't affect normal migration (that you
>> transfer libxl JSON to remote, construct a domain, then start QEMU)
>> because QEMU won't bump hv_max_memkb again. This is by design and
>> reflected in QEMU code.
> I don't understand this paragraph -- does the remote domain actually
> need X+N pages or not?  If it does, in what way does this behavior
> "not affect normal migration"?
>
>> This behaviour affects COLO and becomes a bug in that case, because
>> secondary VM's QEMU doesn't go through the same start-of-day
>> initialisation (Hongyang, correct me if I'm wrong), i.e. no bumping
>> hv_max_memkb inside QEMU.
>>
>> Andrew plans to embed JSON inside migration v2 and COLO is based on
>> migration v2. The bug is fixed if JSON is correct in the first place.
>>
>> As COLO is not yet upstream, this bug is not a blocker for 4.6. But
>> it should be fixed for the benefit of COLO.
>>
>> So here is a proof-of-concept patch to record and honour that value
>> during migration.  A new field is added in the IDL. Note that we don't
>> provide an xl-level config option for it, and we mandate it to be the
>> default value during domain creation. This is to prevent libxl users
>> from using it, so as to avoid unforeseen repercussions.
>>
>> This patch is compile-tested only. If we agree this is the way to go I
>> will test and submit a proper patch.
> Reading max_pages from Xen and setting it on the far side seems like a
> reasonable option.

It is the wrong level to fix the bug.  Yes - it will (and does) fix one
manifestation of the bug, but does not solve the problem.

>   Is there a reason we can't add a magic XC_SAVE_ID
> for v1, like we do for other parameters?

Amongst other things, playing with xc_domain_setmaxmem() is liable to
cause a PoD domain to be shot by Xen because the PoD cache was not
adjusted at the same time that maxmem was.

Only libxl is in a position to safely adjust domain memory.

~Andrew



Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-03 Thread George Dunlap
On Tue, Jun 2, 2015 at 3:05 PM, Wei Liu  wrote:
> Previous discussion at [0].
>
> For the benefit of discussion, we refer to max_memkb inside hypervisor
> as hv_max_memkb (name subject to improvement). That's the maximum amount
> of memory a domain can use.

Why don't we try to use "memory" for virtual RAM that we report to the
guest, and "pages" for what exists inside the hypervisor?  "Pages" is
the term the hypervisor itself uses internally (i.e., set_max_mem()
actually changes a domain's max_pages value).

So in this case both guest memory and option roms are created using
hypervisor pages.

> Libxl doesn't know what hv_max_memkb a domain needs prior to QEMU start-up
> because of optional ROMs etc.

So a translation of this using "memory/pages" terminology would be:

QEMU may need extra pages from Xen to implement option ROMS, and so at
the moment it calls set_max_mem() to increase max_pages so that it can
allocate more pages to the guest.  libxl doesn't know what max_pages a
domain needs prior to qemu start-up.
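
In libxc terms, the bump being described amounts to something like the
following sketch (illustrative only; 4 KiB pages assumed, error handling
elided):

#include <xenctrl.h>

/* Sketch of the bump: grow the domain's limit by N extra pages (e.g. to
 * back option ROMs) before allocating them to the guest. */
static int bump_max_pages(xc_interface *xch, uint32_t domid,
                          unsigned long extra_pages)
{
    xc_dominfo_t info;

    if (xc_domain_getinfo(xch, domid, 1, &info) != 1 || info.domid != domid)
        return -1;

    /* max_memkb is in KiB; one 4 KiB page is 4 KiB. */
    return xc_domain_setmaxmem(xch, domid, info.max_memkb + extra_pages * 4);
}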

> Libxl doesn't know the hv_max_memkb even after QEMU start-up, because
> there is no mechanism to communicate between QEMU and libxl. This is an
> area that needs improvement, we've encountered problems in this area
> before.

[translating]
Libxl doesn't know max_pages  even after qemu start-up, because there
is no mechanism to communicate between qemu and libxl.

> QEMU calls xc_domain_setmaxmem to increase hv_max_memkb by N pages. Those
> pages are only accounted for in the hypervisor. During migration, libxl
> (currently) doesn't extract that value from the hypervisor.

[translating]
qemu calls xc_domain_setmaxmem to increase max_pages by N pages.
Those pages are only accounted for in the hypervisor.  libxl
(currently) does not extract that value from the hypervisor.

> So now the problem is on the remote end:
>
> 1. Libxl indicates domain needs X pages.
> 2. Domain actually needs X + N pages.
> 3. Remote end tries to write N more pages and fails.
>
> This behaviour currently doesn't affect normal migration (that you
> transfer libxl JSON to remote, construct a domain, then start QEMU)
> because QEMU won't bump hv_max_memkb again. This is by design and
> reflected in QEMU code.

I don't understand this paragraph -- does the remote domain actually
need X+N pages or not?  If it does, in what way does this behavior
"not affect normal migration"?

> This behaviour affects COLO and becomes a bug in that case, because
> secondary VM's QEMU doesn't go through the same start-of-day
> initialisation (Hongyang, correct me if I'm wrong), i.e. no bumping
> hv_max_memkb inside QEMU.
>
> Andrew plans to embed JSON inside migration v2 and COLO is based on
> migration v2. The bug is fixed if JSON is correct in the first place.
>
> As COLO is not yet upstream, this bug is not a blocker for 4.6. But
> it should be fixed for the benefit of COLO.
>
> So here is a proof-of-concept patch to record and honour that value
> during migration.  A new field is added in the IDL. Note that we don't
> provide an xl-level config option for it, and we mandate it to be the
> default value during domain creation. This is to prevent libxl users
> from using it, so as to avoid unforeseen repercussions.
>
> This patch is compile-tested only. If we agree this is the way to go I
> will test and submit a proper patch.

Reading max_pages from Xen and setting it on the far side seems like a
reasonable option.  Is there a reason we can't add a magic XC_SAVE_ID
for v1, like we do for other parameters?
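
For illustration, a sketch of what such a legacy-stream extension could look
like; the chunk ID value and record layout here are invented, and this is
exactly the "setmaxmem in the middle of the stream" behaviour Wei objects to
in his reply:

#include <stdint.h>
#include <unistd.h>
#include <xenctrl.h>

/* Hypothetical chunk ID: not a real XC_SAVE_ID value. */
#define XC_SAVE_ID_HYPOTHETICAL_MAXMEM (-20)

/* Sender: emit the marker followed by the hypervisor's max_memkb. */
static int send_maxmem_chunk(int io_fd, uint64_t max_memkb)
{
    int32_t id = XC_SAVE_ID_HYPOTHETICAL_MAXMEM;

    if (write(io_fd, &id, sizeof(id)) != sizeof(id) ||
        write(io_fd, &max_memkb, sizeof(max_memkb)) != sizeof(max_memkb))
        return -1;
    return 0;
}

/* Receiver: on seeing the marker, raise the limit before populating more
 * pages for the incoming guest. */
static int handle_maxmem_chunk(xc_interface *xch, uint32_t domid, int io_fd)
{
    uint64_t max_memkb;

    if (read(io_fd, &max_memkb, sizeof(max_memkb)) != sizeof(max_memkb))
        return -1;
    return xc_domain_setmaxmem(xch, domid, max_memkb);
}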

 -George



Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-02 Thread Yang Hongyang



On 06/02/2015 11:49 PM, Ian Campbell wrote:

> On Tue, 2015-06-02 at 15:08 +0100, Wei Liu wrote:
> [...]
>
>> So here is a proof-of-concept patch to record and honour that value
>> during migration.  A new field is added in the IDL. Note that we don't
>> provide an xl-level config option for it, and we mandate it to be the
>> default value during domain creation. This is to prevent libxl users
>> from using it, so as to avoid unforeseen repercussions.
>
>> [...]
>
>> This field is mandated to be the default value during guest creation to
>> avoid unforeseen repercussions. It's only honoured when restoring a guest.


> IMHO this means that the libxl API/IDL is the wrong place for this
> value. Only user and/or application serviceable parts belong in the API.
>
> So while I agree that this value needs to be communicated across a
> migration, the JSON blob is not the right mechanism for doing so. IOW if
> you go down this general path I think you need a new
> field/record/whatever in the migration protocol at some layer or other
> (if not libxc then at the libxl layer).
>
> To my mind this "actual state" vs "user configured state" is more akin
> to the sorts of things which are in the hypervisor save blob or something
> like that (nb: This is not a suggestion that it should go there).
>
> IIRC Don also outlined another case, which is
>  xl create -p
>  xl migrate
>  xl unpause


Actually this is what COLO does. On the primary, we must start using -p and
then migrate to ensure the disk is consistent.



> Which might need more thought if any bumping can happen after the
> migrate i.e. on unpause?



--
Thanks,
Yang.



Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-02 Thread Wei Liu
On Tue, Jun 02, 2015 at 05:11:02PM +0100, Andrew Cooper wrote:
> On 02/06/15 16:49, Ian Campbell wrote:
> > On Tue, 2015-06-02 at 15:08 +0100, Wei Liu wrote:
> > [...]
> >>> So here is a proof-of-concept patch to record and honour that value
> >>> during migration.  A new field is added in the IDL. Note that we don't
> >>> provide an xl-level config option for it, and we mandate it to be the
> >>> default value during domain creation. This is to prevent libxl users
> >>> from using it, so as to avoid unforeseen repercussions.
> >> [...]
> >>> This field is mandated to be the default value during guest creation to
> >>> avoid unforeseen repercussions. It's only honoured when restoring a guest.
> > IMHO this means that the libxl API/IDL is the wrong place for this
> > value. Only user and/or application serviceable parts belong in the API.
> >
> > So while I agree that this value needs to be communicated across a
> > migration, the JSON blob is not the right mechanism for doing so. IOW if
> > you go down this general path I think you need a new
> > field/record/whatever in the migration protocol at some layer or other
> > (if not libxc then at the libxl layer).
> >
> > To my mind this "actual state" vs "user configured state" is more akin
> > to the sorts of things which are in the hypervisor save blob or something
> > like that (nb: This is not a suggestion that it should go there).
> >
> > IIRC Don also outlined another case, which is
> > xl create -p
> > xl migrate
> > xl unpause
> >
> > Which might need more thought if any bumping can happen after the
> > migrate i.e. on unpause?
> >
> >
> 
> The problem is qemu using set_max_mem.  It should never have done so.
> 
> Nothing other than libxl should be using such hypercalls, at which point
> > libxl's idea of guest memory is accurate and the bug ceases to exist.
> 

That would be an ideal solution, but it's not going to materialise
any time soon because:

1. Libxl cannot know how much more memory the guest needs prior to QEMU
   start-up.
2. QEMU cannot communicate back the value to libxl because
 2.1 there is no communication channel;
 2.2 there is no libxld to update the config (maybe we can work around
 this by waiting for QEMU during guest creation).

While we work towards that end, a short term solution might be:

1. Make QEMU not call xc_domain_setmaxmem.
2. Provide a field in the IDL and a config option in xl to indicate how
   much more memory is needed. The user can fill in that option as he/she
   sees fit.

After we get to the ideal solution we can then deprecate that field /
option.

Wei.

> ~Andrew



Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-02 Thread Wei Liu
On Tue, Jun 02, 2015 at 04:49:09PM +0100, Ian Campbell wrote:
> On Tue, 2015-06-02 at 15:08 +0100, Wei Liu wrote:
> [...]
> > > So here is a proof-of-concept patch to record and honour that value
> > > during migration.  A new field is added in the IDL. Note that we don't
> > > provide an xl-level config option for it, and we mandate it to be the
> > > default value during domain creation. This is to prevent libxl users
> > > from using it, so as to avoid unforeseen repercussions.
> > [...]
> > > This field is mandated to be the default value during guest creation to
> > > avoid unforeseen repercussions. It's only honoured when restoring a guest.
> 
> IMHO this means that the libxl API/IDL is the wrong place for this
> value. Only user and/or application serviceable parts belong in the API.
> 

I feared the same when I wrote the patch.

> So while I agree that this value needs to be communicated across a
> migration, the JSON blob is not the right mechanism for doing so. IOW if
> you go down this general path I think you need a new
> field/record/whatever in the migration protocol at some layer or other
> (if not libxc then at the libxl layer).
> 

There isn't a libxl layer for migration at the moment. The stream is xl
+ libxc. Maybe we should wait till migration v2 is in place.

> To my mind this "actual state" vs "user configured state" is more akin
> to the sorts of things which are in the hypervisor save blob or something
> like that (nb: This is not a suggestion that it should go there).
> 
> IIRC Don also outlined another case, which is
> xl create -p
> xl migrate
> xl unpause
> 
> Which might need more thought if any bumping can happen after the
> migrate i.e. on unpause?
> 

I just tried. It failed at "xl migrate".

Wei.



Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-02 Thread Andrew Cooper
On 02/06/15 16:49, Ian Campbell wrote:
> On Tue, 2015-06-02 at 15:08 +0100, Wei Liu wrote:
> [...]
>>> So here is a proof-of-concept patch to record and honour that value
>>> during migration.  A new field is added in the IDL. Note that we don't
>>> provide an xl-level config option for it, and we mandate it to be the
>>> default value during domain creation. This is to prevent libxl users
>>> from using it, so as to avoid unforeseen repercussions.
>> [...]
>>> This field is mandated to be the default value during guest creation to
>>> avoid unforeseen repercussions. It's only honoured when restoring a guest.
> IMHO this means that the libxl API/IDL is the wrong place for this
> value. Only user and/or application serviceable parts belong in the API.
>
> So while I agree that this value needs to be communicated across a
> migration, the JSON blob is not the right mechanism for doing so. IOW if
> you go down this general path I think you need a new
> field/record/whatever in the migration protocol at some layer or other
> (if not libxc then at the libxl layer).
>
> To my mind this "actual state" vs "user configured state" is more akin
> to the sorts of things which are in the hypervisor save blob or something
> like that (nb: This is not a suggestion that it should go there).
>
> IIRC Don also outlined another case, which is
> xl create -p
> xl migrate
> xl unpause
>
> Which might need more thought if any bumping can happen after the
> migrate i.e. on unpause?
>
>

The problem is qemu using set_max_mem.  It should never have done so.

Nothing other than libxl should be using such hypercalls, at which point
libxl's idea of guest memory is accurate and the bug ceases to exist.

~Andrew



Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-02 Thread Ian Campbell
On Tue, 2015-06-02 at 15:08 +0100, Wei Liu wrote:
[...]
> > So here is a proof-of-concept patch to record and honour that value
> > during migration.  A new field is added in the IDL. Note that we don't
> > provide an xl-level config option for it, and we mandate it to be the
> > default value during domain creation. This is to prevent libxl users
> > from using it, so as to avoid unforeseen repercussions.
> [...]
> > This field is mandated to be the default value during guest creation to
> > avoid unforeseen repercussions. It's only honoured when restoring a guest.

IMHO this means that the libxl API/IDL is the wrong place for this
value. Only user and/or application serviceable parts belong in the API.

So while I agree that this value needs to be communicated across a
migration, the JSON blob is not the right mechanism for doing so. IOW if
you go down this general path I think you need a new
field/record/whatever in the migration protocol at some layer or other
(if not libxc then at the libxl layer).

To my mind this "actual state" vs "user configured state" is more akin
to the sorts of things which are in the hypervisor save blob or something
like that (nb: This is not a suggestion that it should go there).

IIRC Don also outlined another case, which is
xl create -p
xl migrate
xl unpause

Which might need more thought if any bumping can happen after the
migrate i.e. on unpause?





Re: [Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-02 Thread Wei Liu
I fat-fingered Andrew's email address. Really CC him this time.

On Tue, Jun 02, 2015 at 03:05:07PM +0100, Wei Liu wrote:
> Previous discussion at [0].
> 
> For the benefit of discussion, we refer to max_memkb inside hypervisor
> as hv_max_memkb (name subject to improvement). That's the maximum amount
> of memory a domain can use.
> 
> Libxl doesn't know what hv_max_memkb a domain needs prior to QEMU start-up
> because of optional ROMs etc.
> 
> Libxl doesn't know the hv_max_memkb even after QEMU start-up, because
> there is no mechanism to communicate between QEMU and libxl. This is an
> area that needs improvement, we've encountered problems in this area
> before.
> 
> QEMU calls xc_domain_setmaxmem to increase hv_max_memkb by N pages. Those
> pages are only accounted for in the hypervisor. During migration, libxl
> (currently) doesn't extract that value from the hypervisor.
> 
> So now the problem is on the remote end:
> 
> 1. Libxl indicates domain needs X pages.
> 2. Domain actually needs X + N pages.
> 3. Remote end tries to write N more pages and fails.
> 
> This behaviour currently doesn't affect normal migration (that you
> transfer libxl JSON to remote, construct a domain, then start QEMU)
> because QEMU won't bump hv_max_memkb again. This is by design and
> reflected in QEMU code.
> 
> This behaviour affects COLO and becomes a bug in that case, because
> secondary VM's QEMU doesn't go through the same start-of-day
> initialisation (Hongyang, correct me if I'm wrong), i.e. no bumping
> hv_max_memkb inside QEMU.
> 
> Andrew plans to embed JSON inside migration v2 and COLO is based on
> migration v2. The bug is fixed if JSON is correct in the first place.
> 
> As COLO is not yet upstream, this bug is not a blocker for 4.6. But
> it should be fixed for the benefit of COLO.
> 
> So here is a proof-of-concept patch to record and honour that value
> during migration.  A new field is added in the IDL. Note that we don't
> provide an xl-level config option for it, and we mandate it to be the
> default value during domain creation. This is to prevent libxl users
> from using it, so as to avoid unforeseen repercussions.
>
> This patch is compile-tested only. If we agree this is the way to go I
> will test and submit a proper patch.
> 
> Wei.
> 
> [0] <1428941353-18673-1-git-send-email-dsl...@verizon.com>
> 
> ---8<---
> From ab9dc179ea4ee26eb88f61f8dad36dd01b63bb6b Mon Sep 17 00:00:00 2001
> From: Wei Liu 
> Date: Tue, 2 Jun 2015 14:53:20 +0100
> Subject: [PATCH] libxl: record and honour hv_max_memkb
> 
> The new field hv_max_memkb in the IDL is used to record "max_memkb" inside
> the hypervisor. That reflects the maximum memory a guest can ask for.
>
> This field is mandated to be the default value during guest creation to
> avoid unforeseen repercussions. It's only honoured when restoring a guest.
> 
> (XXX compiled test only at this stage)
> 
> Signed-off-by: Wei Liu 
> ---
>  tools/libxl/libxl.c | 17 +
>  tools/libxl/libxl_create.c  |  6 ++
>  tools/libxl/libxl_dom.c |  9 +++--
>  tools/libxl/libxl_types.idl |  1 +
>  4 files changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> index 9117b01..72fec8b 100644
> --- a/tools/libxl/libxl.c
> +++ b/tools/libxl/libxl.c
> @@ -6614,6 +6614,7 @@ int libxl_retrieve_domain_configuration(libxl_ctx *ctx, 
> uint32_t domid,
>  GC_INIT(ctx);
>  int rc;
>  libxl__domain_userdata_lock *lock = NULL;
> +uint64_t hv_max_memkb;
>  
>  CTX_LOCK;
>  
> @@ -6654,6 +6655,7 @@ int libxl_retrieve_domain_configuration(libxl_ctx *ctx, 
> uint32_t domid,
>  }
>  libxl_uuid_copy(ctx, &d_config->c_info.uuid, &info.uuid);
>  libxl_dominfo_dispose(&info);
> +hv_max_memkb = info.max_memkb; /* store and use later */
>  }
>  
>  /* Memory limits:
> @@ -6661,17 +6663,15 @@ int libxl_retrieve_domain_configuration(libxl_ctx 
> *ctx, uint32_t domid,
>   * Currently there are three memory limits:
>   *  1. "target" in xenstore (originally memory= in config file)
>   *  2. "static-max" in xenstore (originally maxmem= in config file)
> - *  3. "max_memkb" in hypervisor
> - *
> - * The third one is not visible and currently managed by
> - * toolstack. In order to rebuild a domain we only need to have
> - * "target" and "static-max".
> + *  3. "max_memkb" in hypervisor (corresponds to hv_max_memkb in
> + * idl, not visible to xl level)
>   */
>  {
> -uint32_t target_memkb = 0, max_memkb = 0;
> +uint32_t target_memkb = 0, static_max_memkb = 0;
>  
>  /* "target" */
> -rc = libxl__get_memory_target(gc, domid, &target_memkb, &max_memkb);
> +rc = libxl__get_memory_target(gc, domid, &target_memkb,
> +  &static_max_memkb);
>  if (rc) {
>  LOG(ERROR, "fail to get memory target for domain %d", domid);
>  goto out;
> @@ -6683,7 +6683,8 @@ int libxl_ret

[Xen-devel] RFC: QEMU bumping memory limit and domain restore

2015-06-02 Thread Wei Liu
Previous discussion at [0].

For the benefit of discussion, we refer to max_memkb inside hypervisor
as hv_max_memkb (name subject to improvement). That's the maximum amount
of memory a domain can use.

Libxl doesn't know what hv_max_memkb a domain needs prior to QEMU start-up
because of optional ROMs etc.

Libxl doesn't know the hv_max_memkb even after QEMU start-up, because
there is no mechanism to communicate between QEMU and libxl. This is an
area that needs improvement, we've encountered problems in this area
before.

QEMU calls xc_domain_setmaxmem to increase hv_max_memkb by N pages. Those
pages are only accounted for in the hypervisor. During migration, libxl
(currently) doesn't extract that value from the hypervisor.

So now the problem is on the remote end:

1. Libxl indicates domain needs X pages.
2. Domain actually needs X + N pages.
3. Remote end tries to write N more pages and fails.

This behaviour currently doesn't affect normal migration (that you
transfer libxl JSON to remote, construct a domain, then start QEMU)
because QEMU won't bump hv_max_memkb again. This is by design and
reflected in QEMU code.
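
To make step 3 above concrete, a sketch (4 KiB pages assumed, the real
restore plumbing elided) of what the remote end runs into: allocations beyond
max_pages are refused until the limit is raised first.

#include <xenctrl.h>

/* Sketch of the remote-end failure: without the setmaxmem call, the
 * populate below fails once the domain hits max_pages, which is step 3
 * above. */
static int restore_extra_pages(xc_interface *xch, uint32_t domid,
                               xen_pfn_t *pfns, unsigned long n,
                               uint64_t correct_max_memkb)
{
    if (xc_domain_setmaxmem(xch, domid, correct_max_memkb))
        return -1;

    /* One 4 KiB page per pfn. */
    return xc_domain_populate_physmap_exact(xch, domid, n, 0, 0, pfns);
}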

This behaviour affects COLO and becomes a bug in that case, because
secondary VM's QEMU doesn't go through the same start-of-day
initialisation (Hongyang, correct me if I'm wrong), i.e. no bumping
hv_max_memkb inside QEMU.

Andrew plans to embed JSON inside migration v2 and COLO is based on
migration v2. The bug is fixed if JSON is correct in the first place.

As COLO is not yet upstream, this bug is not a blocker for 4.6. But
it should be fixed for the benefit of COLO.

So here is a proof-of-concept patch to record and honour that value
during migration.  A new field is added in the IDL. Note that we don't
provide an xl-level config option for it, and we mandate it to be the
default value during domain creation. This is to prevent libxl users
from using it, so as to avoid unforeseen repercussions.

This patch is compile-tested only. If we agree this is the way to go I
will test and submit a proper patch.

Wei.

[0] <1428941353-18673-1-git-send-email-dsl...@verizon.com>

---8<---
From ab9dc179ea4ee26eb88f61f8dad36dd01b63bb6b Mon Sep 17 00:00:00 2001
From: Wei Liu 
Date: Tue, 2 Jun 2015 14:53:20 +0100
Subject: [PATCH] libxl: record and honour hv_max_memkb

The new field hv_max_memkb in the IDL is used to record "max_memkb" inside
the hypervisor. That reflects the maximum memory a guest can ask for.

This field is mandated to be the default value during guest creation to
avoid unforeseen repercussions. It's only honoured when restoring a guest.

(XXX compiled test only at this stage)

Signed-off-by: Wei Liu 
---
 tools/libxl/libxl.c | 17 +
 tools/libxl/libxl_create.c  |  6 ++
 tools/libxl/libxl_dom.c |  9 +++--
 tools/libxl/libxl_types.idl |  1 +
 4 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 9117b01..72fec8b 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -6614,6 +6614,7 @@ int libxl_retrieve_domain_configuration(libxl_ctx *ctx, 
uint32_t domid,
 GC_INIT(ctx);
 int rc;
 libxl__domain_userdata_lock *lock = NULL;
+uint64_t hv_max_memkb;
 
 CTX_LOCK;
 
@@ -6654,6 +6655,7 @@ int libxl_retrieve_domain_configuration(libxl_ctx *ctx, 
uint32_t domid,
 }
 libxl_uuid_copy(ctx, &d_config->c_info.uuid, &info.uuid);
 libxl_dominfo_dispose(&info);
+hv_max_memkb = info.max_memkb; /* store and use later */
 }
 
 /* Memory limits:
@@ -6661,17 +6663,15 @@ int libxl_retrieve_domain_configuration(libxl_ctx *ctx, 
uint32_t domid,
  * Currently there are three memory limits:
  *  1. "target" in xenstore (originally memory= in config file)
  *  2. "static-max" in xenstore (originally maxmem= in config file)
- *  3. "max_memkb" in hypervisor
- *
- * The third one is not visible and currently managed by
- * toolstack. In order to rebuild a domain we only need to have
- * "target" and "static-max".
+ *  3. "max_memkb" in hypervisor (corresponds to hv_max_memkb in
+ * idl, not visible to xl level)
  */
 {
-uint32_t target_memkb = 0, max_memkb = 0;
+uint32_t target_memkb = 0, static_max_memkb = 0;
 
 /* "target" */
-rc = libxl__get_memory_target(gc, domid, &target_memkb, &max_memkb);
+rc = libxl__get_memory_target(gc, domid, &target_memkb,
+  &static_max_memkb);
 if (rc) {
 LOG(ERROR, "fail to get memory target for domain %d", domid);
 goto out;
@@ -6683,7 +6683,8 @@ int libxl_retrieve_domain_configuration(libxl_ctx *ctx, 
uint32_t domid,
 d_config->b_info.target_memkb = target_memkb +
 d_config->b_info.video_memkb;
 
-d_config->b_info.max_memkb = max_memkb;
+d_config->b_info.max_memkb = static_max_memkb;
+d_config->b_info.hv_max_memkb = hv_max_memkb;
 }
 
 /* Devices: disk, nic, vtpm, pc