Re: [Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2

2015-05-12 Thread Yang Hongyang



On 05/11/2015 07:01 PM, Andrew Cooper wrote:

On 11/05/15 11:48, Hongyang Yang wrote:


On 05/11/2015 05:00 PM, Andrew Cooper wrote:

On 11/05/15 07:28, Hongyang Yang wrote:

On 05/09/2015 02:12 AM, Andrew Cooper wrote:

On 08/05/15 10:33, Yang Hongyang wrote:

This patchset implements Remus support for Migration v2, but without
memory compression.

[...]




last iter of memory


end_of_checkpoint()
Checkpoint record


ctx->save.callbacks->postcopy()
this callback should not be omitted; it does some necessary work before
resuming the
primary (such as calling the Remus device preresume callbacks to ensure the
disk
data is consistent) and then resumes the primary guest. I think this
callback should be renamed to ctx->save.callbacks->resume().


That looks to be a useful cleanup (and answers one of my questions of
what exactly postcopy was)
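
For reference, a rough sketch of the save-side callback table (modelled on
struct save_callbacks in tools/libxc/include/xenguest.h), with postcopy()
renamed to resume() as proposed above; the field set shown is trimmed for
illustration and is not the exact current definition:

struct save_callbacks {
    /* Suspend the guest; under Remus this also runs the device
     * postsuspend hooks (e.g. start buffering outgoing network traffic). */
    int (*suspend)(void *data);

    /* Proposed rename of postcopy(): run the Remus device preresume
     * hooks (make the disk state consistent) and unpause the primary. */
    int (*resume)(void *data);

    /* Decide whether to take another checkpoint (and, today, also let
     * libxl send the qemu record).  Non-zero means "checkpoint again". */
    int (*checkpoint)(void *data);

    /* Opaque pointer passed back to every callback. */
    void *data;
};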




   ctx->save.callbacks->checkpoint()
   libxl qemu record


Maybe we should add another callback to send the qemu record instead of
using the checkpoint callback. We can call it
ctx->save.callbacks->save_qemu()


This is another layering violation.  libxc should not prescribe what
libxl might or might not do.  One example we are experimenting with in
XenServer at the moment is support for multiple emulators attached to a
single domain, which would necessitate two LIBXL_EMULATOR records to be
sent per checkpoint.  libxl might also want to send an updated json blob
or such.


Ok, so we'd better not introduce a save_qemu callback.




Then in the checkpoint callback, we only call the Remus device commit
callbacks
(which release the network buffer etc...) and then decide whether we
need to
do another checkpoint or quit the checkpointed stream.
With Remus, the checkpoint callback only waits for 200ms (the interval can
be specified by -i)
and then returns.
With COLO, the checkpoint callback will ask the COLO proxy if we need to do a
checkpoint, and will return when the COLO proxy module indicates a checkpoint
is needed.


That sounds like COLO wants a should_checkpoint() callback which
separates the decision to make a checkpoint from the logic of
implementing a checkpoint.
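
A minimal sketch of the two behaviours just described, expressed as a single
checkpoint() callback; the cp_policy state, remus_interval_ms and
colo_proxy_wants_checkpoint() are made-up names used purely for illustration:

#include <unistd.h>

/* Hypothetical per-domain policy state, for illustration only. */
struct cp_policy {
    int colo_enabled;
    unsigned remus_interval_ms;   /* default 200, overridable via -i */
};

/* Hypothetical query into the COLO proxy module. */
static int colo_proxy_wants_checkpoint(struct cp_policy *p)
{
    (void)p;
    return 1;   /* stub: pretend the proxy requested a checkpoint */
}

/* The checkpoint() callback: non-zero means "take another checkpoint". */
static int checkpoint_cb(void *data)
{
    struct cp_policy *p = data;

    if (p->colo_enabled)
        /* COLO: block until the proxy decides a checkpoint is needed. */
        return colo_proxy_wants_checkpoint(p);

    /* Remus: wait out the checkpoint interval, then checkpoint again. */
    usleep(p->remus_interval_ms * 1000);
    return 1;
}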


We currently use the checkpoint callback to do the should_checkpoint() work.
libxc will check the return value of the checkpoint callback.


But that causes a chicken & egg problem.

I am planning to use a CHECKPOINT record to synchronise the transfer of
ownership of the FD between libxc and libxl.  Therefore, a CHECKPOINT
record must be in the stream ahead of the checkpoint() callback, as
libxl will then write/read some records in itself.


The record name CHECKPOINT does not seem to match what you are
planning to do. In this case I think END-OF-CHECKPOINT, which represents the
end of the libxc side of a checkpoint, is better: when the libxc side of a
checkpoint ends, libxc should transfer ownership of the FD to libxl and let
libxl handle the following stream. The libxl side can also use END-OF-CHECKPOINT
as a sign to hand ownership of the FD back to libxc.
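
For context, this is roughly how such a record would look on the wire,
following the migration v2 record framing (struct xc_sr_rhdr); the
END_OF_CHECKPOINT type value below is invented for illustration and is not
part of the spec:

#include <stdint.h>

/* Migration v2 record framing (roughly as in xc_sr_stream_format.h):
 * every record is a type/length header followed by the body, padded
 * to an 8-octet boundary. */
struct xc_sr_rhdr {
    uint32_t type;
    uint32_t length;      /* body length in octets, excluding padding */
};

/* Existing record types from the v2 spec. */
#define REC_TYPE_END        0x00000000U
#define REC_TYPE_PAGE_DATA  0x00000001U

/* Hypothetical type for the record discussed above -- the value is
 * invented for illustration and would need allocating in the spec. */
#define REC_TYPE_END_OF_CHECKPOINT  0x0000ffffU

/* Such a record would carry no body: the bare header acts as the
 * sentinel that hands the fd over to the other library. */
static const struct xc_sr_rhdr end_of_checkpoint_rec = {
    .type   = REC_TYPE_END_OF_CHECKPOINT,
    .length = 0,
};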



As a result, the checkpoint() callback itself can't be used to gate
whether a CHECKPOINT record is written by libxc.


I was wondering how you will handle the FD transfer?










   ...
   libxl end-of-checkpoint record
   ctx->save.callbacks->checkpoint() returns
start_of_checkpoint()


ctx->save.callbacks->suspend()


memory
end_of_checkpoint()
Checkpoint record
etc...

This will eventually allow both libxc and libxl to send checkpoint
data
(and by the looks of it, remove the need for postcopy()).  With this
libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the
current qemu situation, but I would prefer not to be also retrofitting
libxc checkpoint records when doing the libxl/migv2 work.

Does this look plausible for Remus (and eventually COLO) support?


With the comments above, I would suggest the save flow below (a rough code
sketch of the checkpointed loop follows the flow):

libxc writes:   libxl writes:

live migration:
Image Header
Domain Header
start_of_stream()
start_of_checkpoint()
live memory
ctx->save.callbacks->suspend()
last iter memory
end_of_checkpoint()
if ( checkpointed )
End of Checkpoint record
/* If the restore side receives this record, the input fd should be handed to
libxl */
else
goto end

loop of checkpointed stream:
ctx->save.callbacks->resume()
ctx->save.callbacks->save_qemu()
  libxl qemu record
  ...
  libxl end-of-checkpoint record
/* If the restore side receives this record, the input fd should be handed to
libxc */
ctx->save.callbacks->save_qemu() returns
ctx->save.callbacks->checkpoint()
start_of_checkpoint()
ctx->save.callbacks->suspend()
memory
end_of_checkpoint()
End of Checkpoint record
goto 'loop of checkpointed stream'

end:
END record
/* If the restore side receives this record, the input fd should be handed to
libxl */
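
A minimal C sketch of how that loop might look on the libxc save side; every
helper, struct field and callback name below is an assumption made for
illustration, not the actual xc_sr_save.c interface:

/* Cut-down, illustrative context and callbacks -- not the real ones. */
struct save_callbacks {
    int (*suspend)(void *data);
    int (*resume)(void *data);          /* proposed rename of postcopy() */
    int (*checkpoint)(void *data);      /* non-zero: checkpoint again */
    void *data;
};

struct xc_sr_context {
    struct {
        int checkpointed;               /* Remus/COLO vs plain migration */
        struct save_callbacks *callbacks;
    } save;
};

/* Illustrative stream helpers -- stand-ins, not real functions. */
int write_image_and_domain_headers(struct xc_sr_context *ctx);
int send_memory_and_suspend(struct xc_sr_context *ctx);   /* incl. suspend() */
int write_end_of_checkpoint_record(struct xc_sr_context *ctx);
int write_end_record(struct xc_sr_context *ctx);          /* the single END */

static int save(struct xc_sr_context *ctx)
{
    int rc;

    if ( (rc = write_image_and_domain_headers(ctx)) )
        return rc;

    do {
        /* One checkpoint: live memory, suspend callback, final dirty pages. */
        if ( (rc = send_memory_and_suspend(ctx)) )
            return rc;

        if ( !ctx->save.checkpointed )
            break;                          /* plain live migration */

        /* Tell the restore side that libxc's part of this checkpoint is
         * done; from here the fd is logically owned by libxl. */
        if ( (rc = write_end_of_checkpoint_record(ctx)) )
            return rc;

        /* Unpause the primary, let libxl write its own records (qemu
         * state etc.), then decide whether to take another checkpoint. */
        ctx->save.callbacks->resume(ctx->save.callbacks->data);
        rc = ctx->save.callbacks->checkpoint(ctx->save.callbacks->data);
    } while ( rc > 0 );

    return write_end_record(ctx);
}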


In order to keep it simple, we can keep the current
ctx->save.callbacks->checkpoint() as it is, which does the save_qemu
thing, calls
Remus 

Re: [Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2

2015-05-12 Thread Yang Hongyang



On 05/12/2015 05:40 PM, Andrew Cooper wrote:

On 12/05/15 09:12, Yang Hongyang wrote:



That sounds like COLO wants a should_checkpoint() callback which
separates the decision to make a checkpoint from the logic of
implementing a checkpoint.


We currently use the checkpoint callback to do the should_checkpoint() work.
libxc will check the return value of the checkpoint callback.


But that causes a chicken & egg problem.

I am planning to use a CHECKPOINT record to synchronise the transfer of
ownership of the FD between libxc and libxl.  Therefore, a CHECKPOINT
record must be in the stream ahead of the checkpoint() callback, as
libxl will then write/read some records in itself.


The record name CHECKPOINT does not seem to match what you are
planning to do. In this case I think END-OF-CHECKPOINT, which represents
the
end of the libxc side of a checkpoint, is better: when the libxc side of a
checkpoint ends, libxc should transfer ownership of the FD to libxl and let
libxl handle the following stream. The libxl side can also use END-OF-CHECKPOINT
as a sign to hand ownership of the FD back to libxc.


END_OF_CHECKPOINT implies the presence of START_OF_CHECKPOINT.  The
current spec for CHECKPOINT is more of a sentinel value between
checkpoints of data.





As a result, the checkpoint() callback itself can't be used to gate
whether a CHECKPOINT record is written by libxc.


I was wondering how you will handle the FD transfer?


The FD needs to be readable/writable in both the libxl and
libxl-save-helper processes.  The CHECKPOINT record simply signals a
transfer of ownership.
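
A simplified illustration of why the fd can be usable in both processes:
libxl starts the save helper with the migration fd inherited across
fork/exec, so either side can read or write it, and the CHECKPOINT record is
only a convention for whose turn it is.  The helper path and argument passing
below are assumptions, not the actual libxl helper machinery:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static pid_t spawn_save_helper(int migration_fd, int domid)
{
    pid_t pid = fork();

    if (pid == 0) {
        char fd_arg[16], dom_arg[16];

        snprintf(fd_arg, sizeof(fd_arg), "%d", migration_fd);
        snprintf(dom_arg, sizeof(dom_arg), "%d", domid);

        /* fds are inherited across exec unless marked CLOEXEC, so the
         * helper sees the very same open file description as libxl. */
        execlp("libxl-save-helper", "libxl-save-helper",
               "--save", fd_arg, dom_arg, (char *)NULL);
        _exit(127);
    }

    /* The parent (libxl) keeps migration_fd open too: after the helper
     * has written a CHECKPOINT record it pauses, libxl writes its own
     * records (qemu state, ...) to the same fd, then hands back. */
    return pid;
}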


I have a 4th alternative in mind, but would like your feedback from my
comments in this email first.


So what's the 4th alternative?


I have some corrections to my patch series based on David's feedback,
and your comments.  After that, it should hopefully be far easier to
describe.


OK, I've addressed all comments on my series and will wait for your series
to continue :-)


Sent.  Sorry for the delay (I also have some XenServer issues I am
working on atm).


Never mind, and thank you very much for the quick turnaround! The design
looks much clearer now, it really helps me a lot!



~Andrew



--
Thanks,
Yang.



Re: [Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2

2015-05-12 Thread Andrew Cooper
On 12/05/15 09:12, Yang Hongyang wrote:

 That sounds like COLO wants a should_checkpoint() callback which
 separates the decision to make a checkpoint from the logic of
 implementing a checkpoint.

 We currently use the checkpoint callback to do the should_checkpoint() work.
 libxc will check the return value of the checkpoint callback.

 But that causes a chicken & egg problem.

 I am planning to use a CHECKPOINT record to synchronise the transfer of
 ownership of the FD between libxc and libxl.  Therefore, a CHECKPOINT
 record must be in the stream ahead of the checkpoint() callback, as
 libxl will then write/read some records in itself.

 The record name CHECKPOINT does not seem to match what you are
 planning to do. In this case I think END-OF-CHECKPOINT, which represents
 the
 end of the libxc side of a checkpoint, is better: when the libxc side of a
 checkpoint ends, libxc should transfer ownership of the FD to libxl and let
 libxl handle the following stream. The libxl side can also use END-OF-CHECKPOINT
 as a sign to hand ownership of the FD back to libxc.

END_OF_CHECKPOINT implies the presence of START_OF_CHECKPOINT.  The
current spec for CHECKPOINT is more of a sentinel value between
checkpoints of data.



 As a result, the checkpoint() callback itself can't be used to gate
 whether a CHECKPOINT record is written by libxc.

 I was wondering how you will handle the FD transfer?

The FD needs to be readable/writable in both the libxl and
libxl-save-helper processes.  The CHECKPOINT record simply signals a
transfer of ownership.

 I have a 4th alternative in mind, but would like your feedback from my
 comments in this email first.

 So what's the 4th alternative?

 I have some corrections to my patch series based on David's feedback,
 and your comments.  After that, it should hopefully be far easier to
 describe.

 OK, I've addressed all comments on my series and will wait for your series
 to continue :-)

Sent.  Sorry for the delay (I also have some XenServer issues I am
working on atm).

~Andrew



Re: [Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2

2015-05-11 Thread Hongyang Yang

On 05/09/2015 02:12 AM, Andrew Cooper wrote:

On 08/05/15 10:33, Yang Hongyang wrote:

This patchset implements Remus support for Migration v2, but without
memory compression.

The series can be found on github:
https://github.com/macrosheep/xen/tree/Remus-newmig-v2

PATCH 1-7: Some refactor and prepare work.
PATCH 8-9: The main Remus loop implement.
PATCH 10: Fix for Remus.


I have reviewed the other half of the series now, and have some design
to discuss.  (I was hoping to get this email sent in reply to v1, but
never mind).  This largely concerns patch 7 and onwards.

Migration v2 has substantially more structure than legacy did.  One
issue so far is that your series relies on using more than one END
record, which is not supported in the spec.  (Of course - the spec is
fine to be extended in forward-compatible ways.)


I use the END record as info indicating the end of the stream. I saw
that you added a checkpoint record in your v2 series of Remus related patches;
I can use that record to indicate the end of the checkpointed stream, but
I think the record would better be called end-of-checkpoint?



To fix the qemu layering issues I need to have some explicit negotiation
between libxc and libxl about sharing ownership of the input fd.  This
is going to require a new record in the format, and I am currently drafting
a patch or two which should help in this regard.

My view for the eventual stream looks something like this (time going
downwards):

libxc writes:   libxl writes:

Image Header
Domain Header
start_of_stream()
start_of_checkpoint()


live memory

ctx->save.callbacks->suspend()
this callback suspends the primary guest and then calls the Remus device
postsuspend callbacks to buffer the network packets etc.

last iter of memory


end_of_checkpoint()
Checkpoint record


ctx->save.callbacks->postcopy()
this callback should not be omitted; it does some necessary work before resuming
the primary (such as calling the Remus device preresume callbacks to ensure the
disk data is consistent) and then resumes the primary guest. I think this
callback should be renamed to ctx->save.callbacks->resume().


 ctx->save.callbacks->checkpoint()
 libxl qemu record


Maybe we should add another callback to send the qemu record instead of
using the checkpoint callback. We can call it ctx->save.callbacks->save_qemu()
Then in the checkpoint callback, we only call the Remus device commit callbacks
(which release the network buffer etc...) and then decide whether we need to
do another checkpoint or quit the checkpointed stream.
With Remus, the checkpoint callback only waits for 200ms (the interval can be
specified by -i) and then returns.
With COLO, the checkpoint callback will ask the COLO proxy if we need to do a
checkpoint, and will return when the COLO proxy module indicates a checkpoint is needed.


 ...
 libxl end-of-checkpoint record
 ctx->save.callbacks->checkpoint() returns
start_of_checkpoint()


ctx->save.callbacks->suspend()


memory
end_of_checkpoint()
Checkpoint record
etc...

This will eventually allow both libxc and libxl to send checkpoint data
(and by the looks of it, remove the need for postcopy()).  With this
libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the
current qemu situation, but I would prefer not to be also retrofitting
libxc checkpoint records when doing the libxl/migv2 work.

Does this look plausible for Remus (and eventually COLO) support?


With the comments above, I would suggest the save flow below:

libxc writes:   libxl writes:

live migration:
Image Header
Domain Header
start_of_stream()
start_of_checkpoint()
live memory
ctx->save.callbacks->suspend()
last iter memory
end_of_checkpoint()
if ( checkpointed )
  End of Checkpoint record
  /* If the restore side receives this record, the input fd should be handed to libxl */
else
  goto end

loop of checkpointed stream:
ctx->save.callbacks->resume()
ctx->save.callbacks->save_qemu()
libxl qemu record
...
libxl end-of-checkpoint record
/* If the restore side receives this record, the input fd should be handed to libxc */
ctx->save.callbacks->save_qemu() returns
ctx->save.callbacks->checkpoint()
start_of_checkpoint()
ctx->save.callbacks->suspend()
memory
end_of_checkpoint()
End of Checkpoint record
goto 'loop of checkpointed stream'

end:
END record
/* If the restore side receives this record, the input fd should be handed to libxl */


In order to keep it simple, we can keep the current
ctx->save.callbacks->checkpoint() as it is, which does the save_qemu thing, calls

Remus devices commit callbacks and then decides whether we need a checkpoint. We
can also combine ctx->save.callbacks->resume() with
ctx->save.callbacks->checkpoint(); with only one checkpoint() callback, we do
the following things:
 - Call Remus devices preresume callbacks
 - Resume the primary
 - Save qemu records
 - Call Remus devices commit 

Re: [Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2

2015-05-11 Thread Andrew Cooper
On 11/05/15 11:48, Hongyang Yang wrote:

 On 05/11/2015 05:00 PM, Andrew Cooper wrote:
 On 11/05/15 07:28, Hongyang Yang wrote:
 On 05/09/2015 02:12 AM, Andrew Cooper wrote:
 On 08/05/15 10:33, Yang Hongyang wrote:
 This patchset implements Remus support for Migration v2 but
 without
 memory compression.
 [...]


 last iter of memory

 end_of_checkpoint()
 Checkpoint record

 ctx->save.callbacks->postcopy()
 this callback should not be omitted; it does some necessary work before
 resuming
 the primary (such as calling the Remus device preresume callbacks to ensure the
 disk
 data is consistent) and then resumes the primary guest. I think this
 callback should be renamed to ctx->save.callbacks->resume().

 That looks to be a useful cleanup (and answers one of my questions of
 what exactly postcopy was)


   ctx->save.callbacks->checkpoint()
   libxl qemu record

 Maybe we should add another callback to send the qemu record instead of
 using the checkpoint callback. We can call it
 ctx->save.callbacks->save_qemu()

 This is another layering violation.  libxc should not prescribe what
 libxl might or might not do.  One example we are experimenting with in
 XenServer at the moment is support for multiple emulators attached to a
 single domain, which would necessitate two LIBXL_EMULATOR records to be
 sent per checkpoint.  libxl might also want to send an updated json blob
 or such.

 Ok, so we'd better not introduce a save_qemu callback.


 Then in the checkpoint callback, we only call the Remus device commit
 callbacks
 (which release the network buffer etc...) and then decide whether we
 need to
 do another checkpoint or quit the checkpointed stream.
 With Remus, the checkpoint callback only waits for 200ms (the interval can
 be specified by -i)
 and then returns.
 With COLO, the checkpoint callback will ask the COLO proxy if we need to do a
 checkpoint, and will return when the COLO proxy module indicates a checkpoint
 is needed.

 That sounds like COLO wants a should_checkpoint() callback which
 separates the decision to make a checkpoint from the logic of
 implementing a checkpoint.

 We currently use the checkpoint callback to do the should_checkpoint() work.
 libxc will check the return value of the checkpoint callback.

But that causes a chicken & egg problem.

I am planning to use a CHECKPOINT record to synchronise the transfer of
ownership of the FD between libxc and libxl.  Therefore, a CHECKPOINT
record must be in the stream ahead of the checkpoint() callback, as
libxl will then write/read some records in itself.

As a result, the checkpoint() callback itself can't be used to gate
whether a CHECKPOINT record is written by libxc.




   ...
   libxl end-of-checkpoint record
   ctx->save.callbacks->checkpoint() returns
 start_of_checkpoint()

 ctx->save.callbacks->suspend()

 memory
 end_of_checkpoint()
 Checkpoint record
 etc...

 This will eventually allow both libxc and libxl to send checkpoint
 data
 (and by the looks of it, remove the need for postcopy()).  With this
 libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the
 current qemu situation, but I would prefer not to be also retrofitting
 libxc checkpoint records when doing the libxl/migv2 work.

 Does this look plausible for Remus (and eventually COLO) support?

 With the comments above, I would suggest the save flow below:

 libxc writes:   libxl writes:

 live migration:
 Image Header
 Domain Header
 start_of_stream()
 start_of_checkpoint()
 live memory
 ctx->save.callbacks->suspend()
 last iter memory
 end_of_checkpoint()
 if ( checkpointed )
End of Checkpoint record
/* If the restore side receives this record, the input fd should be handed to
 libxl */
 else
goto end

 loop of checkpointed stream:
 ctx->save.callbacks->resume()
 ctx->save.callbacks->save_qemu()
  libxl qemu record
  ...
  libxl end-of-checkpoint record
 /* If the restore side receives this record, the input fd should be handed to
 libxc */
 ctx->save.callbacks->save_qemu() returns
 ctx->save.callbacks->checkpoint()
 start_of_checkpoint()
 ctx->save.callbacks->suspend()
 memory
 end_of_checkpoint()
 End of Checkpoint record
 goto 'loop of checkpointed stream'

 end:
 END record
 /* If the restore side receives this record, the input fd should be handed to
 libxl */


 In order to keep it simple, we can keep the current
 ctx->save.callbacks->checkpoint() as it is, which does the save_qemu
 thing, calls
 Remus devices commit callbacks and then decides whether we need a
 checkpoint. We
 can also combine ctx->save.callbacks->resume() with
 ctx->save.callbacks->checkpoint(); with only one checkpoint()
 callback, we do
 the following things:
   - Call Remus devices preresume callbacks
   - Resume the primary
   - Save qemu records
   - Call Remus devices commit callbacks
   - Decide whether we need a checkpoint

 Overall, there are 3 options for the save flow:
 1. 

Re: [Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2

2015-05-11 Thread Andrew Cooper
On 11/05/15 07:28, Hongyang Yang wrote:
 On 05/09/2015 02:12 AM, Andrew Cooper wrote:
 On 08/05/15 10:33, Yang Hongyang wrote:
 This patchset implements Remus support for Migration v2 but without
 memory compression.

 The series can be found on github:
 https://github.com/macrosheep/xen/tree/Remus-newmig-v2

 PATCH 1-7: Some refactor and prepare work.
 PATCH 8-9: The main Remus loop implement.
 PATCH 10: Fix for Remus.

 I have reviewed the other half of the series now, and have some design
 to discuss.  (I was hoping to get this email sent in reply to v1, but
 never mind).  This largely concerns patch 7 and onwards.

 Migration v2 has substantially more structure than legacy did.  One
 issue so far is that your series relies on using more than one END
 record, which is not supported in the spec.  (Of course - the spec is
 fine to be extended in forward-compatible ways.)

 I use the END record as info indicating the end of the stream.

I suspected this, but it is not a backwards-compatible use of the migration
v2 stream.  There must only be a single END record, and it must be the
very last record the save side produces.
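
To make the single-END-record rule concrete, a minimal sketch of a
restore-side record loop under that rule; read_record(), handle_record(),
let_libxl_read_its_records() and the CHECKPOINT type value are illustrative
stand-ins, not the real xc_sr_restore.c code:

#include <stdint.h>

#define REC_TYPE_END         0x00000000U
#define REC_TYPE_CHECKPOINT  0x0000000eU   /* illustrative value only */

struct record { uint32_t type; uint32_t length; void *body; };

int read_record(int fd, struct record *rec);       /* illustrative */
int handle_record(const struct record *rec);        /* illustrative */
int let_libxl_read_its_records(int fd);             /* fd handover point */

static int restore_loop(int fd)
{
    struct record rec;
    int rc;

    for (;;) {
        if ((rc = read_record(fd, &rec)) != 0)
            return rc;

        if (rec.type == REC_TYPE_END)
            return 0;                    /* the one and only END record */

        if (rec.type == REC_TYPE_CHECKPOINT) {
            /* libxc's part of this checkpoint is complete; hand the fd
             * to libxl so it can consume its own records, then resume. */
            if ((rc = let_libxl_read_its_records(fd)) != 0)
                return rc;
            continue;
        }

        if ((rc = handle_record(&rec)) != 0)
            return rc;
    }
}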

 I saw
 that you added a checkpoint record in your v2 series of Remus related
 patches;
 I can use that record to indicate the end of the checkpointed stream, but
 I think the record would better be called end-of-checkpoint?

It is the logical end of the libxc bits of a checkpoint, but not of the
(soon to exist) libxl bits.



 To fix the qemu layering issues I need to have some explicit negotiation
 between libxc and libxl about sharing ownership of the input fd.  This
 is going to require a new record in the format, and I am currently drafting
 a patch or two which should help in this regard.

 My view for the eventual stream looks something like this (time going
 downwards):

 libxc writes:   libxl writes:

 Image Header
 Domain Header
 start_of_stream()
 start_of_checkpoint()

 live memory

 ctx->save.callbacks->suspend()
 this callback suspends the primary guest and then calls the Remus device
 postsuspend callbacks to buffer the network packets etc.

Sorry yes - I omitted this call in the example for brevity, but was not
intending to omit it from the code.
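
For the suspend step being discussed here, a small sketch of what the
save-side suspend() callback does under Remus; every name below is a
simplified placeholder for the libxl Remus device layer, not the actual
libxl code:

/* Simplified sketch of the save-side suspend() callback under Remus. */
struct remus_state { int domid; /* ... device list etc. ... */ };

int pause_primary_guest(struct remus_state *rs);         /* placeholder */
int remus_devices_postsuspend(struct remus_state *rs);   /* buffer net output */

static int remus_suspend_cb(void *data)
{
    struct remus_state *rs = data;

    /* 1. Pause the primary guest so its state stops changing. */
    if (pause_primary_guest(rs) != 0)
        return 0;                        /* 0 = failure to the caller */

    /* 2. Run the Remus device postsuspend hooks: outgoing network
     *    packets are buffered from here until the checkpoint has been
     *    committed on the backup. */
    if (remus_devices_postsuspend(rs) != 0)
        return 0;

    return 1;                            /* non-zero = suspended OK */
}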


 last iter of memory

 end_of_checkpoint()
 Checkpoint record

 ctx->save.callbacks->postcopy()
 this callback should not be omitted; it does some necessary work before
 resuming
 the primary (such as calling the Remus device preresume callbacks to ensure the
 disk
 data is consistent) and then resumes the primary guest. I think this
 callback should be renamed to ctx->save.callbacks->resume().

That looks to be a useful cleanup (and answers one of my questions of
what exactly postcopy was)


  ctx->save.callbacks->checkpoint()
  libxl qemu record

 Maybe we should add another callback to send the qemu record instead of
 using the checkpoint callback. We can call it
 ctx->save.callbacks->save_qemu()

This is another layering violation.  libxc should not prescribe what
libxl might or might not do.  One example we are experimenting with in
XenServer at the moment is support for multiple emulators attached to a
single domain, which would necessitate two LIBXL_EMULATOR records to be
sent per checkpoint.  libxl might also want to send an updated json blob
or such.

 Then in the checkpoint callback, we only call the Remus device commit callbacks
 (which release the network buffer etc...) and then decide whether we
 need to
 do another checkpoint or quit the checkpointed stream.
 With Remus, the checkpoint callback only waits for 200ms (the interval can
 be specified by -i)
 and then returns.
 With COLO, the checkpoint callback will ask the COLO proxy if we need to do a
 checkpoint, and will return when the COLO proxy module indicates a checkpoint
 is needed.

That sounds like COLO wants a should_checkpoint() callback which
separates the decision to make a checkpoint from the logic of
implementing a checkpoint.


  ...
  libxl end-of-checkpoint record
  ctx->save.callbacks->checkpoint() returns
 start_of_checkpoint()

 ctx->save.callbacks->suspend()

 memory
 end_of_checkpoint()
 Checkpoint record
 etc...

 This will eventually allow both libxc and libxl to send checkpoint data
 (and by the looks of it, remove the need for postcopy()).  With this
 libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the
 current qemu situation, but I would prefer not to be also retrofitting
 libxc checkpoint records when doing the libxl/migv2 work.

 Does this look plausible for Remus (and eventually COLO) support?

 With the comments above, I would suggest the save flow below:

 libxc writes:   libxl writes:

 live migration:
 Image Header
 Domain Header
 start_of_stream()
 start_of_checkpoint()
 live memory
 ctx->save.callbacks->suspend()
 last iter memory
 end_of_checkpoint()
 if ( checkpointed )
   End of Checkpoint record
   /* If the restore side receives this record, the input fd should be 

Re: [Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2

2015-05-11 Thread Hongyang Yang


On 05/11/2015 05:00 PM, Andrew Cooper wrote:

On 11/05/15 07:28, Hongyang Yang wrote:

On 05/09/2015 02:12 AM, Andrew Cooper wrote:

On 08/05/15 10:33, Yang Hongyang wrote:

This patchset implements Remus support for Migration v2 but without
memory compression.

[...]




last iter of memory


end_of_checkpoint()
Checkpoint record


ctx->save.callbacks->postcopy()
this callback should not be omitted; it does some necessary work before
resuming
the primary (such as calling the Remus device preresume callbacks to ensure the
disk
data is consistent) and then resumes the primary guest. I think this
callback should be renamed to ctx->save.callbacks->resume().


That looks to be a useful cleanup (and answers one of my questions of
what exactly postcopy was)




  ctx->save.callbacks->checkpoint()
  libxl qemu record


Maybe we should add another callback to send the qemu record instead of
using the checkpoint callback. We can call it
ctx->save.callbacks->save_qemu()


This is another layering violation.  libxc should not prescribe what
libxl might or might not do.  One example we are experimenting with in
XenServer at the moment is support for multiple emulators attached to a
single domain, which would necessitate two LIBXL_EMULATOR records to be
sent per checkpoint.  libxl might also want to send an updated json blob
or such.


Ok, so we'd better not introduce a save_qemu callback.




Then in the checkpoint callback, we only call the Remus device commit callbacks
(which release the network buffer etc...) and then decide whether we
need to
do another checkpoint or quit the checkpointed stream.
With Remus, the checkpoint callback only waits for 200ms (the interval can
be specified by -i)
and then returns.
With COLO, the checkpoint callback will ask the COLO proxy if we need to do a
checkpoint, and will return when the COLO proxy module indicates a checkpoint
is needed.


That sounds like COLO wants a should_checkpoint() callback which
separates the decision to make a checkpoint from the logic of
implementing a checkpoint.


We currently use the checkpoint callback to do the should_checkpoint() work.
libxc will check the return value of the checkpoint callback.






  ...
  libxl end-of-checkpoint record
  ctx->save.callbacks->checkpoint() returns
start_of_checkpoint()


ctx->save.callbacks->suspend()


memory
end_of_checkpoint()
Checkpoint record
etc...

This will eventually allow both libxc and libxl to send checkpoint data
(and by the looks of it, remove the need for postcopy()).  With this
libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the
current qemu situation, but I would prefer not to be also retrofitting
libxc checkpoint records when doing the libxl/migv2 work.

Does this look plausible for Remus (and eventually COLO) support?


With the comments above, I would suggest the save flow below:

libxc writes:   libxl writes:

live migration:
Image Header
Domain Header
start_of_stream()
start_of_checkpoint()
live memory
ctx->save.callbacks->suspend()
last iter memory
end_of_checkpoint()
if ( checkpointed )
   End of Checkpoint record
   /* If the restore side receives this record, the input fd should be handed to
libxl */
else
   goto end

loop of checkpointed stream:
ctx->save.callbacks->resume()
ctx->save.callbacks->save_qemu()
 libxl qemu record
 ...
 libxl end-of-checkpoint record
/* If the restore side receives this record, the input fd should be handed to
libxc */
ctx->save.callbacks->save_qemu() returns
ctx->save.callbacks->checkpoint()
start_of_checkpoint()
ctx->save.callbacks->suspend()
memory
end_of_checkpoint()
End of Checkpoint record
goto 'loop of checkpointed stream'

end:
END record
/* If the restore side receives this record, the input fd should be handed to
libxl */


In order to keep it simple, we can keep the current
ctx->save.callbacks->checkpoint() as it is, which does the save_qemu
thing, calls
Remus devices commit callbacks and then decides whether we need a
checkpoint. We
can also combine ctx->save.callbacks->resume() with
ctx->save.callbacks->checkpoint(); with only one checkpoint()
callback, we do
the following things (sketched below):
  - Call Remus devices preresume callbacks
  - Resume the primary
  - Save qemu records
  - Call Remus devices commit callbacks
  - Decide whether we need a checkpoint
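
A minimal sketch of that combined callback, assuming for now that libxl still
sends the qemu record from inside it; every helper name and the state struct
below are placeholders for illustration, not existing interfaces:

/* One checkpoint() callback doing preresume, resume, qemu save, commit
 * and the "checkpoint again?" decision.  All names are placeholders. */
struct remus_cp_state { unsigned interval_ms; /* ... device list etc. ... */ };

int remus_devices_preresume(struct remus_cp_state *s);    /* disk consistency */
int resume_primary_guest(struct remus_cp_state *s);
int save_qemu_records(struct remus_cp_state *s);          /* libxl qemu state */
int remus_devices_commit(struct remus_cp_state *s);       /* release net buffer */
int want_another_checkpoint(struct remus_cp_state *s);    /* Remus: sleep; COLO: ask proxy */

static int combined_checkpoint_cb(void *data)
{
    struct remus_cp_state *s = data;

    if (remus_devices_preresume(s))   return 0;   /* 0 = leave the loop */
    if (resume_primary_guest(s))      return 0;
    if (save_qemu_records(s))         return 0;
    if (remus_devices_commit(s))      return 0;

    /* Non-zero tells libxc to start another checkpoint. */
    return want_another_checkpoint(s);
}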

Overall, there are 3 options for the save flow:
1. keep the current callbacks, rename postcopy() to resume()
2. split the checkpoint() callback to save_qemu() and checkpoint()
3. combine the current postcopy() and checkpoint()
Which one do you think is the best?


I have a 4th alternative in mind, but would like your feedback from my
comments in this email first.


So what's the 4th alternative?



~Andrew



--
Thanks,
Yang.



[Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2

2015-05-08 Thread Yang Hongyang
This patchset implements Remus support for Migration v2, but without
memory compression.

The series can be found on github:
https://github.com/macrosheep/xen/tree/Remus-newmig-v2

PATCH 1-7: Some refactor and prepare work.
PATCH 8-9: The main Remus loop implement.
PATCH 10: Fix for Remus.

v2:
 - move to_send bitmap to the ctx->save union and rename it to dirty_bitmap
 - introduce setup() and cleanup() on save
 - rename send_some_pages to send_dirty_pages
 - remove 'defer the setting of HVM_PARAM_IDENT_PT'; it should be fixed
   on the hypervisor side
 - the last patch is still there for my test purposes until Andrew finds a
   suitable solution (which we commented on in the first series) :)

v1:
initial support

Summary of changes:
M = modified
A = acked
N = new,
no mark = unchanged from last round

Yang Hongyang (10):
M  tools/libxc: adjust the memory allocation for migration
N  tools/libxc: introduce setup() and cleanup() on save
N  tools/libxc: rename send_some_pages to send_dirty_pages
N  tools/libxc: introduce DECLARE_HYPERCALL_BUFFER_USER_POINTER
M  tools/libxc: reuse send_dirty_pages() in send_all_pages()
   tools/libxc: introduce process_record()
   tools/libxc: split read/handle qemu info
   tools/libxc: implement Remus checkpointed save
   tools/libxc: implement Remus checkpointed restore
   tools/libxc: X86_PV_INFO can be sent multiple times under Remus

 tools/libxc/include/xenctrl.h   |   8 ++
 tools/libxc/include/xenguest.h  |   1 +
 tools/libxc/xc_bitops.h |   5 +
 tools/libxc/xc_sr_common.h  |  15 +++
 tools/libxc/xc_sr_restore.c | 179 ++-
 tools/libxc/xc_sr_restore_x86_hvm.c |  28 -
 tools/libxc/xc_sr_restore_x86_pv.c  |   2 +-
 tools/libxc/xc_sr_save.c| 234 +++-
 tools/libxl/libxl_dom.c |   1 +
 9 files changed, 330 insertions(+), 143 deletions(-)

-- 
1.9.1




Re: [Xen-devel] [PATCH Remus v2 00/10] Remus support for Migration-v2

2015-05-08 Thread Andrew Cooper
On 08/05/15 10:33, Yang Hongyang wrote:
 This patchset implements Remus support for Migration v2 but without
 memory compression.

 The series can be found on github:
 https://github.com/macrosheep/xen/tree/Remus-newmig-v2

 PATCH 1-7: Some refactor and prepare work.
 PATCH 8-9: The main Remus loop implement.
 PATCH 10: Fix for Remus.

I have reviewed the other half of the series now, and have some design
to discuss.  (I was hoping to get this email sent in reply to v1, but
never mind).  This largely concerns patch 7 and onwards.

Migration v2 has substantially more structure than legacy did.  One
issue so far is that your series relies on using more than one END
record, which is not supported in the spec.  (Of course - the spec is
fine to be extended in forward-compatible ways.)

To fix the qemu layering issues I need to have some explicit negotiation
between libxc and libxl about sharing ownership of the input fd.  This
is going to require a new record in the format, and I am currently drafting
a patch or two which should help in this regard.

My view for the eventual stream looks something like this (time going
downwards):

libxc writes:   libxl writes:

Image Header
Domain Header
start_of_stream()
start_of_checkpoint()
memory
end_of_checkpoint()
Checkpoint record
ctx->save.callbacks->checkpoint()
libxl qemu record
...
libxl end-of-checkpoint record
ctx->save.callbacks->checkpoint() returns
start_of_checkpoint()
memory
end_of_checkpoint()
Checkpoint record
etc...
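
For readers unfamiliar with the framing named in the diagram, this is roughly
what the migration v2 image and domain headers look like (see
tools/libxc/xc_sr_stream_format.h and the spec for the authoritative
definitions); everything after them is sent as type/length-framed records:

#include <stdint.h>

/* Image header: identifies the stream and its global options. */
struct xc_sr_ihdr {
    uint64_t marker;      /* all-ones, distinguishes v2 from a legacy stream */
    uint32_t id;          /* "xenf" magic */
    uint32_t version;     /* stream format version */
    uint16_t options;     /* e.g. endianness */
    uint16_t _res1;
    uint32_t _res2;
};

/* Domain header: basic facts about the domain being transferred. */
struct xc_sr_dhdr {
    uint32_t type;        /* x86 PV, x86 HVM, ... */
    uint16_t page_shift;
    uint16_t _res1;
    uint32_t xen_major;
    uint32_t xen_minor;
};

/* Everything else in the diagram ("memory", "Checkpoint record", END
 * record) is an individual record: a { type, length } header followed
 * by the record body, padded to an 8-octet boundary. */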

This will eventually allow both libxc and libxl to send checkpoint data
(and by the looks of it, remove the need for postcopy()).  With this
libxc/remus work it is fine to use XG_LIBXL_HVM_COMPAT to cover the
current qemu situation, but I would prefer not to be also retrofitting
libxc checkpoint records when doing the libxl/migv2 work.

Does this look plausible for Remus (and eventually COLO) support?

~Andrew
