We should not load the PVM's state directly into the SVM, because an error that
occurs while the SVM is receiving data would break the SVM.
We need to ensure that all data has been received before loading the state into
the SVM. We use extra memory to cache this data (the PVM's RAM). The RAM cache on
the secondary side
is
We add a default filter-buffer to each netdev (except vhost-net),
which will be used by COLO or Micro-checkpoint to buffer the VM's packets.
The name of the default filter-buffer is 'nop'.
By default, the default filter-buffer does not buffer any packets,
so it has no side effect on the netdev.
Add a migration state, MIGRATION_STATUS_COLO, and enter this state
after the first live migration has finished successfully.
We reuse the migration thread, so if colo is enabled by the user, the migration
thread will go into the colo process.
Signed-off-by: zhanghailiang
For the PVM, if there is a failover request from the user,
the colo thread will exit the loop while the failover BH does the
cleanup work and resumes the VM.
Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
Reviewed-by: Dr. David Alan Gilbert
We need to release all the packets from the VM in COLO or Micro-checkpoint,
so here we add a new helper function to release the packets that are buffered
by the default buffer-filter
Signed-off-by: zhanghailiang
Cc: Jason Wang
Cc: Yang Hongyang
If we start qemu with -S, the runstate will change from 'prelaunch' to 'running'
after going into colo state.
So it is necessary to update the global runstate after going into colo state.
Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
Enable default filter to buffer packets and release the
packets after a checkpoint.
Signed-off-by: zhanghailiang
Cc: Jason Wang
Cc: Yang Hongyang
---
v12:
- Add a helper function to check if all netdev supports
The default buffer filter doesn't buffer packets by default,
but we need to buffer packets for COLO or Micro-checkpoint.
Here we add a helper function to enable/disable the filter's buffer
capability.
Signed-off-by: zhanghailiang
Cc: Jason Wang
valentin writes:
Please disregard this patch. The patch from Mark Cave-Ayland, dated
December 20, 2015 already fixed this issue.
> From: Valentin Rakush
>
> This patch fixes compilation errors when --enable-trace-backend=stderr option
> is
On 12/29/2015 02:31 PM, Zhang Chen wrote:
> Hi~
> Just a small ping...
> No news for a week.
> Colo proxy is a part of COLO project, we need review and comments.
>
>
> Thanks
> zhangchen
Hi, will find some time to review this this week.
Thanks
>
>
> On 12/22/2015 06:42 PM, Zhang Chen wrote:
>>
For migration destination, we also need to know its state,
we will use it in COLO.
Here we add a new member 'state' for MigrationIncomingState,
and also use migrate_set_state() to modify its value.
Signed-off-by: zhanghailiang
Reviewed-by: Dr. David Alan Gilbert
We need a user-defined communication protocol to control the checkpoint
process.
A new checkpoint request is started by the Primary VM, and the interaction
proceeds as below:
Checkpoint synchronizing points:
Primary Secondary
Introduce two new QEMUSizedBuffer APIs which will be used by COLO to buffer
VM state:
One is qsb_put_buffer(), which puts the content of a given QEMUSizedBuffer
into a QEMUFile; this is used to send buffered VM state to the secondary.
The other is qsb_fill_buffer(), which reads 'size' bytes of data from the file
We record the addresses of the dirty pages that were received;
this helps when flushing the cached pages into the SVM.
We record them by reusing the migration dirty bitmap.
Signed-off-by: zhanghailiang
Reviewed-by: Dr. David Alan Gilbert
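
As an illustration of the idea above (not the code from this patch), recording
received pages in a bitmap and later flushing only those pages could look like
the sketch below. All names here (bitmap_set_page, flush_dirty_pages, the
page_size parameter) are hypothetical and chosen for this example only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Mark one page as received (dirty) in the bitmap. */
static void bitmap_set_page(unsigned long *bitmap, size_t page)
{
    bitmap[page / BITS_PER_WORD] |= 1UL << (page % BITS_PER_WORD);
}

/* Return non-zero if the page is marked in the bitmap. */
static int bitmap_test_page(const unsigned long *bitmap, size_t page)
{
    return (bitmap[page / BITS_PER_WORD] >> (page % BITS_PER_WORD)) & 1UL;
}

/*
 * Copy every marked page from the cache into live memory and clear its bit.
 * Returns the number of pages flushed. Unmarked pages are left untouched.
 */
static size_t flush_dirty_pages(unsigned long *bitmap, size_t npages,
                                size_t page_size,
                                const unsigned char *cache,
                                unsigned char *live)
{
    size_t page, flushed = 0;

    for (page = 0; page < npages; page++) {
        if (!bitmap_test_page(bitmap, page)) {
            continue;
        }
        memcpy(live + page * page_size, cache + page * page_size, page_size);
        bitmap[page / BITS_PER_WORD] &= ~(1UL << (page % BITS_PER_WORD));
        flushed++;
    }
    return flushed;
}
```

The point of the bitmap is that the flush cost depends on the number of pages
received since the last checkpoint, not on the total size of guest memory.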
---
v12:
- Add
We should not destroy the state of the SVM (Secondary VM) until we receive the
whole state from the PVM (Primary VM), in case the primary fails in the middle
of sending the state. So here we cache the device state in the Secondary before
restoring it.
Besides, we should call qemu_system_reset() before
When handling failover, we do different things according to the stage of the
failover process. Here we introduce a global atomic variable to record the
status of failover.
We add four failover statuses to indicate the different stages of the failover
process.
You should use the helpers to get and
If the user requires the SVM to take over, the colo incoming thread should
exit its loop while the failover BH helps to return to the migration incoming
coroutine.
Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
Reviewed-by: Dr. David Alan Gilbert
For COLO's checkpoint process, we will do savevm/loadvm repeatedly.
So every time we call qemu_loadvm_section_start_full(), we will
add all the section information to the loadvm_handlers list once more.
There will be many instances in loadvm_handlers for one section,
and this will lead to memory
qemu_loadvm_state is too long, and we can simplify it by splitting it up
into three helper functions.
Signed-off-by: zhanghailiang
Reviewed-by: Dr. David Alan Gilbert
v13:
- Add Reviewed-by tag
---
migration/savevm.c | 156
We add a helper function colo_supported() to indicate whether
colo is supported or not. We use it to control whether or not the
'x-colo' string is shown to users, who can use the qmp command
'query-migrate-capabilities' or the hmp command 'info migrate_capabilities'
to learn if colo is supported.
Cc:
Make sure the master starts block replication after the slave's block
replication has started.
Signed-off-by: zhanghailiang
Signed-off-by: Wen Congyang
Signed-off-by: Li Zhijian
---
migration/colo.c | 52
Do checkpoint periodically, the default interval is 200ms.
Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
Reviewed-by: Dr. David Alan Gilbert
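
A minimal sketch of how the wait before the next periodic checkpoint could be
computed, assuming a millisecond clock. Only the 200ms default comes from the
message above; the helper name checkpoint_wait_ms and its parameters are
hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Default checkpoint interval stated in the commit message. */
#define CHECKPOINT_INTERVAL_MS 200

/*
 * Hypothetical helper: how many milliseconds are left before the next
 * checkpoint is due. Returns 0 when the interval has already elapsed.
 */
static int64_t checkpoint_wait_ms(int64_t now_ms, int64_t last_checkpoint_ms,
                                  int64_t interval_ms)
{
    int64_t elapsed = now_ms - last_checkpoint_ms;

    if (elapsed < 0) {
        /* Clock went backwards; treat it as no time elapsed. */
        elapsed = 0;
    }
    if (elapsed >= interval_ms) {
        return 0;
    }
    return interval_ms - elapsed;
}
```

Subtracting the time already spent since the last checkpoint keeps the period
close to the configured interval instead of interval plus checkpoint time.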
---
v12:
- Add Reviewed-by tag
v11:
- Fix wrong sleep time for
Cc: Markus Armbruster
On 2015/12/29 15:08, zhanghailiang wrote:
This is the 13th version of COLO (it still only supports periodic checkpoints).
This is only the COLO frame part; you can get the whole code from github:
https://github.com/coloft/qemu/commits/colo-v2.4-periodic-mode
It is unnecessary to call qemu_savevm_state_begin() in every checkpoint process.
It mainly sets up devices and does the first device state pass. This data will
not change during the later checkpoint process. So we split it out of
colo_do_checkpoint_transaction(); in this way, we can reduce these
We separate the process of saving/loading ram and device state when doing a
checkpoint, and we add new helpers for saving/loading ram/device state. With
this change, we can directly transfer ram from master to slave without using
QEMUSizedBuffer as an assistant, which also reduces the size of extra memory
being used
If the VM is in COLO FT state, we should do some extra work before the normal
shutdown process. The SVM will ignore the shutdown command if it is issued
directly to it; the PVM will send the shutdown command to the SVM if it gets
this command.
Cc: Paolo Bonzini
Signed-off-by:
Guest will enter this state when paused to save/restore VM state
under colo checkpoint.
Cc: Eric Blake
Cc: Markus Armbruster
Signed-off-by: zhanghailiang
Signed-off-by: Li Zhijian
Signed-off-by:
We should not do failover work while the main thread is loading the
VM's state, otherwise it will break the consistency of the VM's memory and
device state.
Here we add a new failover status 'RELAUNCH', which means we should
relaunch the failover process.
Signed-off-by: zhanghailiang
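
The deferral described above can be sketched with a compare-and-swap on a
single atomic status variable. This is an illustration under assumptions, not
the patch itself: the enum names, the helper names and the loading_vmstate flag
are hypothetical, and only the existence of a 'RELAUNCH' status comes from the
message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical status names; only RELAUNCH is named in the message. */
enum {
    FAILOVER_NONE,
    FAILOVER_REQUIRE,
    FAILOVER_ACTIVE,
    FAILOVER_COMPLETED,
    FAILOVER_RELAUNCH,
};

static _Atomic int failover_state = FAILOVER_NONE;

/* Switch 'from' -> 'to' only if the state is 'from'; return the old state. */
static int failover_set_state(int from, int to)
{
    int expected = from;

    atomic_compare_exchange_strong(&failover_state, &expected, to);
    return expected;
}

static int failover_get_state(void)
{
    return atomic_load(&failover_state);
}

/* Request failover; defer it if VM state is being loaded right now. */
static bool failover_request(bool loading_vmstate)
{
    if (failover_set_state(FAILOVER_NONE, FAILOVER_REQUIRE) != FAILOVER_NONE) {
        return false; /* a failover is already in progress */
    }
    if (loading_vmstate) {
        failover_set_state(FAILOVER_REQUIRE, FAILOVER_RELAUNCH);
        return false; /* will be relaunched once loading has finished */
    }
    failover_set_state(FAILOVER_REQUIRE, FAILOVER_ACTIVE);
    return true;
}

/* Called once loading has finished: relaunch a deferred failover, if any. */
static bool failover_after_load(void)
{
    if (failover_set_state(FAILOVER_RELAUNCH, FAILOVER_REQUIRE)
        != FAILOVER_RELAUNCH) {
        return false;
    }
    failover_set_state(FAILOVER_REQUIRE, FAILOVER_ACTIVE);
    return true;
}
```

Because every transition names the state it expects to leave, a request that
arrives at the wrong moment is rejected rather than overwriting the status.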
Split host_from_stream_offset() into two parts:
One gets the ram block, whose idstr may be read from the migration
stream; the other gets the hva (host) address from the block and the offset.
Besides, we do the range check in a new helper, offset_in_ramblock().
Signed-off-by:
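
A rough sketch of the second half of that split: a range check followed by the
address computation. The struct RamBlockSketch and the function
host_from_block_offset_sketch are hypothetical stand-ins made up for this
example; only the helper name offset_in_ramblock() appears in the message, and
its real signature may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a RAM block. */
typedef struct {
    uint8_t *host;      /* start of the block in host memory */
    size_t used_length; /* number of valid bytes in the block */
} RamBlockSketch;

/* The check: the block exists, is mapped, and the offset lies inside it. */
static bool offset_in_ramblock(const RamBlockSketch *b, size_t offset)
{
    return b != NULL && b->host != NULL && offset < b->used_length;
}

/* Translate (block, offset) to a host address, or NULL if out of range. */
static void *host_from_block_offset_sketch(const RamBlockSketch *b,
                                           size_t offset)
{
    if (!offset_in_ramblock(b, offset)) {
        return NULL;
    }
    return b->host + offset;
}
```

Keeping the check in its own helper lets both the block lookup path and the
address translation path reject a bad offset from the stream in the same way.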
If some errors happen during VM's COLO FT stage, it's important to notify the
users of this event. Together with 'x_colo_lost_heartbeat', users can intervene
in COLO's failover work immediately.
If users don't want to get involved in COLO's failover verdict,
it is still necessary to notify users
For default buffer filter, its 'interval' value is zero,
so here we should accept zero interval.
Signed-off-by: zhanghailiang
Reviewed-by: Yang Hongyang
Cc: Jason Wang
---
v12:
- Add Reviewed-by tag
v11:
- Add
If the net connection between COLO's two sides is broken while the colo/colo
incoming thread is blocked in a 'read'/'write' on a socket fd, the thread will
not detect this error until the connection times out, which can take a long
time.
Here we shutdown all the related socket file descriptors to wake up the blocking
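
A small POSIX sketch of the effect relied on here, assuming the mechanism is
shutdown(2) on the socket (the message does not show the actual call). It is
single-threaded, so it only demonstrates that a read on a socket that has been
shut down returns end-of-file at once instead of waiting; the function name
read_after_shutdown is made up for this example.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a connected socket pair, shut one end down in both directions,
 * then read from that end. Returns the read() result (0 means end-of-file),
 * or -2 if the socket pair could not be set up.
 */
static long read_after_shutdown(void)
{
    int sv[2];
    char byte;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    if (shutdown(sv[0], SHUT_RDWR) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }
    /* No data was ever sent, yet this does not block. */
    n = read(sv[0], &byte, 1);
    close(sv[0]);
    close(sv[1]);
    return (long)n;
}
```

Unlike close(), shutdown() leaves the descriptor valid, so another thread that
still holds the fd sees an ordinary end-of-file or error return.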