[Qemu-devel] [PATCH v2 3/3] migration: introduce adaptive model for waiting thread

2019-01-10 Thread guangrong . xiao
From: Xiao Guangrong Currently we have two behaviors if all threads are busy doing compression: the main thread must wait for one of them to become free if @compress-wait-thread is set to on, or the main thread can directly return without waiting and post the page out as a normal one. Both of them have its pro

[Qemu-devel] [PATCH v2 1/3] migration: introduce pages-per-second

2019-01-10 Thread guangrong . xiao
From: Xiao Guangrong It introduces a new statistic, pages-per-second, as bandwidth or mbps is not enough to measure the performance of posting pages out as we have compression, xbzrle, which can significantly reduce the amount of the data size, instead, pages-per-second is the one we want Signed

[Qemu-devel] [PATCH v2 2/3] migration: fix memory leak when updating tls-creds and tls-hostname

2019-01-10 Thread guangrong . xiao
From: Xiao Guangrong If we update parameter, tls-creds and tls-hostname, these string values are duplicated to local variables in migrate_params_test_apply() by using g_strdup(), however these new allocated memory are missed to be freed Actually, they are not used to check anything, we can direc

[Qemu-devel] [PATCH v2 0/3] optimize waiting for free thread to do compression

2019-01-10 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v2: squash 'compress-wait-thread-adaptive' into 'compress-wait-thread' based on Peter's suggestion. Currently we have two behaviors if all threads are busy doing compression: the main thread must wait for one of them to become free if @compress-wait-thread is set to on

[Qemu-devel] [PATCH 1/2] migration: introduce compress-wait-thread-adaptive

2018-12-13 Thread guangrong . xiao
From: Xiao Guangrong Currently we have two behaviors if all threads are busy doing compression: the main thread must wait for one of them to become free if @compress-wait-thread is set to on, or the main thread can directly return without waiting and post the page out as a normal one. Both of them have its pro

[Qemu-devel] [PATCH 0/2] optimize waiting for free thread to do compression

2018-12-12 Thread guangrong . xiao
From: Xiao Guangrong Currently we have two behaviors if all threads are busy doing compression: the main thread must wait for one of them to become free if @compress-wait-thread is set to on, or the main thread can directly return without waiting and post the page out as a normal one. Both of them have its pro

[Qemu-devel] [PATCH 2/2] migration: introduce pages-per-second

2018-12-12 Thread guangrong . xiao
From: Xiao Guangrong It introduces a new statistic, pages-per-second, as bandwidth or mbps is not enough to measure the performance of posting pages out as we have compression, xbzrle, which can significantly reduce the amount of the data size, instead, pages-per-second is the one we want Signed

[Qemu-devel] [PATCH v3 5/5] tests: add threaded-workqueue-bench

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong It's the benchmark of threaded-workqueue, also it's a good example to show how threaded-workqueue is used Signed-off-by: Xiao Guangrong --- tests/Makefile.include | 5 +- tests/threaded-workqueue-bench.c | 255 +++ 2 files ch

[Qemu-devel] [PATCH v3 3/5] migration: use threaded workqueue for compression

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong Adapt the compression code to the threaded workqueue Signed-off-by: Xiao Guangrong --- migration/ram.c | 308 1 file changed, 110 insertions(+), 198 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index

[Qemu-devel] [PATCH v3 4/5] migration: use threaded workqueue for decompression

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong Adapt the decompression code to the threaded workqueue Signed-off-by: Xiao Guangrong --- migration/ram.c | 222 1 file changed, 77 insertions(+), 145 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 2

[Qemu-devel] [PATCH v3 2/5] util: introduce threaded workqueue

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong This module implements the lockless and efficient threaded workqueue. Three abstracted objects are used in this module: - Request. It not only contains the data that the workqueue fetches out to finish the request but also offers the space to save the result af

[Qemu-devel] [PATCH v3 1/5] bitops: introduce change_bit_atomic

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong It will be used by threaded workqueue Signed-off-by: Xiao Guangrong --- include/qemu/bitops.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h index 3f0926cf40..c522958852 100644 --- a/include/qemu/bitops.h ++

[Qemu-devel] [PATCH v3 0/5] migration: improve multithreads

2018-11-21 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v3: Thanks to Emilio's comments and his example code, the changes in this version are: 1. move @requests from the shared data struct to each single thread 2. move completion ev from the shared data struct to each single thread 3. move bitmaps from the shared data

[Qemu-devel] [PATCH v2 3/5] migration: use threaded workqueue for compression

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong Adapt the compression code to the threaded workqueue Signed-off-by: Xiao Guangrong --- migration/ram.c | 313 +--- 1 file changed, 115 insertions(+), 198 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index

[Qemu-devel] [PATCH v2 1/5] bitops: introduce change_bit_atomic

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong It will be used by threaded workqueue Signed-off-by: Xiao Guangrong --- include/qemu/bitops.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/qemu/bitops.h b/include/qemu/bitops.h index 3f0926cf40..c522958852 100644 --- a/include/qemu/bitops.h ++

[Qemu-devel] [PATCH v2 5/5] tests: add threaded-workqueue-bench

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong It's the benchmark of threaded-workqueue, also it's a good example to show how threaded-workqueue is used Signed-off-by: Xiao Guangrong --- tests/Makefile.include | 5 +- tests/threaded-workqueue-bench.c | 256 +++ 2 files ch

[Qemu-devel] [PATCH v2 4/5] migration: use threaded workqueue for decompression

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong Adapt the decompression code to the threaded workqueue Signed-off-by: Xiao Guangrong --- migration/ram.c | 225 1 file changed, 81 insertions(+), 144 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index a

[Qemu-devel] [PATCH v2 2/5] util: introduce threaded workqueue

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong This module implements the lockless and efficient threaded workqueue. Three abstracted objects are used in this module: - Request. It not only contains the data that the workqueue fetches out to finish the request but also offers the space to save the result af

[Qemu-devel] [PATCH v2 0/5] migration: improve multithreads

2018-11-06 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v2: These changes are based on Paolo's suggestion: 1) rename the lockless multithreads model to threaded workqueue 2) hugely improve the internal design, that make all the request be a large array, properly partition it, assign requests to threads respectiv

[Qemu-devel] [PATCH 4/4] migration: use lockless Multithread model for decompression

2018-10-16 Thread guangrong . xiao
From: Xiao Guangrong Adapt the decompression code to the lockless multithread model Signed-off-by: Xiao Guangrong --- migration/ram.c | 223 1 file changed, 78 insertions(+), 145 deletions(-) diff --git a/migration/ram.c b/migration/ram.c

[Qemu-devel] [PATCH 2/4] migration: introduce lockless multithreads model

2018-10-16 Thread guangrong . xiao
From: Xiao Guangrong Current implementation of compression and decompression are very hard to be enabled on productions. We noticed that too many wait-wakes go to kernel space and CPU usages are very low even if the system is really free. The reasons are: 1) there are too many locks used to do sy

[Qemu-devel] [PATCH 1/4] ptr_ring: port ptr_ring from linux kernel to QEMU

2018-10-16 Thread guangrong . xiao
From: Xiao Guangrong ptr_ring is good to minimize cache-contention and has the simple model of memory barrier which will be used by lockless threads model to pass requests between main migration thread and compression threads Some changes are made: 1) drop unnecessary APIs, e.g, for _irq, _bh AP

[Qemu-devel] [PATCH 0/4] migration: improve multithreads

2018-10-16 Thread guangrong . xiao
From: Xiao Guangrong This is the last part of our previous work: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00526.html This part finally improves the multithreads model used by compression and decompression, which makes the compression feature really usable in production.

[Qemu-devel] [PATCH 3/4] migration: use lockless Multithread model for compression

2018-10-16 Thread guangrong . xiao
From: Xiao Guangrong Adapt the compression code to the lockless multithread model Signed-off-by: Xiao Guangrong --- migration/ram.c | 312 +--- 1 file changed, 115 insertions(+), 197 deletions(-) diff --git a/migration/ram.c b/migration/ram.

[Qemu-devel] [PATCH v6 3/3] migration: use save_page_use_compression in flush_compressed_data

2018-09-06 Thread guangrong . xiao
From: Xiao Guangrong It avoids to touch compression locks if xbzrle and compression are both enabled Signed-off-by: Xiao Guangrong --- migration/ram.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index 65a563993d..747dd9208b 100644 --

[Qemu-devel] [PATCH v6 2/3] migration: show the statistics of compression

2018-09-06 Thread guangrong . xiao
From: Xiao Guangrong Currently, it includes: pages: amount of pages compressed and transferred to the target VM busy: amount of count that no free thread to compress data busy-rate: rate of thread busy compressed-size: amount of bytes after compression compression-rate: rate of compressed size R

[Qemu-devel] [PATCH v6 0/3] migration: compression optimization

2018-09-06 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v6: Thanks to Juan's review, in this version we 1) move flush compressed data to find_dirty_block() where it hits the end of memblock 2) use save_page_use_compression instead of migrate_use_compression in flush_compressed_data Xiao Guangrong (3): migrat

[Qemu-devel] [PATCH v6 1/3] migration: do not flush_compressed_data at the end of iteration

2018-09-06 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait all compression threads to finish their work, after that all threads are free until the migration feeds new request to them, reducing its call can improve the throughput and use CPU resource more effectively We do not need to flush all t

[Qemu-devel] [PATCH v5 3/4] migration: show the statistics of compression

2018-09-03 Thread guangrong . xiao
From: Xiao Guangrong Currently, it includes: pages: amount of pages compressed and transferred to the target VM busy: amount of count that no free thread to compress data busy-rate: rate of thread busy compressed-size: amount of bytes after compression compression-rate: rate of compressed size R

[Qemu-devel] [PATCH v5 4/4] migration: handle the error condition properly

2018-09-03 Thread guangrong . xiao
From: Xiao Guangrong ram_find_and_save_block() can return negative if any error happens, however, it is completely ignored in current code Signed-off-by: Xiao Guangrong --- migration/ram.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/migration/ram.c

[Qemu-devel] [PATCH v5 2/4] migration: fix calculating xbzrle_counters.cache_miss_rate

2018-09-03 Thread guangrong . xiao
From: Xiao Guangrong As Peter pointed out: | - xbzrle_counters.cache_miss is done in save_xbzrle_page(), so it's | per-guest-page granularity | | - RAMState.iterations is done for each ram_find_and_save_block(), so | it's per-host-page granularity | | An example is that when we migrate a 2M h

[Qemu-devel] [PATCH v5 1/4] migration: do not flush_compressed_data at the end of each iteration

2018-09-03 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait all compression threads to finish their work, after that all threads are free until the migration feeds new request to them, reducing its call can improve the throughput and use CPU resource more effectively We do not need to flush all t

[Qemu-devel] [PATCH v5 0/4] migration: compression optimization

2018-09-03 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v5: use the way in the older version to handle flush_compressed_data in the iteration, i.e, introduce dirty_sync_count and flush compressed data if the count is changed. That's because we should post the data after QEMU_VM_SECTION_PART has been posted

[Qemu-devel] [PATCH v4 10/10] migration: handle the error condition properly

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong ram_find_and_save_block() can return negative if any error happens, however, it is completely ignored in current code Signed-off-by: Xiao Guangrong --- migration/ram.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/migration/ram.c

[Qemu-devel] [PATCH v4 09/10] migration: show the statistics of compression

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong Currently, it includes: pages: amount of pages compressed and transferred to the target VM busy: amount of count that no free thread to compress data busy-rate: rate of thread busy compressed-size: amount of bytes after compression compression-rate: rate of compressed size R

[Qemu-devel] [PATCH v4 08/10] migration: fix calculating xbzrle_counters.cache_miss_rate

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong As Peter pointed out: | - xbzrle_counters.cache_miss is done in save_xbzrle_page(), so it's | per-guest-page granularity | | - RAMState.iterations is done for each ram_find_and_save_block(), so | it's per-host-page granularity | | An example is that when we migrate a 2M h

[Qemu-devel] [PATCH v4 05/10] migration: move handle of zero page to the thread

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong Detecting zero page is not a light work, moving it to the thread to speed the main thread up, btw, handling ram_release_pages() for the zero page is moved to the thread as well Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 96 ++

[Qemu-devel] [PATCH v4 06/10] migration: hold the lock only if it is really needed

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong Try to hold src_page_req_mutex only if the queue is not empty Reviewed-by: Dr. David Alan Gilbert Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- include/qemu/queue.h | 1 + migration/ram.c | 4 2 files changed, 5 insertions(+) diff --git a/include/qem

[Qemu-devel] [PATCH v4 07/10] migration: do not flush_compressed_data at the end of each iteration

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait all compression threads to finish their work, after that all threads are free until the migration feeds new request to them, reducing its call can improve the throughput and use CPU resource more effectively We do not need to flush all t

[Qemu-devel] [PATCH v4 03/10] migration: introduce save_zero_page_to_file

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong It will be used by the compression threads Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 40 ++-- 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index d631b9

[Qemu-devel] [PATCH v4 01/10] migration: do not wait for free thread

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong Instead of putting the main thread to sleep state to wait for free compression thread, we can directly post it out as normal page that reduces the latency and uses CPUs more efficiently A parameter, compress-wait-thread, is introduced, it can be enabled if the user really wa

[Qemu-devel] [PATCH v4 04/10] migration: drop the return value of do_compress_ram_page

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong It is not used and cleans the code up a little Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 26 +++--- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 49ace30614..e463

[Qemu-devel] [PATCH v4 00/10] migration: compression optimization

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v4: These changes are based on the suggestion from Peter Eric. 1) improve qapi's grammar 2) move calling flush_compressed_data to migration_bitmap_sync() 3) rename 'handle_pages' to 'target_page_count' Note: there is still no clear way to fix handling the error

[Qemu-devel] [PATCH v4 02/10] migration: fix counting normal page for compression

2018-08-21 Thread guangrong . xiao
From: Xiao Guangrong The compressed page is not normal page Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 1 - 1 file changed, 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index ae9e83c2b6..d631b9a6fe 100644 --- a/migration/ram.c +++ b/migration/ra

[Qemu-devel] [PATCH v3 08/10] migration: handle the error condition properly

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong ram_find_and_save_block() can return negative if any error happens, however, it is completely ignored in current code Signed-off-by: Xiao Guangrong --- migration/ram.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/migration/ram.c

[Qemu-devel] [PATCH v3 06/10] migration: hold the lock only if it is really needed

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong Try to hold src_page_req_mutex only if the queue is not empty Reviewed-by: Dr. David Alan Gilbert Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- include/qemu/queue.h | 1 + migration/ram.c | 4 2 files changed, 5 insertions(+) diff --git a/include/qem

[Qemu-devel] [PATCH v3 10/10] migration: show the statistics of compression

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong Currently, it includes: pages: amount of pages compressed and transferred to the target VM busy: amount of count that no free thread to compress data busy-rate: rate of thread busy compressed-size: amount of bytes after compression compression-rate: rate of compressed size S

[Qemu-devel] [PATCH v3 09/10] migration: fix calculating xbzrle_counters.cache_miss_rate

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong As Peter pointed out: | - xbzrle_counters.cache_miss is done in save_xbzrle_page(), so it's | per-guest-page granularity | | - RAMState.iterations is done for each ram_find_and_save_block(), so | it's per-host-page granularity | | An example is that when we migrate a 2M h

[Qemu-devel] [PATCH v3 04/10] migration: drop the return value of do_compress_ram_page

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong It is not used and cleans the code up a little Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 26 +++--- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index 49ace30614..e463

[Qemu-devel] [PATCH v3 02/10] migration: fix counting normal page for compression

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong The compressed page is not normal page Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 1 - 1 file changed, 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index ae9e83c2b6..d631b9a6fe 100644 --- a/migration/ram.c +++ b/migration/ra

[Qemu-devel] [PATCH v3 03/10] migration: introduce save_zero_page_to_file

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong It will be used by the compression threads Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 40 ++-- 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index d631b9

[Qemu-devel] [PATCH v3 05/10] migration: move handle of zero page to the thread

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong Detecting zero page is not a light work, moving it to the thread to speed the main thread up, btw, handling ram_release_pages() for the zero page is moved to the thread as well Signed-off-by: Xiao Guangrong --- migration/ram.c | 96 +

[Qemu-devel] [PATCH v3 07/10] migration: do not flush_compressed_data at the end of each iteration

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait all compression threads to finish their work, after that all threads are free until the migration feeds new request to them, reducing its call can improve the throughput and use CPU resource more effectively We do not need to flush all t

[Qemu-devel] [PATCH v3 01/10] migration: do not wait for free thread

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong Instead of putting the main thread to sleep state to wait for free compression thread, we can directly post it out as normal page that reduces the latency and uses CPUs more efficiently A parameter, compress-wait-thread, is introduced, it can be enabled if the user really wa

[Qemu-devel] [PATCH v3 00/10] migration: compression optimization

2018-08-07 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v3: Thanks to Peter's comments, the changes in this version are: 1) make compress-wait-thread be true on default to keep current behavior 2) save the compressed-size instead of reduced size and fix calculating compression ratio 3) fix calculating xbzrle_count

[Qemu-devel] [PATCH v2 7/8] migration: hold the lock only if it is really needed

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong Try to hold src_page_req_mutex only if the queue is not empty Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Xiao Guangrong --- include/qemu/queue.h | 1 + migration/ram.c | 4 2 files changed, 5 insertions(+) diff --git a/include/qemu/queue.h b/include/qem

[Qemu-devel] [PATCH v2 6/8] migration: move handle of zero page to the thread

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong Detecting zero page is not a light work, moving it to the thread to speed the main thread up Signed-off-by: Xiao Guangrong --- migration/ram.c | 112 +++- 1 file changed, 78 insertions(+), 34 deletions(-) diff --git a/mi

[Qemu-devel] [PATCH v2 8/8] migration: do not flush_compressed_data at the end of each iteration

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait all compression threads to finish their work, after that all threads are free until the migration feeds new request to them, reducing its call can improve the throughput and use CPU resource more effectively We do not need to flush all t

[Qemu-devel] [PATCH v2 5/8] migration: drop the return value of do_compress_ram_page

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong It is not used and cleans the code up a little Signed-off-by: Xiao Guangrong --- migration/ram.c | 26 +++--- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index ce6e69b649..5aa624b3b9 100644 --- a/mig

[Qemu-devel] [PATCH v2 4/8] migration: introduce save_zero_page_to_file

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong It will be used by the compression threads Signed-off-by: Xiao Guangrong --- migration/ram.c | 40 ++-- 1 file changed, 30 insertions(+), 10 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index e68b0e6dec..ce6e69b649 100644

[Qemu-devel] [PATCH v2 1/8] migration: do not wait for free thread

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong Instead of putting the main thread to sleep state to wait for free compression thread, we can directly post it out as normal page that reduces the latency and uses CPUs more efficiently A parameter, compress-wait-thread, is introduced, it can be enabled if the user really wa

[Qemu-devel] [PATCH v2 3/8] migration: show the statistics of compression

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong Currently, it includes: pages: amount of pages compressed and transferred to the target VM busy: amount of count that no free thread to compress data busy-rate: rate of thread busy reduced-size: amount of bytes reduced by compression compression-rate: rate of compressed size

[Qemu-devel] [PATCH v2 2/8] migration: fix counting normal page for compression

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong The compressed page is not normal page Signed-off-by: Xiao Guangrong --- migration/ram.c | 1 - 1 file changed, 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index 0ad234c692..1b016e048d 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1903,7 +1903,6

[Qemu-devel] [PATCH v2 0/8] migration: compression optimization

2018-07-19 Thread guangrong . xiao
From: Xiao Guangrong Thanks to Peter's suggestion, I split the long series (1) and this is the first part. I am not sure if Dave is happy with @reduced-size, will change immediately if it's objected. :) Changelog in v2: 1) introduce a parameter to make the main thread wait for free thread thre

[Qemu-devel] [PATCH 12/12] migration: use lockless Multithread model for decompression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Adapt the decompression code to the lockless multithread model Signed-off-by: Xiao Guangrong --- migration/ram.c | 381 ++-- 1 file changed, 175 insertions(+), 206 deletions(-) diff --git a/migration/ram.c b/migration/ram.

[Qemu-devel] [PATCH 10/12] migration: introduce lockless multithreads model

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Current implementation of compression and decompression are very hard to be enabled on productions. We noticed that too many wait-wakes go to kernel space and CPU usages are very low even if the system is really free. The reasons are: 1) there are too many locks used to do sy

[Qemu-devel] [PATCH 11/12] migration: use lockless Multithread model for compression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Adapt the compression code to the lockless multithread model Signed-off-by: Xiao Guangrong --- migration/ram.c | 412 ++-- 1 file changed, 161 insertions(+), 251 deletions(-) diff --git a/migration/ram.c b/migration/ram.

[Qemu-devel] [PATCH 08/12] migration: do not flush_compressed_data at the end of each iteration

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong flush_compressed_data() needs to wait all compression threads to finish their work, after that all threads are free until the migration feed new request to them, reducing its call can improve the throughput and use CPU resource more effectively We do not need to flush all th

[Qemu-devel] [PATCH 07/12] migration: hold the lock only if it is really needed

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Try to hold src_page_req_mutex only if the queue is not empty Signed-off-by: Xiao Guangrong --- include/qemu/queue.h | 1 + migration/ram.c | 4 2 files changed, 5 insertions(+) diff --git a/include/qemu/queue.h b/include/qemu/queue.h index 59fd1203a1..ac418efc4

[Qemu-devel] [PATCH 09/12] ring: introduce lockless ring buffer

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong It's the simple lockless ring buffer implementation which supports both single producer vs. single consumer and multiple producers vs. single consumer. Many lessons were learned from Linux Kernel's kfifo (1) and DPDK's rte_ring (2) before I wrote this implementation. It corrects some

[Qemu-devel] [PATCH 01/12] migration: do not wait if no free thread

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Instead of putting the main thread to sleep state to wait for free compression thread, we can directly post it out as normal page that reduces the latency and uses CPUs more efficiently Signed-off-by: Xiao Guangrong --- migration/ram.c | 34 +++-

[Qemu-devel] [PATCH 04/12] migration: introduce migration_update_rates

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong It is used to slightly clean the code up, no logic is changed Signed-off-by: Xiao Guangrong --- migration/ram.c | 35 ++- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index dd1283dd45..ee0

[Qemu-devel] [PATCH 06/12] migration: do not detect zero page for compression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Detecting zero page is not a light work, we can disable it for compression that can handle all zero data very well Signed-off-by: Xiao Guangrong --- migration/ram.c | 44 +++- 1 file changed, 23 insertions(+), 21 deletions(-) diff -

[Qemu-devel] [PATCH 02/12] migration: fix counting normal page for compression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong The compressed page is not normal page Signed-off-by: Xiao Guangrong --- migration/ram.c | 1 - 1 file changed, 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index 0caf32ab0a..dbf24d8c87 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1432,7 +1432,6

[Qemu-devel] [PATCH 05/12] migration: show the statistics of compression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Then the users can adjust the parameters based on this info. Currently, it includes: pages: amount of pages compressed and transferred to the target VM busy: amount of count that no free thread to compress data busy-rate: rate of thread busy reduced-size: amount of bytes reduc

[Qemu-devel] [PATCH 03/12] migration: fix counting xbzrle cache_miss_rate

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Sync up xbzrle_cache_miss_prev only after migration iteration goes forward Signed-off-by: Xiao Guangrong --- migration/ram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/migration/ram.c b/migration/ram.c index dbf24d8c87..dd1283dd45 100644 --- a/migr

[Qemu-devel] [PATCH 00/12] migration: improve multithreads for compression and decompression

2018-06-04 Thread guangrong . xiao
From: Xiao Guangrong Background -- Current implementation of compression and decompression are very hard to be enabled on productions. We noticed that too many wait-wakes go to kernel space and CPU usages are very low even if the system is really free. The reasons are: 1) there are too ma

[Qemu-devel] [PATCH v2] migration: introduce decompress-error-check

2018-05-03 Thread guangrong . xiao
From: Xiao Guangrong QEMU 2.13 enables strict check for compression & decompression to make the migration more robust, that depends on the source to fix the internal design which triggers the unexpected error conditions To make it work for migrating old version QEMU to 2.13 QEMU, we introduce th

[Qemu-devel] [PATCH] migration: fix saving normal page even if it's been compressed

2018-04-28 Thread guangrong . xiao
From: Xiao Guangrong Fix the bug introduced by da3f56cb2e767016 (migration: remove ram_save_compressed_page()), It should be 'return' rather than 'res' Sorry for this stupid mistake :( Signed-off-by: Xiao Guangrong --- migration/ram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) dif

[Qemu-devel] [PATCH] migration: introduce decompress-error-check

2018-04-26 Thread guangrong . xiao
From: Xiao Guangrong QEMU 2.13 enables strict check for compression & decompression to make the migration more robust, that depends on the source to fix the internal design which triggers the unexpected error conditions To make it work for migrating old version QEMU to 2.13 QEMU, we introduce

[Qemu-devel] [PATCH v3 10/10] migration: remove ram_save_compressed_page()

2018-03-30 Thread guangrong . xiao
From: Xiao Guangrong Now, we can reuse the path in ram_save_page() to post the page out as normal, then the only thing remained in ram_save_compressed_page() is compression that we can move it out to the caller Reviewed-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Xiao Guang

[Qemu-devel] [PATCH v3 05/10] migration: introduce control_save_page()

2018-03-30 Thread guangrong . xiao
From: Xiao Guangrong Abstract the common function control_save_page() to cleanup the code, no logic is changed Reviewed-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Xiao Guangrong --- migration/ram.c | 174 +--- 1 file ch

[Qemu-devel] [PATCH v3 09/10] migration: introduce save_normal_page()

2018-03-30 Thread guangrong . xiao
From: Xiao Guangrong It directly sends the page to the stream neither checking zero nor using xbzrle or compression Reviewed-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Xiao Guangrong --- migration/ram.c | 50 ++ 1 file chan

[Qemu-devel] [PATCH v3 02/10] migration: stop compression to allocate and free memory frequently

2018-03-30 Thread guangrong . xiao
From: Xiao Guangrong Current code uses compress2() to compress memory which manages memory internally, that causes huge memory is allocated and freed very frequently More worse, frequently returning memory to kernel will flush TLBs and trigger invalidation callbacks on mmu-notification which int

[Qemu-devel] [PATCH v3 03/10] migration: stop decompression to allocate and free memory frequently

2018-03-30 Thread guangrong . xiao
From: Xiao Guangrong Current code uses uncompress() to decompress memory which manages memory internally, that causes huge memory is allocated and freed very frequently, more worse, frequently returning memory to kernel will flush TLBs So, we maintain the memory by ourselves and reuse it for eac

[Qemu-devel] [PATCH v3 07/10] migration: move calling control_save_page to the common place

2018-03-30 Thread guangrong . xiao
From: Xiao Guangrong The function is called by both ram_save_page and ram_save_target_page, so move it to the common caller to cleanup the code Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff

[Qemu-devel] [PATCH v3 08/10] migration: move calling save_zero_page to the common place

2018-03-30 Thread guangrong . xiao
From: Xiao Guangrong save_zero_page() is always our first approach to try, move it to the common place before calling ram_save_compressed_page and ram_save_page Reviewed-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Xiao Guangrong --- migration/ram.c | 105 +

[Qemu-devel] [PATCH v3 04/10] migration: detect compression and decompression errors

2018-03-30 Thread guangrong . xiao
From: Xiao Guangrong Currently the page being compressed is allowed to be updated by the VM on the source QEMU, correspondingly the destination QEMU just ignores the decompression error. However, we completely miss the chance to catch real errors, then the VM is corrupted silently To make the mi

[Qemu-devel] [PATCH v3 06/10] migration: move some code to ram_save_host_page

2018-03-30 Thread guangrong . xiao
From: Xiao Guangrong Move some code from ram_save_target_page() to ram_save_host_page() to make it be more readable for latter patches that dramatically clean ram_save_target_page() up Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 43 +++

[Qemu-devel] [PATCH v3 00/10] migration: improve and cleanup compression

2018-03-30 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v3: Following changes are from Peter's review: 1) use comp_param[i].file and decomp_param[i].compbuf to indicate if the thread is properly init'd or not 2) save the file which is used by ram loader to the global variable instead it is cached per decompressi

[Qemu-devel] [PATCH v3 01/10] migration: stop compressing page in migration thread

2018-03-30 Thread guangrong . xiao
From: Xiao Guangrong As compression is a heavy work, do not do it in migration thread, instead, we post it out as a normal page Reviewed-by: Wei Wang Reviewed-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Xiao Guangrong --- migration/ram.c | 32

[Qemu-devel] [PATCH v2 10/10] migration: remove ram_save_compressed_page()

2018-03-28 Thread guangrong . xiao
From: Xiao Guangrong Now, we can reuse the path in ram_save_page() to post the page out as normal, then the only thing remained in ram_save_compressed_page() is compression that we can move it out to the caller Reviewed-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Xiao Guang

[Qemu-devel] [PATCH v2 07/10] migration: move calling control_save_page to the common place

2018-03-28 Thread guangrong . xiao
From: Xiao Guangrong The function is called by both ram_save_page and ram_save_target_page, so move it to the common caller to cleanup the code Reviewed-by: Peter Xu Signed-off-by: Xiao Guangrong --- migration/ram.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff

[Qemu-devel] [PATCH v2 04/10] migration: detect compression and decompression errors

2018-03-28 Thread guangrong . xiao
From: Xiao Guangrong Currently the page being compressed is allowed to be updated by the VM on the source QEMU, correspondingly the destination QEMU just ignores the decompression error. However, we completely miss the chance to catch real errors, then the VM is corrupted silently To make the mi

[Qemu-devel] [PATCH v2 09/10] migration: introduce save_normal_page()

2018-03-28 Thread guangrong . xiao
From: Xiao Guangrong It directly sends the page to the stream neither checking zero nor using xbzrle or compression Reviewed-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Xiao Guangrong --- migration/ram.c | 50 ++ 1 file chan

[Qemu-devel] [PATCH v2 05/10] migration: introduce control_save_page()

2018-03-28 Thread guangrong . xiao
From: Xiao Guangrong Abstract the common function control_save_page() to cleanup the code, no logic is changed Reviewed-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Xiao Guangrong --- migration/ram.c | 174 +--- 1 file ch

[Qemu-devel] [PATCH v2 06/10] migration: move some code ram_save_host_page

2018-03-28 Thread guangrong . xiao
From: Xiao Guangrong Move some code from ram_save_target_page() to ram_save_host_page() to make it be more readable for latter patches that dramatically clean ram_save_target_page() up Signed-off-by: Xiao Guangrong --- migration/ram.c | 43 +++ 1 file ch

[Qemu-devel] [PATCH v2 00/10] migration: improve and cleanup compression

2018-03-28 Thread guangrong . xiao
From: Xiao Guangrong Changelog in v2: Thanks to the review from Dave, Peter, Wei and Jiang Biao, the changes in this version are: 1) include the performance number in the cover letter 2)add some comments to explain how to use z_stream->opaque in the patchset 3) allocate a internal buffer for p

[Qemu-devel] [PATCH v2 02/10] migration: stop compression to allocate and free memory frequently

2018-03-28 Thread guangrong . xiao
From: Xiao Guangrong Current code uses compress2() to compress memory which manages memory internally, that causes huge memory is allocated and freed very frequently More worse, frequently returning memory to kernel will flush TLBs and trigger invalidation callbacks on mmu-notification which int

[Qemu-devel] [PATCH v2 08/10] migration: move calling save_zero_page to the common place

2018-03-28 Thread guangrong . xiao
From: Xiao Guangrong save_zero_page() is always our first approach to try, move it to the common place before calling ram_save_compressed_page and ram_save_page Reviewed-by: Peter Xu Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Xiao Guangrong --- migration/ram.c | 105 +
