Re: [PATCH v2 08/11] mirror: Skip writing zeroes when target is already zero

2025-04-23 Thread Sunny Zhu
on Thu, 17 Apr 2025 13:39:13 -0500, Eric Blake wrote:
> When mirroring, the goal is to ensure that the destination reads the
> same as the source; this goal is met whether the destination is sparse
> or fully-allocated.  However, if the destination cannot efficiently
> write zeroes, then any time the mirror operation wants to copy zeroes
> from the source to the destination (either during the background over
> sparse regions when doing a full mirror, or in the foreground when the
> guest actively writes zeroes), we were causing the destination to
> fully allocate that portion of the disk, even if it already read as
> zeroes.
> 
> The effect is especially pronounced when the source is a raw file.
> That's because when the source is a qcow2 file, the dirty bitmap only
> visits the portions of the source that are allocated, which tend to be
> non-zero.  But when the source is a raw file,
> bdrv_co_is_allocated_above() reports the entire file as allocated so
> mirror_dirty_init sets the entire dirty bitmap, and it is only later
> during mirror_iteration that we change to consulting the more precise
> bdrv_co_block_status_above() to learn where the source reads as zero.
> 
> Remember that since a mirror operation can write a cluster more than
> once (every time the guest changes the source, the destination is also
> changed to keep up), we can't take the shortcut of relying on
> s->zero_target (which is static for the life of the job) in
> mirror_co_zero() to see if the destination is already zero, because
> that information may be stale.  Any solution we use must be dynamic in
> the face of the guest writing or discarding a cluster while the mirror
> has been ongoing.
> 
> We could just teach mirror_co_zero() to do a block_status() probe of
> the destination, and skip the zeroes if the destination already reads
> as zero, but we know from past experience that extra block_status()
> calls are not always cheap (tmpfs, anyone?), especially when they are
> random access rather than linear.  Use of block_status() of the source
> by the background task in a linear fashion is not our bottleneck (it's
> a background task, after all); but since mirroring can be done while
> the source is actively being changed, we don't want a slow
> block_status() of the destination to occur on the hot path of the
> guest trying to do random-access writes to the source.
> 
> So this patch takes a slightly different approach: any time we have to
> transfer the full image, we know that mirror_dirty_init() is _already_
> doing a pre-zero pass over the entire destination.  Therefore, if we
> track which clusters of the destination are zero at any given moment,
> we don't have to do a block_status() call on the destination, but can
> instead just refer to the zero bitmap associated with the job.
> 
> With this patch, if I create a raw sparse destination file, connect it
> with QMP 'blockdev-add' while leaving it at the default "discard":
> "ignore", then run QMP 'blockdev-mirror' with "sync": "full", the
> destination remains sparse rather than fully allocated.
> 
> Signed-off-by: Eric Blake 
> ---
>  block/mirror.c | 70 ++
>  1 file changed, 65 insertions(+), 5 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 234e3a55e60..4770d87abf6 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -73,6 +73,7 @@ typedef struct MirrorBlockJob {
>  size_t buf_size;
>  int64_t bdev_length;
>  unsigned long *cow_bitmap;
> +unsigned long *zero_bitmap;
>  BdrvDirtyBitmap *dirty_bitmap;
>  BdrvDirtyBitmapIter *dbi;
>  uint8_t *buf;
> @@ -408,15 +409,33 @@ static void coroutine_fn mirror_co_read(void *opaque)
>  static void coroutine_fn mirror_co_zero(void *opaque)
>  {
>  MirrorOp *op = opaque;
> -int ret;
> +bool write_needed = true;
> +int ret = 0;
> 
>  op->s->in_flight++;
>  op->s->bytes_in_flight += op->bytes;
>  *op->bytes_handled = op->bytes;
>  op->is_in_flight = true;
> 
> -ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
> -   op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
> +if (op->s->zero_bitmap) {
> +unsigned long end = DIV_ROUND_UP(op->offset + op->bytes,
> + op->s->granularity);
> +assert(QEMU_IS_ALIGNED(op->offset, op->s->granularity));
> +assert(QEMU_IS_ALIGNED(op->bytes, op->s->granularity) ||
> +   op->offset + op->bytes == op->s->bdev_length);
> +if (find_next_zero_bit(op->s->zero_bitmap, end,
> +   op->offset / op->s->granularity) == end) {
> +write_needed = false;
> +}
> +}
> +if (write_needed) {
> +ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
> +   op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
> +}
> +if (ret >= 0 && op->s->zero_bitmap) {
> +bitmap_se

Re: [PATCH v2 08/11] mirror: Skip writing zeroes when target is already zero

2025-04-23 Thread Eric Blake
On Thu, Apr 24, 2025 at 12:42:45AM +0800, Sunny Zhu wrote:
> on Thu, 17 Apr 2025 13:39:13 -0500, Eric Blake wrote:
> > When mirroring, the goal is to ensure that the destination reads the
> > same as the source; this goal is met whether the destination is sparse
> > or fully-allocated.  However, if the destination cannot efficiently
> > write zeroes, then any time the mirror operation wants to copy zeroes
> > from the source to the destination (either during the background over
> > sparse regions when doing a full mirror, or in the foreground when the
> > guest actively writes zeroes), we were causing the destination to
> > fully allocate that portion of the disk, even if it already read as
> > zeroes.
> > 
> > @@ -452,12 +474,21 @@ static unsigned mirror_perform(MirrorBlockJob *s, 
> > int64_t offset,
> > 
> >  switch (mirror_method) {
> >  case MIRROR_METHOD_COPY:
> > +if (s->zero_bitmap) {
> > +bitmap_clear(s->zero_bitmap, offset / s->granularity,
> > + DIV_ROUND_UP(bytes, s->granularity));
> > +}
> >  co = qemu_coroutine_create(mirror_co_read, op);
> >  break;
> >  case MIRROR_METHOD_ZERO:
> > +/* s->zero_bitmap handled in mirror_co_zero */
> >  co = qemu_coroutine_create(mirror_co_zero, op);
> >  break;
> >  case MIRROR_METHOD_DISCARD:
> > +if (s->zero_bitmap) {
> > +bitmap_clear(s->zero_bitmap, offset / s->granularity,
> > + DIV_ROUND_UP(bytes, s->granularity));
> > +}
> >  co = qemu_coroutine_create(mirror_co_discard, op);
> >  break;
> >  default:
> > 
> 
> If we have performed the skip-zero operation, it should not be constrained
> by mirror job bandwidth limits. Therefore, it is preferable to exclude it
> from rate limiting.

Indeed, that makes sense.  And it may impact the iotests: test 194
should have a smaller amount of bytes transferred, due to skipping
zeroes, so I may need to hoist the filtering that I added in the later
patch for iotest mirror-sparse into common code.

>   
>   bool skip_write_zero = false;
> 
>   io_bytes = mirror_perform(s, offset, io_bytes, mirror_method, 
> &skip_write_zero);
>   if (skip_write_zero || (mirror_method != MIRROR_METHOD_COPY && 
> write_zeroes_ok)) {
>   io_bytes_acct = 0;
>   } ..
>

Thanks; that's helpful.  I'll incorporate it into v3.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v2 08/11] mirror: Skip writing zeroes when target is already zero

2025-04-17 Thread Stefan Hajnoczi
On Thu, Apr 17, 2025 at 01:39:13PM -0500, Eric Blake wrote:
> When mirroring, the goal is to ensure that the destination reads the
> same as the source; this goal is met whether the destination is sparse
> or fully-allocated.  However, if the destination cannot efficiently
> write zeroes, then any time the mirror operation wants to copy zeroes
> from the source to the destination (either during the background over
> sparse regions when doing a full mirror, or in the foreground when the
> guest actively writes zeroes), we were causing the destination to
> fully allocate that portion of the disk, even if it already read as
> zeroes.
> 
> The effect is especially pronounced when the source is a raw file.
> That's because when the source is a qcow2 file, the dirty bitmap only
> visits the portions of the source that are allocated, which tend to be
> non-zero.  But when the source is a raw file,
> bdrv_co_is_allocated_above() reports the entire file as allocated so
> mirror_dirty_init sets the entire dirty bitmap, and it is only later
> during mirror_iteration that we change to consulting the more precise
> bdrv_co_block_status_above() to learn where the source reads as zero.
> 
> Remember that since a mirror operation can write a cluster more than
> once (every time the guest changes the source, the destination is also
> changed to keep up), we can't take the shortcut of relying on
> s->zero_target (which is static for the life of the job) in
> mirror_co_zero() to see if the destination is already zero, because
> that information may be stale.  Any solution we use must be dynamic in
> the face of the guest writing or discarding a cluster while the mirror
> has been ongoing.
> 
> We could just teach mirror_co_zero() to do a block_status() probe of
> the destination, and skip the zeroes if the destination already reads
> as zero, but we know from past experience that extra block_status()
> calls are not always cheap (tmpfs, anyone?), especially when they are
> random access rather than linear.  Use of block_status() of the source
> by the background task in a linear fashion is not our bottleneck (it's
> a background task, after all); but since mirroring can be done while
> the source is actively being changed, we don't want a slow
> block_status() of the destination to occur on the hot path of the
> guest trying to do random-access writes to the source.
> 
> So this patch takes a slightly different approach: any time we have to
> transfer the full image, we know that mirror_dirty_init() is _already_
> doing a pre-zero pass over the entire destination.  Therefore, if we
> track which clusters of the destination are zero at any given moment,
> we don't have to do a block_status() call on the destination, but can
> instead just refer to the zero bitmap associated with the job.
> 
> With this patch, if I create a raw sparse destination file, connect it
> with QMP 'blockdev-add' while leaving it at the default "discard":
> "ignore", then run QMP 'blockdev-mirror' with "sync": "full", the
> destination remains sparse rather than fully allocated.
> 
> Signed-off-by: Eric Blake 
> ---
>  block/mirror.c | 70 ++
>  1 file changed, 65 insertions(+), 5 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature