Re: [PATCH v2 08/11] mirror: Skip writing zeroes when target is already zero
on Thu, 17 Apr 2025 13:39:13 -0500, Eric Blake wrote: > When mirroring, the goal is to ensure that the destination reads the > same as the source; this goal is met whether the destination is sparse > or fully-allocated. However, if the destination cannot efficiently > write zeroes, then any time the mirror operation wants to copy zeroes > from the source to the destination (either during the background over > sparse regions when doing a full mirror, or in the foreground when the > guest actively writes zeroes), we were causing the destination to > fully allocate that portion of the disk, even if it already read as > zeroes. > > The effect is especially pronounced when the source is a raw file. > That's because when the source is a qcow2 file, the dirty bitmap only > visits the portions of the source that are allocated, which tend to be > non-zero. But when the source is a raw file, > bdrv_co_is_allocated_above() reports the entire file as allocated so > mirror_dirty_init sets the entire dirty bitmap, and it is only later > during mirror_iteration that we change to consulting the more precise > bdrv_co_block_status_above() to learn where the source reads as zero. > > Remember that since a mirror operation can write a cluster more than > once (every time the guest changes the source, the destination is also > changed to keep up), we can't take the shortcut of relying on > s->zero_target (which is static for the life of the job) in > mirror_co_zero() to see if the destination is already zero, because > that information may be stale. Any solution we use must be dynamic in > the face of the guest writing or discarding a cluster while the mirror > has been ongoing. > > We could just teach mirror_co_zero() to do a block_status() probe of > the destination, and skip the zeroes if the destination already reads > as zero, but we know from past experience that extra block_status() > calls are not always cheap (tmpfs, anyone?), especially when they are > random access rather than linear. Use of block_status() of the source > by the background task in a linear fashion is not our bottleneck (it's > a background task, after all); but since mirroring can be done while > the source is actively being changed, we don't want a slow > block_status() of the destination to occur on the hot path of the > guest trying to do random-access writes to the source. > > So this patch takes a slightly different approach: any time we have to > transfer the full image, we know that mirror_dirty_init() is _already_ > doing a pre-zero pass over the entire destination. Therefore, if we > track which clusters of the destination are zero at any given moment, > we don't have to do a block_status() call on the destination, but can > instead just refer to the zero bitmap associated with the job. > > With this patch, if I create a raw sparse destination file, connect it > with QMP 'blockdev-add' while leaving it at the default "discard": > "ignore", then run QMP 'blockdev-mirror' with "sync": "full", the > destination remains sparse rather than fully allocated. > > Signed-off-by: Eric Blake > --- > block/mirror.c | 70 ++ > 1 file changed, 65 insertions(+), 5 deletions(-) > > diff --git a/block/mirror.c b/block/mirror.c > index 234e3a55e60..4770d87abf6 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -73,6 +73,7 @@ typedef struct MirrorBlockJob { > size_t buf_size; > int64_t bdev_length; > unsigned long *cow_bitmap; > +unsigned long *zero_bitmap; > BdrvDirtyBitmap *dirty_bitmap; > BdrvDirtyBitmapIter *dbi; > uint8_t *buf; > @@ -408,15 +409,33 @@ static void coroutine_fn mirror_co_read(void *opaque) > static void coroutine_fn mirror_co_zero(void *opaque) > { > MirrorOp *op = opaque; > -int ret; > +bool write_needed = true; > +int ret = 0; > > op->s->in_flight++; > op->s->bytes_in_flight += op->bytes; > *op->bytes_handled = op->bytes; > op->is_in_flight = true; > > -ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes, > - op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0); > +if (op->s->zero_bitmap) { > +unsigned long end = DIV_ROUND_UP(op->offset + op->bytes, > + op->s->granularity); > +assert(QEMU_IS_ALIGNED(op->offset, op->s->granularity)); > +assert(QEMU_IS_ALIGNED(op->bytes, op->s->granularity) || > + op->offset + op->bytes == op->s->bdev_length); > +if (find_next_zero_bit(op->s->zero_bitmap, end, > + op->offset / op->s->granularity) == end) { > +write_needed = false; > +} > +} > +if (write_needed) { > +ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes, > + op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0); > +} > +if (ret >= 0 && op->s->zero_bitmap) { > +bitmap_se
Re: [PATCH v2 08/11] mirror: Skip writing zeroes when target is already zero
On Thu, Apr 24, 2025 at 12:42:45AM +0800, Sunny Zhu wrote: > on Thu, 17 Apr 2025 13:39:13 -0500, Eric Blake wrote: > > When mirroring, the goal is to ensure that the destination reads the > > same as the source; this goal is met whether the destination is sparse > > or fully-allocated. However, if the destination cannot efficiently > > write zeroes, then any time the mirror operation wants to copy zeroes > > from the source to the destination (either during the background over > > sparse regions when doing a full mirror, or in the foreground when the > > guest actively writes zeroes), we were causing the destination to > > fully allocate that portion of the disk, even if it already read as > > zeroes. > > > > @@ -452,12 +474,21 @@ static unsigned mirror_perform(MirrorBlockJob *s, > > int64_t offset, > > > > switch (mirror_method) { > > case MIRROR_METHOD_COPY: > > +if (s->zero_bitmap) { > > +bitmap_clear(s->zero_bitmap, offset / s->granularity, > > + DIV_ROUND_UP(bytes, s->granularity)); > > +} > > co = qemu_coroutine_create(mirror_co_read, op); > > break; > > case MIRROR_METHOD_ZERO: > > +/* s->zero_bitmap handled in mirror_co_zero */ > > co = qemu_coroutine_create(mirror_co_zero, op); > > break; > > case MIRROR_METHOD_DISCARD: > > +if (s->zero_bitmap) { > > +bitmap_clear(s->zero_bitmap, offset / s->granularity, > > + DIV_ROUND_UP(bytes, s->granularity)); > > +} > > co = qemu_coroutine_create(mirror_co_discard, op); > > break; > > default: > > > > If we have performed the skip-zero operation, it should not be constrained > by mirror job bandwidth limits. Therefore, it is preferable to exclude it > from rate limiting. Indeed, that makes sense. And it may impact the iotests: test 194 should have a smaller amount of bytes transferred, due to skipping zeroes, so I may need to hoist the filtering that I added in the later patch for iotest mirror-sparse into common code. > > bool skip_write_zero = false; > > io_bytes = mirror_perform(s, offset, io_bytes, mirror_method, > &skip_write_zero); > if (skip_write_zero || (mirror_method != MIRROR_METHOD_COPY && > write_zeroes_ok)) { > io_bytes_acct = 0; > } .. > Thanks; that's helpful. I'll incorporate it into v3. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org
Re: [PATCH v2 08/11] mirror: Skip writing zeroes when target is already zero
On Thu, Apr 17, 2025 at 01:39:13PM -0500, Eric Blake wrote: > When mirroring, the goal is to ensure that the destination reads the > same as the source; this goal is met whether the destination is sparse > or fully-allocated. However, if the destination cannot efficiently > write zeroes, then any time the mirror operation wants to copy zeroes > from the source to the destination (either during the background over > sparse regions when doing a full mirror, or in the foreground when the > guest actively writes zeroes), we were causing the destination to > fully allocate that portion of the disk, even if it already read as > zeroes. > > The effect is especially pronounced when the source is a raw file. > That's because when the source is a qcow2 file, the dirty bitmap only > visits the portions of the source that are allocated, which tend to be > non-zero. But when the source is a raw file, > bdrv_co_is_allocated_above() reports the entire file as allocated so > mirror_dirty_init sets the entire dirty bitmap, and it is only later > during mirror_iteration that we change to consulting the more precise > bdrv_co_block_status_above() to learn where the source reads as zero. > > Remember that since a mirror operation can write a cluster more than > once (every time the guest changes the source, the destination is also > changed to keep up), we can't take the shortcut of relying on > s->zero_target (which is static for the life of the job) in > mirror_co_zero() to see if the destination is already zero, because > that information may be stale. Any solution we use must be dynamic in > the face of the guest writing or discarding a cluster while the mirror > has been ongoing. > > We could just teach mirror_co_zero() to do a block_status() probe of > the destination, and skip the zeroes if the destination already reads > as zero, but we know from past experience that extra block_status() > calls are not always cheap (tmpfs, anyone?), especially when they are > random access rather than linear. Use of block_status() of the source > by the background task in a linear fashion is not our bottleneck (it's > a background task, after all); but since mirroring can be done while > the source is actively being changed, we don't want a slow > block_status() of the destination to occur on the hot path of the > guest trying to do random-access writes to the source. > > So this patch takes a slightly different approach: any time we have to > transfer the full image, we know that mirror_dirty_init() is _already_ > doing a pre-zero pass over the entire destination. Therefore, if we > track which clusters of the destination are zero at any given moment, > we don't have to do a block_status() call on the destination, but can > instead just refer to the zero bitmap associated with the job. > > With this patch, if I create a raw sparse destination file, connect it > with QMP 'blockdev-add' while leaving it at the default "discard": > "ignore", then run QMP 'blockdev-mirror' with "sync": "full", the > destination remains sparse rather than fully allocated. > > Signed-off-by: Eric Blake > --- > block/mirror.c | 70 ++ > 1 file changed, 65 insertions(+), 5 deletions(-) Reviewed-by: Stefan Hajnoczi signature.asc Description: PGP signature