Re: [PATCH v2 0/6] nbd: reduce max_block restrictions

2020-04-02 Thread Vladimir Sementsov-Ogievskiy
01.04.2020 19:52, no-re...@patchew.org wrote: Patchew URL: https://patchew.org/QEMU/20200401150112.9557-1-vsement...@virtuozzo.com/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can

Re: [PATCH for-5.0? v3] qemu-img: Report convert errors by bytes, not sectors

2020-04-02 Thread Philippe Mathieu-Daudé
On 4/2/20 3:57 PM, Eric Blake wrote: Various qemu-img commands are inconsistent on whether they report status/errors in terms of bytes or sector offsets. The latter is confusing (especially as more places move to 4k block sizes), so let's switch everything to just use bytes everywhere. One

Re: [PATCH for-5.0? v4 0/7] Tighten qemu-img rules on missing backing format

2020-04-02 Thread Eric Blake
On 3/12/20 2:28 PM, Eric Blake wrote: v3 was here: https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg01730.html In v4: - old patch 1 was reworked into new patch 1-3, with stricter rules on which backing formats are accepted [Kevin] - patch 4 is new: amend is handled differently from

Re: [PATCH v10 10/14] iotests: add hmp helper with logging

2020-04-02 Thread John Snow
On 4/1/20 8:40 AM, Max Reitz wrote: > On 31.03.20 19:39, Kevin Wolf wrote: >> Am 31.03.2020 um 19:23 hat John Snow geschrieben: >>> >>> >>> On 3/31/20 6:21 AM, Max Reitz wrote: On 31.03.20 02:00, John Snow wrote: > Minor cleanup for HMP functions; helps with line length and

Re: [PATCH for-5.0] aio-posix: fix test-aio /aio/event/wait with fdmon-io_uring

2020-04-02 Thread Cole Robinson
On 4/2/20 10:54 AM, Stefan Hajnoczi wrote: > When a file descriptor becomes ready we must re-arm POLL_ADD. This is > done by adding an sqe to the io_uring sq ring. The ->need_wait() > function wasn't taking pending sqes into account and therefore > io_uring_submit_and_wait() was not being

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Kevin Wolf
Am 02.04.2020 um 18:47 hat Kevin Wolf geschrieben: > Am 02.04.2020 um 17:40 hat Dietmar Maurer geschrieben: > > > Can you reproduce the problem with my script, but pointing it to your > > > Debian image and running stress-ng instead of dd? > > > > yes > > > > > If so, how long does > > > it

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Kevin Wolf
Am 02.04.2020 um 17:40 hat Dietmar Maurer geschrieben: > > Can you reproduce the problem with my script, but pointing it to your > > Debian image and running stress-ng instead of dd? > > yes > > > If so, how long does > > it take to reproduce for you? > > I sometimes need up to 130 iterations

Re: test-aio failure with liburing

2020-04-02 Thread Stefan Hajnoczi
On Wed, Mar 25, 2020 at 01:12:49PM -0400, Cole Robinson wrote: > Using qemu.git master with liburing-devel installed. 100% reproducible > test failure for me > > $ uname -r > 5.6.0-0.rc5.git0.2.fc32.x86_64 > $ rpm -q liburing > liburing-0.5-1.fc32.x86_64 > > $ ./tests/test-aio > # random seed:

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Dietmar Maurer
> It does look more like your case because I now have bs.in_flight == 0 > and the BlockBackend of the scsi-hd device has in_flight == 8. yes, this looks very familiar. > Of course, this still doesn't answer why it happens, and I'm not sure if we > can tell without adding some debug code. > >

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Dietmar Maurer
> Can you reproduce the problem with my script, but pointing it to your > Debian image and running stress-ng instead of dd? yes > If so, how long does > it take to reproduce for you? I sometimes need up to 130 iterations ... Worse, I thought several times the bug is gone, but then it

Re: [PATCH v4 2/3] replication: acquire aio context before calling job_cancel_sync

2020-04-02 Thread Stefan Reiter
On 02/04/2020 14:41, Max Reitz wrote: On 01.04.20 10:15, Stefan Reiter wrote: job_cancel_sync requires the job's lock to be held, all other callers already do this (replication_stop, drive_backup_abort, blockdev_backup_abort, job_cancel_sync_all, cancel_common). I think all other callers come

[PATCH for-5.0] aio-posix: fix test-aio /aio/event/wait with fdmon-io_uring

2020-04-02 Thread Stefan Hajnoczi
When a file descriptor becomes ready we must re-arm POLL_ADD. This is done by adding an sqe to the io_uring sq ring. The ->need_wait() function wasn't taking pending sqes into account and therefore io_uring_submit_and_wait() was not being called. Polling for cqes failed to detect fd readiness

Re: [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply

2020-04-02 Thread Stefan Reiter
On 02/04/2020 14:33, Max Reitz wrote: On 01.04.20 10:15, Stefan Reiter wrote: All callers of job_txn_apply hold a single job's lock, but different jobs within a transaction can have different contexts, thus we need to lock each one individually before applying the callback function. Similar to

RE: [PATCH for-5.0] xen-block: Fix double qlist remove

2020-04-02 Thread Paul Durrant
> -Original Message- > From: Anthony PERARD > Sent: 02 April 2020 14:08 > To: qemu-de...@nongnu.org > Cc: qemu-sta...@nongnu.org; Anthony PERARD ; > Stefano Stabellini > ; Paul Durrant ; Stefan Hajnoczi > ; Kevin > Wolf ; Max Reitz ; > xen-de...@lists.xenproject.org; qemu- >

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Kevin Wolf
Am 02.04.2020 um 14:14 hat Kevin Wolf geschrieben: > Am 02.04.2020 um 11:10 hat Dietmar Maurer geschrieben: > > > It seems to fix it, yes. Now I don't get any hangs any more. > > > > I just tested using your configuration, and a recent centos8 image > > running dd loop inside it: > > > > #

Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN

2020-04-02 Thread Richard W.M. Jones
On Thu, Apr 02, 2020 at 08:41:31AM -0500, Eric Blake wrote: > On 4/2/20 3:38 AM, Richard W.M. Jones wrote: > >For the case I care about (long running virt-v2v conversions with an > >intermittent network) we don't expect that nbdkit will be killed nor > >gracefully shut down. Instead what we

[PATCH for-5.0? v3] qemu-img: Report convert errors by bytes, not sectors

2020-04-02 Thread Eric Blake
Various qemu-img commands are inconsistent on whether they report status/errors in terms of bytes or sector offsets. The latter is confusing (especially as more places move to 4k block sizes), so let's switch everything to just use bytes everywhere. One iotest is impacted. Signed-off-by: Eric

Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN

2020-04-02 Thread Eric Blake
On 4/2/20 8:33 AM, Eric Blake wrote: Then, what about "SHOULD wait until no inflight requests"? We don't do it either.. Should we? qemu as server doesn't send NBD_ESHUTDOWN.  It probably should (the way nbdkit does), but that's orthogonal to qemu as client responding to NBD_ESHUTDOWN.

Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN

2020-04-02 Thread Eric Blake
On 4/2/20 3:38 AM, Richard W.M. Jones wrote: On Wed, Apr 01, 2020 at 05:38:41PM -0500, Eric Blake wrote: I was trying to test qemu's reconnect-delay parameter by using nbdkit as a server that I could easily make disappear and resume. A bit of experimenting shows that when nbdkit is abruptly

Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN

2020-04-02 Thread Eric Blake
On 4/2/20 1:41 AM, Vladimir Sementsov-Ogievskiy wrote: 02.04.2020 1:38, Eric Blake wrote: I was trying to test qemu's reconnect-delay parameter by using nbdkit as a server that I could easily make disappear and resume.  A bit of experimenting shows that when nbdkit is abruptly killed (SIGKILL),

[PATCH for-5.0] xen-block: Fix double qlist remove

2020-04-02 Thread Anthony PERARD
Commit a31ca6801c02 ("qemu/queue.h: clear linked list pointers on remove") revealed that a request was removed twice from a list, once in xen_block_finish_request() and a second time in xen_block_release_request() when both functions are called from xen_block_complete_aio(). But also, the

Re: [PATCH v4 3/3] backup: don't acquire aio_context in backup_clean

2020-04-02 Thread Max Reitz
On 01.04.20 10:15, Stefan Reiter wrote: > All code-paths leading to backup_clean (via job_clean) have the job's > context already acquired. The job's context is guaranteed to be the same > as the one used by backup_top via backup_job_create. > > Since the previous logic effectively acquired the

Re: [PATCH for-5.0 v2] qemu-img: Report convert errors by bytes, not sectors

2020-04-02 Thread Eric Blake
On 4/1/20 5:11 PM, no-re...@patchew.org wrote: Patchew URL: https://patchew.org/QEMU/20200401180436.298613-1-ebl...@redhat.com/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably

Re: [PATCH v4 0/3] Fix some AIO context locking in jobs

2020-04-02 Thread Kevin Wolf
Am 01.04.2020 um 10:15 hat Stefan Reiter geschrieben: > Contains three separate but related patches cleaning up and fixing some > issues regarding aio_context_acquire/aio_context_release for jobs. Mostly > affects blockjobs running for devices that have IO threads enabled AFAICT. > > This is

Re: [PATCH v4 2/3] replication: acquire aio context before calling job_cancel_sync

2020-04-02 Thread Max Reitz
On 01.04.20 10:15, Stefan Reiter wrote: > job_cancel_sync requires the job's lock to be held, all other callers > already do this (replication_stop, drive_backup_abort, > blockdev_backup_abort, job_cancel_sync_all, cancel_common). I think all other callers come directly from QMP, though, so they

Re: [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply

2020-04-02 Thread Max Reitz
On 01.04.20 10:15, Stefan Reiter wrote: > All callers of job_txn_apply hold a single job's lock, but different > jobs within a transaction can have different contexts, thus we need to > lock each one individually before applying the callback function. > > Similar to job_completed_txn_abort this

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Kevin Wolf
Am 02.04.2020 um 11:10 hat Dietmar Maurer geschrieben: > > It seems to fix it, yes. Now I don't get any hangs any more. > > I just tested using your configuration, and a recent centos8 image > running dd loop inside it: > > # while dd if=/dev/urandom of=testfile.raw bs=1M count=100; do sync;

Re: [PATCH for-5.0] vpc: Don't round up already aligned BAT sizes

2020-04-02 Thread Philippe Mathieu-Daudé
On 4/2/20 11:36 AM, Kevin Wolf wrote: As reported on Launchpad, Azure apparently doesn't accept images for upload that are not both aligned to 1 MB blocks and have a BAT size that matches the image size exactly. As far as I can tell, there is no real reason why we create a BAT that is one entry

[PATCH for-5.0] vpc: Don't round up already aligned BAT sizes

2020-04-02 Thread Kevin Wolf
As reported on Launchpad, Azure apparently doesn't accept images for upload that are not both aligned to 1 MB blocks and have a BAT size that matches the image size exactly. As far as I can tell, there is no real reason why we create a BAT that is one entry longer than necessary for aligned image

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Dietmar Maurer
> > Do you also run "stress-ng -d 5" inside the VM? > > I'm not using the exact same test case, but something that I thought > would be similar enough. Specifically, I run the script below, which > boots from a RHEL 8 CD and in the rescue shell, I'll do 'dd if=/dev/zero > of=/dev/sda' This test

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Dietmar Maurer
> It seems to fix it, yes. Now I don't get any hangs any more. I just tested using your configuration, and a recent centos8 image running dd loop inside it: # while dd if=/dev/urandom of=testfile.raw bs=1M count=100; do sync; done With that, I am unable to trigger the bug. Would you mind

Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN

2020-04-02 Thread Richard W.M. Jones
On Wed, Apr 01, 2020 at 05:38:41PM -0500, Eric Blake wrote: > I was trying to test qemu's reconnect-delay parameter by using nbdkit > as a server that I could easily make disappear and resume. A bit of > experimenting shows that when nbdkit is abruptly killed (SIGKILL), > qemu detects EOF on

Re: [PATCH 1/6] scripts/coccinelle: add error-use-after-free.cocci

2020-04-02 Thread Markus Armbruster
Peter Maydell writes: > On Thu, 2 Apr 2020 at 07:55, Markus Armbruster wrote: >> Peter Maydell writes: >> > I use this thing maybe once a month at most, more likely once >> > every three months, and the documentation is notoriously >> > impenetrable. I really really don't want to have to start

Re: [PATCH 1/6] scripts/coccinelle: add error-use-after-free.cocci

2020-04-02 Thread Peter Maydell
On Thu, 2 Apr 2020 at 07:55, Markus Armbruster wrote: > Peter Maydell writes: > > I use this thing maybe once a month at most, more likely once > > every three months, and the documentation is notoriously > > impenetrable. I really really don't want to have to start looking in it > > and

Re: [PATCH v2 00/22] Fix error handling during bitmap postcopy

2020-04-02 Thread Vladimir Sementsov-Ogievskiy
Ping! It's a fix, though not for a regression, and I'm afraid it's too big for 5.0. Still, I think I should ping it anyway. John, I'm afraid this is all for your branch :) 17.02.2020 18:02, Vladimir Sementsov-Ogievskiy wrote: Original idea of bitmaps postcopy migration is that bitmaps are non

Re: [PATCH 1/6] scripts/coccinelle: add error-use-after-free.cocci

2020-04-02 Thread Markus Armbruster
Peter Maydell writes: > On Wed, 1 Apr 2020 at 15:44, Markus Armbruster wrote: >> Peter Maydell writes: >> > On Wed, 1 Apr 2020 at 06:07, Markus Armbruster wrote: >> > But then as a coccinelle script author I need to know which of >> > the options I needed are standard, which are

Re: bdrv_drained_begin deadlock with io-threads

2020-04-02 Thread Dietmar Maurer
> > But, IMHO the commit is not the reason for (my) bug - It just makes > > it easier to trigger... I can see (my) bug sometimes with 4.1.1, although > > I have no easy way to reproduce it reliable. > > > > Also, Stefan sent some patches to the list to fix some of the problems. > > > >

Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN

2020-04-02 Thread Vladimir Sementsov-Ogievskiy
02.04.2020 1:38, Eric Blake wrote: I was trying to test qemu's reconnect-delay parameter by using nbdkit as a server that I could easily make disappear and resume. A bit of experimenting shows that when nbdkit is abruptly killed (SIGKILL), qemu detects EOF on the socket and manages to reconnect

[PATCH v18 1/4] qcow2: introduce compression type feature

2020-04-02 Thread Denis Plotnikov
The patch adds some preparation parts for the incompatible compression type feature in qcow2, allowing the use of different compression methods for (de)compressing image clusters. It is implied that the compression type is set on image creation and can later be changed only by image conversion, thus

[PATCH v18 0/4] qcow2: Implement zstd cluster compression method

2020-04-02 Thread Denis Plotnikov
v18: * 04: add quotes to all file name variables [Vladimir] * 04: add Vladimir's comment according to "qemu-io write -s" option issue. v17: * 03: remove incorrect comment in zstd decompress [Vladimir] * 03: remove "paraniod" and rewrite the comment on decompress [Vladimir]

[PATCH v18 4/4] iotests: 287: add qcow2 compression type test

2020-04-02 Thread Denis Plotnikov
The test checks that the qcow2 requirements for the compression type feature are fulfilled and that the zstd compression type is operable. Signed-off-by: Denis Plotnikov Reviewed-by: Vladimir Sementsov-Ogievskiy --- tests/qemu-iotests/287 | 167 + tests/qemu-iotests/287.out

[PATCH v18 2/4] qcow2: rework the cluster compression routine

2020-04-02 Thread Denis Plotnikov
The patch enables processing of the compression type defined for the image and chooses an appropriate method for (de)compressing image clusters. Signed-off-by: Denis Plotnikov Reviewed-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Alberto Garcia --- block/qcow2-threads.c | 71

[PATCH v18 3/4] qcow2: add zstd cluster compression

2020-04-02 Thread Denis Plotnikov
zstd significantly reduces cluster compression time. It provides better compression performance while maintaining the same compression ratio as zlib, which is currently the only available compression method. The performance test results: Test compresses and